Watermarking of Parkinson Disease Speech in Cloud-Based Healthcare Framework

Abstract

Mobile healthcare in a cloud-based system makes the patient-doctor relationship more convenient and ubiquitous. Two major issues in such healthcare are secure transmission and data authenticity. If the data are not transmitted securely or cannot be authenticated, the clients may face embarrassment. In this paper, we propose a cloud-based healthcare framework that authenticates speech data from a patient suspected to have Parkinson's disease. The patient sends his or her speech signal, recorded via a smart phone, through the Internet to the cloud. A discrete wavelet transform (DWT) and singular value decomposition (SVD) based speech watermarking module runs in the cloud to embed a watermark in the signal. For authentication, the watermark is extracted from the questioned signal and matched with the stored watermark. Experimental results indicate that the proposed DWT-SVD based watermarking system achieves imperceptibility and is robust against attacks such as additive white Gaussian noise and filtering.

1. Introduction

The use of healthcare systems in a cloud-based framework is increasing day by day due to the heterogeneous nature of the cloud in terms of processing capability and capacity [1, 2]. The introduction of the cloud in healthcare has opened the possibility of elderly homecare, reduced the physical transportation of patients, increased the availability of multiple consultations from doctors, and so forth. However, the increasing use of wireless transmission of health-related data raises concerns about data protection and authenticity. Medical data are considered private and should not be made public without proper permission. Without proper security, the patients' privacy may be vulnerable [3–6]. For example, if a patient has an embarrassing disease, any leakage of that data will embarrass the patient in society. This may lead to the loss of his or her job, isolation or depression, problems with insurance protection, and so forth [7]. Medical data can be shared between physicians, insurance companies, family members, caregivers, and sports coaches. If there is no appropriate protection, the data can be collected without authorization by political party agents, personal enemies, and rival coaches. In addition, a hacker can steal medical data and post them on social networks to defame an individual. If there is no means of authenticating the data, that individual will suffer mental fatigue and embarrassment. If a healthcare service provider does not follow the HIPAA rules [8], he or she may face strict civil and criminal penalties. Therefore, data protection and data authentication are crucial in an e-healthcare system.

In this paper, we address the issue of data authentication by embedding a watermark in the speech data of Parkinson's disease patients. Parkinson's disease (PD) is a degenerative disorder of the central nervous system marked by tremor, rigidity, anxiety, dementia, and slow and imprecise movement of muscles. The disease is named after Doctor James Parkinson, who described PD as “shaking palsy” in 1817 [9]. PD is generally observed in elderly people, and the symptoms worsen over time. It affects speech production in the vocal folds and transmission through the vocal tract, motor activities such as writing and balance [10], and also causes nonmotor symptoms such as depression, autonomic dysfunction, and visual hallucinations [11]. PD is the second most common neurodegenerative disorder after Alzheimer's disease. Approximately 10 million people around the world suffer from PD, and Saudi Arabia is ranked 24th in terms of death rate from PD (2.6 per 100,000) [12]. PD is very difficult to cure; however, early detection and treatment can help the patient gain some control over some motor and nonmotor symptoms. The detection and treatment can be either invasive, such as surgery, or noninvasive, such as medicines; however, both surgery and medicines can be risky to the patient.

Most PD patients suffer from vocal fold dysphonia. Therefore, analyzing speech is a popular choice for PD diagnosis. This popularity comes from its relatively low cost, its noninvasive nature, and its ease of use in telemedicine [13]. The speech of a PD patient differs from that of a healthy person, which makes speech disorder detection a realistic approach to PD diagnosis. Many speech features have been investigated in the literature for this purpose, such as Mel-frequency cepstral coefficients (MFCC), shimmer, jitter, harmonic-to-noise ratio, pitch period entropy, degree of voice breaks, and autocorrelation [14].

To diagnose or treat PD in an e-healthcare system, the transmission of speech signals should be protected and authenticated. Watermarking is one of the most widely used techniques for protecting data. Several watermarking algorithms exist in the literature for telemedicine applications. For example, Singh et al. embedded text and image watermarks into cover radiological images for secure medical data transmission [15]. An optical 3D watermark was used in [16]; identification of liabilities based on watermarking was proposed in [17]. However, watermarking of audio, voice, or speech signals for telemedicine applications is not common.

This paper presents a watermarking procedure in an e-healthcare system for the purpose of PD diagnosis in a cloud-based framework. Discrete wavelet transform (DWT) and singular value decomposition (SVD) are used in the watermarking procedure. DWT-SVD based audio watermarking has been proposed before; however, it has not been used in an e-healthcare system. For example, Ali and Ahn presented a DWT-SVD based watermarking procedure using a self-adaptive differential evolution technique [18]. Two-level DWT was used and all the subbands were utilized to embed the watermark. Lei et al. proposed a blind watermark scheme based on SVD and discrete cosine transform (DCT) [19]. A selection of large singular value coefficients was utilized in the SVD-DWT based audio watermarking algorithm in [20]. None of these methods considered the use of a watermark in e-healthcare or in cloud-based systems. To the best of our knowledge, the present study is the first to address audio watermarking in a cloud-based healthcare system. The feasibility of using DWT-SVD based watermarking is investigated by conducting a number of experiments in this study.

The rest of the paper is organized as follows. Section 2 outlines the proposed cloud-based healthcare framework; Section 3 describes the watermarking system; Section 4 presents the experiments; and, finally, Section 5 draws some conclusions.

2. Proposed Cloud-Based Healthcare Framework

A cloud framework has many interesting characteristics such as extensive network access, huge storage, on-demand self-service, resource allocation, and measured service [21, 22]. The cloud infrastructure can be defined as a software-platform-infrastructure (SPI) model that consists of Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Clients interact with SaaS, and IaaS is connected to the data center, as shown in Figure 1. SaaS includes e-mail, office programs, and social networking, through which a client can deliver his or her message or data. PaaS includes the appropriate operating system (OS), database (DB), and web server, while IaaS includes virtual machines (VMs) and virtual local area networks (LANs).

Figure 1

SPI model of a cloud infrastructure.

The proposed cloud-based healthcare framework is illustrated in Figure 2. There are two types of end users: patients or clients, and medical doctors or caregivers. A patient or client who wishes to find out whether he or she has PD simply records his or her speech using a mobile app. Mobile phones are equipped with internal microphones that can sense the speech and record it. The sensing is thereby noninvasive and wireless. The other end user is a medical doctor or caregiver, who is specialized in analyzing PD from speech and has proper access to the system. This end user can be located anywhere in the world.

Figure 2

Cloud-based healthcare framework for watermarking of data.

The main component of the framework is the cloud manager (CM). The CM controls and manages the framework in a seamless way. The responsibilities of the CM include the following:

(i) Registration of the end users: the end users have to register at the beginning to be within this framework. The CM registers the end users.

(ii) Authentication of the users: if a user wants to access the framework, the CM verifies his or her registered record and appropriately grants the access.

(iii) Profile management: the CM manages the profile of the end users and periodically updates it if necessary.

(iv) Context management: the CM extracts the context of the data and decides whether to watermark or classify the speech.

(v) Communication management: the CM initiates, controls, and terminates the collaboration between the users.

The resource allocation manager assigns different VM resources for various sessions and web services. Depending on the load capacity, it also configures VM capacities. The VMs work as an interface between the storage and the web servers.

The proposed framework contains some dedicated servers for the specific task of watermarking and classifying the speech signal. One server is responsible for watermark embedding and extraction and storing the appropriate keys. Another server is responsible for feature extraction from the speech signal and for classification. Watermark embedding and extraction will be described in detail in the following section.

The task flow of the whole framework is given below.

Step 1.

A client, whether a healthy person or a person suspected to have PD, requests the service using his or her smart phone over the Internet.

Step 2.

The CM registers the client and assigns him an ID.

Step 3.

The client records his or her speech through the smart phone and uploads it.

Step 4.

The recorded speech sample is sent to the CM, which authenticates the client.

Step 5.

The CM sends the sample to the resource allocation manager, which in turn sends it to the collaborative service manager. The service manager allocates the task of watermarking the sample to the watermark server, stores the sample, and sends it to the feature extraction and classification server for analysis.

Step 6.

If the sample is detected as a sample of PD, the CM sends the watermarked sample to the doctors and caregivers. If it is not detected as PD, the CM notifies the client.

Step 7.

The doctor downloads the sample and further analyzes it for diagnosis.

Step 8.

The doctor uploads his diagnosis and advice to the web.

Step 9.

The CM gets the diagnosis and alerts the client.

Step 10.

The client downloads the diagnosis and advice from the doctor.

3. Proposed Watermarking System

The proposed PD speech signal watermarking system has two main components, which are watermark embedding and watermark extraction. A DWT-SVD based watermarking scheme is adopted in the proposed system. The DWT-SVD watermarking scheme is computationally efficient and robust against major attacks in an audio signal. In the following subsections, the components are described in detail.

3.1. Discrete Wavelet Transform (DWT)

DWT is a multiresolution technique that decomposes a signal into different resolutions of time and frequency. For a one-level DWT, a given signal is passed through a low-pass filter and a high-pass filter. The output of the low-pass filter is called the approximation (L), while that of the high-pass filter is called the detail (H). Figure 3 shows a one-level decomposition of a signal $c^{j - 1}$ . The low-pass and high-pass filters are denoted by h and g, respectively. A downsampling by a factor of 2 is applied to the filter outputs, so the subbands L and H together have the same size as the input signal. Figure 4 shows a two-level decomposition of a signal. In the figure, $L 2$ corresponds to the approximation coefficients (at level 2), and $H 2$ and $H 1$ correspond to the detail coefficients at level 2 and level 1, respectively.

Figure 3

One-level decomposition of DWT.

Figure 4

Two-level decomposition of DWT.
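The two-level decomposition described above can be sketched in a few lines of NumPy. This is only an illustration: the paper does not name the wavelet, so the Haar wavelet is assumed here, and the function names `haar_dwt`, `haar_idwt`, and `haar_dwt_two_level` are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT along the last axis: returns (approximation L, detail H)."""
    x = np.asarray(x, dtype=float)
    if x.shape[-1] % 2:
        raise ValueError("signal length must be even")
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass filter h, downsampled by 2
    detail = (even - odd) / np.sqrt(2.0)   # high-pass filter g, downsampled by 2
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed even and odd samples."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    x = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],))
    x[..., 0::2] = (approx + detail) / np.sqrt(2.0)
    x[..., 1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_dwt_two_level(x):
    """Two-level decomposition: returns (L2, H2, H1) as in Figure 4."""
    L1, H1 = haar_dwt(x)
    L2, H2 = haar_dwt(L1)   # the second level splits the approximation again
    return L2, H2, H1

signal = np.arange(16.0)
L2, H2, H1 = haar_dwt_two_level(signal)
print(L2.size, H2.size, H1.size)   # subband sizes add up to the input length
```

Because each level halves the length of the band it splits, the three subbands together hold as many coefficients as the input, and applying `haar_idwt` twice recovers the original samples.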

There are several watermarking algorithms based on the subbands of DWT. Peng et al., Xiang, and Wu et al. used the approximation subband to embed watermark bits [23–25]. Many other algorithms used a detail subband for embedding watermark bits [26–29]. They mainly differed in which detail subband was used. These algorithms were reported to achieve good imperceptibility and high robustness.

3.2. Singular Value Decomposition (SVD)

SVD is a matrix factorization technique that decomposes a matrix into three matrices. If a rectangular matrix A of size $I \times J$ is the input, the output will be two orthogonal matrices and one diagonal matrix as follows:

\begin{matrix} A_{I \times J} = U_{I \times I} S_{I \times J} V_{J \times J}^{T}, \end{matrix}

(1)

where $U^{T} U = I_{I \times I}$ and $V^{T} V = I_{J \times J}$ , which means that U and V are orthogonal. S is a diagonal matrix whose diagonal entries are the singular values, arranged in descending order. These singular values are always real numbers. The computation of SVD is stable against round-off errors. Because a slight variation in the values of the S matrix does not affect the perception of a speech signal, watermark bits can be added to the singular values of S to obtain robust watermarking.
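The properties listed above can be checked numerically. The sketch below uses `numpy.linalg.svd` on a small random matrix (the matrix and the perturbation size are arbitrary choices for illustration) and shows that a small change in the singular values produces an equally small change in the reconstructed matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                   # rectangular I x J matrix

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U is 5x5, Vt is 3x3
S = np.zeros_like(A)                              # S has the shape of A
S[:3, :3] = np.diag(s)                            # singular values on the diagonal

reconstruction_ok = np.allclose(U @ S @ Vt, A)    # A = U S V^T
u_orthogonal = np.allclose(U.T @ U, np.eye(5))
v_orthogonal = np.allclose(Vt @ Vt.T, np.eye(3))
descending = bool(np.all(np.diff(s) <= 0))

# Perturb every singular value slightly and measure the effect on A.
S_perturbed = S.copy()
S_perturbed[:3, :3] += 1e-3 * np.eye(3)
change = np.linalg.norm(U @ S_perturbed @ Vt - A)
print(reconstruction_ok, u_orthogonal, v_orthogonal, descending, change)
```

Since U and V are orthogonal, the Frobenius norm of the change equals the norm of the perturbation applied to S, which is why small modifications of the singular values stay small in the signal domain.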

The application of SVD in watermarking algorithm is relatively new. El-Samie inserted watermark bits in all the singular values of matrix S [30]. Bhat et al. quantized the norms of all the singular values and used them in watermarking algorithm [31]; however, in [32], the authors utilized only the largest singular value. Ray et al. used encrypted values, obtained by root mean square method, of the singular values, instead of the original singular values, in watermark procedure [33]. All these algorithms varied in the way the diagonal matrix S was used.

3.3. DWT-SVD Based Watermark Algorithm

Figure 5 shows the proposed DWT-SVD based speech signal watermark algorithm. The watermarking is done in the cloud. The watermark is embedded in the $H 2$ subband of the speech signal. The details are described in the following subsections.

Figure 5

Proposed DWT-SVD based speech signal watermark algorithm.

3.3.1. Creating Watermark Image

The watermark image consists of the patient's ID in image format. For example, if the patient has the ID A23415610, then the watermark image will look like Figure 6. Let the watermark image be denoted by watermark. The size of watermark is $I \times J$ , where $I > J .$

Figure 6

Watermark image containing the patient's ID.

3.3.2. Transforming the Watermark Image Using SVD

SVD transformation is applied to the watermark image using the following steps.

Step 1.

Normalize the image matrix by 255:

\begin{matrix} I m_{i, j} = \{\frac{{watermark}_{i, j}}{255}; 0 \leq i \leq I, 0 \leq j \leq J\} . \end{matrix}

(2)

Step 2.

Apply SVD on the normalized matrix. The resultant $S_{w}$ is a square matrix of size $I \times I$ :

\begin{matrix} I m = U_{w} \cdot S_{w} \cdot V_{w}^{T} . \end{matrix}

(3)

Step 3.

Multiply $S_{w}$ by a watermark intensity factor, α. Consider

\begin{matrix} S_{w α} = α \cdot S_{w} . \end{matrix}

(4)

Step 4.

Store $U_{w}$ , $V_{w}^{T}$ , and α for watermark extraction. Use $S_{w α}$ for watermark embedding.
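Steps 1 to 4 above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `transform_watermark`, the toy image, and the value of α are assumptions, and NumPy's reduced SVD is used, so the singular values are returned as a vector instead of the matrix $S_{w}$ .

```python
import numpy as np

def transform_watermark(watermark, alpha):
    """Return (U_w, scaled singular values, V_w^T) for a grayscale watermark image."""
    im = np.asarray(watermark, dtype=float) / 255.0            # Step 1, Eq. (2)
    U_w, s_w, Vt_w = np.linalg.svd(im, full_matrices=False)    # Step 2, Eq. (3)
    s_w_alpha = alpha * s_w                                    # Step 3, Eq. (4)
    return U_w, s_w_alpha, Vt_w                                # Step 4: keep U_w, Vt_w, alpha

# Toy 6 x 4 "image" (I > J) with pixel values in 0..255; alpha is a hypothetical choice.
toy_watermark = (np.arange(24).reshape(6, 4) * 10) % 256
U_w, s_w_alpha, Vt_w = transform_watermark(toy_watermark, alpha=0.05)
print(s_w_alpha.shape)   # one scaled singular value per image column
```

Only the scaled singular values are passed on to the embedding stage; `U_w`, `Vt_w`, and α have to be stored so that the image can be rebuilt at extraction time.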

3.3.3. Transforming the Speech Signal Using DWT-SVD

DWT-SVD transformation is applied to the speech signal using the following steps.

Step 1.

Divide the speech signal into nonoverlapping frames, where the frame length is 30 milliseconds. Suppose we have N frames.

Step 2.

Apply two-level DWT on each frame. Take $H 2$ (detail coefficients at level 2) for watermark embedding. Store $L 2$ (approximation coefficients) and $H 1$ (detail coefficients at level 1) for reconstruction of the watermarked speech signal.

Step 3.

Form a matrix G using $H 2$ of all the frames. The number of rows corresponds to the number of frames of the signal.

Step 4.

Apply SVD on matrix G. The resultant $S_{s}$ is a square matrix of size $N \times N$ . Consider

\begin{matrix} G = U_{s} \cdot S_{s} \cdot {V_{s}}^{T} . \end{matrix}

(5)
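The four steps above can be sketched as follows, under several assumptions that are not stated in the paper: a Haar wavelet, a frame length rounded down to a multiple of 4 so that two Haar levels divide evenly, and trailing samples that do not fill a frame being dropped. The names `haar_dwt` and `transform_speech` are hypothetical. NumPy's reduced SVD is used, so the singular values come back as a vector.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT along the last axis: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def transform_speech(signal, fs, frame_ms=30):
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000))
    frame_len -= frame_len % 4                      # assumption: keep two Haar levels exact
    n_frames = signal.size // frame_len             # Step 1: nonoverlapping frames
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    L1, H1 = haar_dwt(frames)                       # Step 2: level 1
    L2, H2 = haar_dwt(L1)                           # Step 2: level 2
    G = H2                                          # Step 3: one row of H2 per frame
    U_s, s_s, Vt_s = np.linalg.svd(G, full_matrices=False)   # Step 4, Eq. (5)
    return {"L2": L2, "H1": H1, "G": G, "U_s": U_s, "s_s": s_s, "Vt_s": Vt_s}

rng = np.random.default_rng(0)
speech = rng.standard_normal(32000)                 # 1 s of synthetic samples at 32 kHz
parts = transform_speech(speech, fs=32000)
print(parts["G"].shape)                             # (frames, H2 coefficients per frame)
```

At 32 kHz a 30 ms frame holds 960 samples, so each row of G holds 240 level-2 detail coefficients.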

3.3.4. Watermark Embedding

The watermark is embedded in the speech signal using the following steps.

Step 1.

A new matrix, $S_{n e w}$ , of size $N \times N$ is formed by using matrices $S_{w α}$ and $S_{s}$ obtained from Sections 3.3.2 and 3.3.3. Consider

\begin{matrix} S_{n e w} (n, n) = \begin{cases} S_{w α} (n, n) + S_{s} (n, n), & 1 \leq n \leq I, \\ S_{s} (n, n), & (I + 1) \leq n \leq N . \end{cases} \end{matrix}

(6)

Step 2.

Using $U_{s}$ , $V_{s}^{T}$ , and $S_{n e w}$ , perform inverse SVD to get matrix $G^{'}$ . Consider

\begin{matrix} G^{'} = U_{s} \cdot S_{n e w} \cdot V_{s}^{T} . \end{matrix}

(7)

Step 3.

Using $L 2$ , $G^{'}$ , and $H 1$ , perform inverse DWT to get watermarked speech signal.
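Steps 1 and 2 of the embedding can be sketched as below; the inverse DWT of Step 3 is left out to keep the example short. The function name `embed_in_singular_values`, the random stand-in for G, and the watermark values are assumptions made for illustration.

```python
import numpy as np

def embed_in_singular_values(G, s_w_alpha):
    """Add scaled watermark singular values to the leading singular values of G."""
    U_s, s_s, Vt_s = np.linalg.svd(G, full_matrices=False)
    s_w_alpha = np.asarray(s_w_alpha, dtype=float)
    if s_w_alpha.size > s_s.size:
        raise ValueError("watermark has more singular values than the speech matrix")
    s_new = s_s.copy()
    s_new[: s_w_alpha.size] += s_w_alpha      # Eq. (6): first I entries are modified
    G_marked = (U_s * s_new) @ Vt_s           # Eq. (7): inverse SVD
    return G_marked, s_new

rng = np.random.default_rng(0)
G = rng.standard_normal((33, 240))            # stand-in for the H2 matrix of 33 frames
G_marked, s_new = embed_in_singular_values(G, [0.3, 0.2, 0.1])
print(np.linalg.norm(G_marked - G))           # size of the change introduced in H2
```

Because the added values are nonnegative and non-increasing, `s_new` stays in descending order, so an SVD of the watermarked matrix returns `s_new` again; this is what makes the subtraction at extraction time possible.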

3.3.5. Watermark Extraction

Watermark extraction is the reverse of watermark embedding. Figure 7 shows the extraction procedure. From the figure, we notice that, to extract the watermark, we need the original speech signal, the $U_{w}$ and $V_{w}^{T}$ matrices, and the watermark intensity factor, α.

Figure 7

Watermark extraction procedure.

The following steps are applied to extract the watermark.

Step 1.

Subtract $S_{s}$ from $S_{n e w}$ to get $S_{I m}$ . $S_{I m}$ should be equal to $S_{w α}$ if the watermarked speech signal is not under attack; the factor α is removed in Step 3. Consider

\begin{matrix} S_{I m} (n, n) = S_{n e w} (n, n) - S_{s} (n, n), \quad 1 \leq n \leq I . \end{matrix}

(8)

Step 2.

Apply inverse SVD to get the normalized watermark:

\begin{matrix} {I m}^{'} = U_{w} \cdot S_{I m} \cdot V_{w}^{T} . \end{matrix}

(9)

Step 3.

Get the watermark image by multiplying the values by 255 and dividing by α . Consider

\begin{matrix} {I m}_{i, j}^{'} = \{{I m}_{i, j}^{'} \times \frac{255}{α}; 0 \leq i \leq I, 0 \leq j \leq J\} . \end{matrix}

(10)

If there is no attack, ${I m}_{i, j}^{'}$ will be the same as the watermark image.
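A round trip through embedding and extraction can be sketched as follows. The example works directly on a random stand-in for the $H 2$ matrix and skips the DWT stages; the function name `extract_watermark`, the toy watermark, and the value of α are assumptions.

```python
import numpy as np

def extract_watermark(G_marked, G_original, U_w, Vt_w, alpha):
    """Recover the watermark image from the watermarked and original H2 matrices."""
    s_new = np.linalg.svd(G_marked, compute_uv=False)
    s_s = np.linalg.svd(G_original, compute_uv=False)
    k = U_w.shape[1]                          # number of watermark singular values
    s_im = s_new[:k] - s_s[:k]                # Step 1, Eq. (8)
    im = (U_w * s_im) @ Vt_w                  # Step 2, Eq. (9)
    return im * 255.0 / alpha                 # Step 3, Eq. (10)

rng = np.random.default_rng(1)
watermark = rng.integers(0, 256, size=(6, 4)).astype(float)   # toy I x J image
alpha = 0.05                                                   # hypothetical intensity

# Embedding side (Sections 3.3.2 to 3.3.4), without the DWT stages.
U_w, s_w, Vt_w = np.linalg.svd(watermark / 255.0, full_matrices=False)
G = rng.standard_normal((33, 240))
U_s, s_s, Vt_s = np.linalg.svd(G, full_matrices=False)
s_new = s_s.copy()
s_new[: s_w.size] += alpha * s_w
G_marked = (U_s * s_new) @ Vt_s

recovered = extract_watermark(G_marked, G, U_w, Vt_w, alpha)
print(np.max(np.abs(recovered - watermark)))   # reconstruction error without attack
```

The extraction needs the original (unwatermarked) matrix, which matches the requirement stated above that the original speech signal must be available.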

3.4. Feature Extraction and Classification

To determine whether the speech signal comes from a patient having PD or from a healthy person, the proposed framework also has an option to detect PD from the speech signal. To achieve this, feature extraction and classification are performed in the cloud (see Figure 2).

In general, feature extraction should extract meaningful information (features) from a given signal. These features should be representative of the signal, discriminative for different classes, and nonredundant. In line with this, we extract five features, which are jitter, shimmer, harmonic-to-noise ratio (HNR), fraction of locally unvoiced frames, and mean pitch. These five features represent five different attributes of a signal. Jitter is a frequency feature that is defined as pitch perturbation and mathematically expressed as

\begin{matrix} j i t t = \frac{\sum_{i = 1}^{N - 1} |T^{i} - T^{i + 1}|}{N - 1}, \end{matrix}

(11)

where $T^{i}$ is the duration of the $i$th pitch period and N is the number of pitch periods.
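As a small worked example, the function below averages the N − 1 absolute differences between consecutive pitch periods. The function name `jitter` and the sample periods are illustrative assumptions; pitch period extraction itself is not covered.

```python
def jitter(periods):
    """Mean absolute difference between consecutive pitch periods (same unit as input)."""
    if len(periods) < 2:
        raise ValueError("at least two pitch periods are required")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / (len(periods) - 1)

# Four pitch periods in seconds (about 100 Hz with small perturbations).
print(jitter([0.010, 0.011, 0.009, 0.010]))
```

A perfectly periodic voice would give a jitter of zero; larger values indicate less stable vocal fold vibration.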

Shimmer is an amplitude perturbation measure and mathematically expressed as

\begin{matrix} S h i m (d B) = \frac{1}{N - 1} \sum_{i = 1}^{N - 1} |20 \log_{10} (\frac{A^{i + 1}}{A^{i}})|, \end{matrix}

(12)

where A is the peak-to-peak amplitude.
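Equation (12) can be evaluated directly from a list of peak-to-peak amplitudes. The function name `shimmer_db` and the sample amplitudes are illustrative assumptions.

```python
import math

def shimmer_db(amplitudes):
    """Mean absolute amplitude ratio of consecutive periods, in dB, as in Eq. (12)."""
    if len(amplitudes) < 2:
        raise ValueError("at least two amplitudes are required")
    ratios_db = [abs(20.0 * math.log10(b / a))
                 for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(ratios_db) / (len(amplitudes) - 1)

# Amplitude jumps by a factor of 10 and back: each step contributes 20 dB.
print(shimmer_db([1.0, 10.0, 1.0]))
```

The absolute value makes increases and decreases in amplitude count equally, so only the size of the cycle-to-cycle variation matters.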

HNR is a harmonicity parameter, which is represented by

\begin{matrix} H N R (d B) = 20 \log_{10} (\frac{E_{p}}{E_{n}}), \end{matrix}

(13)

where $E_{p}$ and $E_{n}$ are the energy of the periodic component and the energy of the noise component of the signal, respectively.

The feature “fraction of locally unvoiced frames” is a voicing parameter, and “mean pitch” is a pitch parameter.

These features are chosen because they discriminate between normal voice and PD voice. A PD voice exhibits high jitter, high shimmer, low HNR, fewer locally unvoiced frames, and low pitch. In [14], some of these features were used; however, the level of redundancy among the features was high. For example, jitter (local), jitter (absolute), jitter (rap), jitter (ppq5), and jitter (ddp) were used to represent the frequency attribute. All these jitter parameters represent the same attribute using different mathematical expressions. In our proposed framework, we use one feature per attribute, thereby limiting the confusion in the classification process.

For classification, we utilize a support vector machine (SVM) for its simplicity and generalization capability. A linear kernel is used to project low dimensional space into a high dimensional space [14]. As the number of samples is low, we adopt the leave-one-subject-out (LOSO) approach, where the samples of all but one subject are used for training and that subject's samples are used for testing. Therefore, there is no overlap between training and testing samples in one round of the experiment. This is repeated until all the subjects' samples have been tested. The final accuracy is obtained by averaging the accuracies over the rounds.
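The LOSO protocol can be sketched independently of the classifier. In the example below, `loso_splits` and `loso_accuracy` are hypothetical helper names, and `fit` and `predict` are placeholders for any classifier; with scikit-learn installed, `sklearn.svm.SVC(kernel="linear")` could be wrapped to play that role.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def loso_accuracy(features, labels, subject_ids, fit, predict):
    """Average the per-round accuracies over all held-out subjects."""
    accuracies = []
    for _, train, test in loso_splits(subject_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        predictions = predict(model, [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Five samples from three subjects; no subject appears in both train and test.
subjects = ["s1", "s1", "s2", "s3", "s3"]
for held_out, train, test in loso_splits(subjects):
    print(held_out, train, test)
```

Splitting by subject rather than by sample keeps recordings of the same speaker from appearing on both sides of a round, which would otherwise inflate the accuracy.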

4. Experiments

In this section, the details of the experiments are presented. The description includes the database used in the experiments, evaluation metrics, experimental results, and comparison.

4.1. Dataset

4.1.1. For Watermarking

The speech signals from PD patients were obtained from [14]. As the raw wave signals are available only for their test data, we used these signals in our experiments. Each of the 28 PD patients was asked to sustain the vowels “a” and “o” three times each. Therefore, a total of 168 recordings were available. The age of the individuals ranged between 43 and 77, with a mean of 64.86 and a standard deviation of 8.97. The signals were recorded at the Department of Neurology, Cerrahpasa Faculty of Medicine, Istanbul University. The signals were downsampled from 96 kHz to 32 kHz to reduce the transmission load.

Another dataset, called Saarbrucken Voice Disorder (SVD) database [34], was used for normal speech. The speech samples were recorded by the Institute of Phonetics of Saarland University, Germany. We selected speech samples from 100 normal subjects containing sustained vowels “a” and “o” with normal pitch.

4.1.2. For PD Detection

The speech signals from PD patients were obtained from [14]. There are two sets: training and testing. The training set consists of voice samples of 20 PD patients, of whom six are females and 14 are males, and 20 healthy persons, of whom 10 are females and 10 are males. We used only samples of the sustained vowels “a” and “o.” The testing set consists of samples of the sustained vowels “a” and “o” spoken by 28 PD patients.

4.2. Evaluation Metrics

The performance of the proposed watermarking framework was measured in terms of imperceptibility and robustness against attacks [35]. Imperceptibility is a measure of how perceptibly the signal is distorted. To measure imperceptibility, we used the signal-to-noise ratio (SNR) and a listening test; the first is objective and the second is subjective. SNR is defined by

\begin{matrix} S N R_{d B} = 10 \log_{10} \frac{P_{s}}{P_{s} - P_{s}^{'}}, \end{matrix}

(14)

where $P_{s}$ and $P_{s}^{'}$ are the power of the original speech signal and of the watermarked speech signal, respectively. Another closely related metric is peak SNR (PSNR). In PSNR, the numerator of the logarithm in (14) is replaced by the square of the maximum value of the pixel in the original watermark image.
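The SNR can be computed sample by sample as sketched below. Note the assumption: the denominator of (14) is read here as the power of the difference between the original and watermarked signals, which is the usual convention for this metric. The function name `snr_db` and the toy signals are illustrative.

```python
import numpy as np

def snr_db(original, watermarked):
    """SNR in dB, treating the difference between the two signals as noise."""
    s = np.asarray(original, dtype=float)
    sw = np.asarray(watermarked, dtype=float)
    noise_power = np.sum((s - sw) ** 2)
    if noise_power == 0.0:
        return float("inf")                   # identical signals: no distortion
    return float(10.0 * np.log10(np.sum(s ** 2) / noise_power))

original = np.array([1.0, 1.0, 1.0, 1.0])
watermarked = np.array([1.1, 0.9, 1.1, 0.9])  # distortion of 0.1 per sample
print(snr_db(original, watermarked))
```

In this toy case the noise power is one hundredth of the signal power, which corresponds to 20 dB.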

In the listening test, the human listeners rate the played speech signal with one of the following grades: imperceptible, perceptible but not annoying, slightly annoying, annoying, and very much annoying, where imperceptible has grade 5 and very much annoying has grade 1. There were 15 listeners, who listened to both the original speech signals and the watermarked speech signals during training. During actual testing, they were given watermarked speech randomly.

With regard to robustness against attack, we considered two common attacks, which are additive white Gaussian noise (AWGN) and filtering of type low-pass, high-pass, and band-pass. The measurements were obtained by using a correlation factor, η, which is computed by using

\begin{matrix} η (w, w^{'}) = \frac{\sum_{i = 1}^{N} w_{i} w_{i}^{'}}{\sqrt{\sum_{i = 1}^{N} w_{i}^{2}} \sqrt{\sum_{i = 1}^{N} w_{i}^{' 2}}}, \end{matrix}

(15)

where w and $w^{'}$ are the original and extracted watermarks, respectively, N is the number of pixels in the watermark image, and η takes a value between 0 (no relation) and 1 (perfect relationship).
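Equation (15) is a normalized inner product of the two watermark images, which can be computed as follows. The function name `correlation_factor` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def correlation_factor(w, w_extracted):
    """Normalized correlation between original and extracted watermarks, Eq. (15)."""
    w = np.asarray(w, dtype=float).ravel()              # flatten images to pixel vectors
    we = np.asarray(w_extracted, dtype=float).ravel()
    return float(np.dot(w, we) / (np.linalg.norm(w) * np.linalg.norm(we)))

original = np.array([[10.0, 200.0], [30.0, 120.0]])
print(correlation_factor(original, original))           # identical watermarks
print(correlation_factor(original, 0.5 * original))     # uniformly dimmed copy
```

Because of the normalization, a uniformly scaled copy of the watermark still gives η = 1; for nonnegative pixel values η stays between 0 and 1.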

The performance of PD detection was measured in terms of accuracy (%).

4.3. Experimental Results

Table 1 shows the SNR and PSNR, in decibels, for the speech signals “a” and “o.” In both cases, the SNR and PSNR were well above 20 dB, which is the minimum requirement of the International Federation of the Phonographic Industry (IFPI) [36]. The SNR and PSNR were higher in normal speech than in PD speech. In normal speech, the vocal folds open and close accurately together, so there is little or no noise component, resulting in a higher SNR or PSNR. In the PD case, the vocal folds cannot close completely or close in an irregular manner; the speech is therefore already noisy (with smaller amplitude), which causes a relatively smaller SNR or PSNR. Table 2 shows the corresponding numbers when the watermarking was performed in the cloud. Comparing Tables 1 and 2, we find that the wireless transmission of the speech signals did not distort the signal much, and hence the SNR and PSNR values did not degrade much when the watermark was embedded in the cloud server.

Table 1

SNR (dB) and PSNR (dB) using the proposed watermark scheme (in local server).

Utterance type →	“a”, SNR (dB)	“a”, PSNR (dB)	“o”, SNR (dB)	“o”, PSNR (dB)
PD speech	52.12	67.45	51.32	66.54
Normal speech	56.43	68.42	53.67	68.02

Table 2

SNR (dB) and PSNR (dB) using the proposed watermark scheme (in cloud server).

Utterance type →	“a”, SNR (dB)	“a”, PSNR (dB)	“o”, SNR (dB)	“o”, PSNR (dB)
PD speech	44.14	64.32	48.74	63.11
Normal speech	52.34	65.92	50.73	64.54

Table 3 shows average rating from the listeners to judge speech quality of the watermarked signals, when watermarking was done in the local server. As mentioned before, there were 15 human listeners. The values in Table 3 indicate that the proposed watermarking algorithm achieves imperceptibility. The listeners' ratings when the watermarking was done in the cloud server are given in Table 4. This rating is important because the doctors or the caregivers will actually listen to the speech signals that are watermarked in the cloud. From the table, we find that listeners' ratings did not change much even when the watermarking was done in the cloud server (transmission did not affect the algorithm).

Table 3

Average rating from the listeners (watermarking in the local server).

Utterance type →	“a”	“o”
PD speech	4.72	4.68
Normal speech	4.85	4.78

Table 4

Average rating from the listeners (watermarking in the cloud server).

Utterance type →	“a”	“o”
PD speech	4.61	4.60
Normal speech	4.74	4.71

Figure 8 shows the correlation factor, η, after different types of attacks. The attacks were applied after the watermark was embedded in the cloud server. The attacks included band-pass filtering with a passband between 80 Hz and 4 kHz, high-pass filtering with cutoff frequencies of 80 Hz and 50 Hz, low-pass filtering with cutoff frequencies of 16 kHz, 8 kHz, and 4 kHz, and AWGN of 20 dB, 15 dB, and 10 dB. From the figure, we see that in almost all cases the correlation factor was 1, which indicates the robustness of the proposed algorithm.

Figure 8

Correlation factor, η, after different types of attacks.

Figure 9 shows the accuracy of the proposed framework for PD detection. The proposed PD detection achieved 89.3% accuracy using samples “a” and 80.5% accuracy using samples “o.”

Figure 9

Accuracy (%) of PD detection for vowels “a” and “o.”

4.4. Comparison with Other Systems

We compared the proposed cloud-based watermarking system with two other systems, namely, those in [26, 37]. The system in [37] used a DWT-DCT based approach. We took the results of these two systems as reported in the corresponding papers. Table 5 compares the performance of the two systems and the proposed system. All the systems' performances were based on the local server. Note that imperceptibility results are difficult to compare between systems in the literature, because the materials are diverse and the listeners are different. From the table, we find that the proposed system performed better than the two other systems.

Table 5

Comparison of performances between the systems.

System	Average SNR (dB)	η after AWGN attack (20 dB)	η after low-pass filtering (cutoff = 8 kHz)
[37]	43.11	0.98	—
[26]	44.77	1.00	0.99
Proposed	54.12	1.00	1.00

The proposed PD detection was compared with another similar system in [14]. The features in their system were acquired from their repository and the experiments were carried out using the same classification approach as our proposed system. The comparison of accuracy in both is shown in Figure 9. From the figure, we see that the proposed system outperforms the system in [14] both using samples “a” and using samples “o” significantly.

5. Conclusion

A DWT-SVD based speech watermarking scheme to establish trust in data in a cloud-based healthcare system was proposed. As a case study, speech from patients having PD and from healthy subjects was investigated. The watermark was embedded in the cloud and could be extracted for authentication in the cloud. The experimental results showed that the proposed scheme achieved imperceptibility and robustness against certain attacks, including AWGN and filtering. The listeners' ratings were also high.

The whole framework can be used in a mobile healthcare system in a trusted way to diagnose PD. The client sends his or her speech to the cloud, where it is watermarked for subsequent transmission. The watermarked speech is then classified in the cloud as normal or as having PD, with a degree of severity. After that, the client and the doctors are notified about the classification. In a future study, we would like to investigate the classification task using the watermarked speech signal.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia, for funding this work through the research group Project no. RG-1436-016.

References

1. Hossain M. S., Muhammad. Cloud-based collaborative media service framework for healthcare. International Journal of Distributed Sensor Networks. 2014;2014:11. 858712. doi:10.1155/2014/858712

2. Hossain M. S., Muhammad. Cloud-assisted speech and face recognition framework for health monitoring. Mobile Networks and Applications. 2015;20(3):391–399. doi:10.1007/s11036-015-0586-3

3. Dimitriou, Ioannis. Security issues in biomedical wireless sensor networks. Proceedings of the 1st International Symposium on Applied Sciences in Biomedical and Communication Technologies (ISABEL '08); October 2008; Aalborg, Denmark. IEEE; pp. 1–5. doi:10.1109/isabel.2008.4712577. Scopus: 2-s2.0-67650148473

4. Venkatasubramanian K. K., Gupta S. K. S. Security for pervasive health monitoring sensor applications. Proceedings of the 4th International Conference on Intelligent Sensing and Information Processing (ICISIP '06); December 2006; Bangalore, India. pp. 197–202. doi:10.1109/icisip.2006.4286096. Scopus: 2-s2.0-43949092250

5. Xiao, Shen, Sun, Cai. Security and privacy in RFID and applications in telemedicine. IEEE Communications Magazine. 2006;44(4):64–72. doi:10.1109/MCOM.2006.1632651. Scopus: 2-s2.0-33646943487

6. Kumar, Lee H.-J. Security issues in healthcare applications using wireless medical sensor networks: a survey. Sensors. 2012;12(1):55–91. doi:10.3390/s120100055. Scopus: 2-s2.0-84863011893

7. Meingast, Roosta, Sastry. Security and privacy issues with health care information technology. Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS '06); September 2006; New York, NY, USA. pp. 5453–5458. doi:10.1109/iembs.2006.260060. Scopus: 2-s2.0-34047155030

8. Office for Civil Rights and United States Department of Health and Human Services. Medical Privacy. National Standards to Protect the Privacy of Personal Health Information. June 2015, http://www.hhs.gov/ocr/privacy/hipaa/administrative/privacyrule/index.html

9. Langston J. W. Parkinson's disease: current and future challenges. NeuroToxicology. 2002;23(4-5):443–450. doi:10.1016/s0161-813x(02)00098-0. Scopus: 2-s2.0-0036776975

10. O'Sullivan S. B., Schmitz T. J. Parkinson disease. Physical Rehabilitation. 5th. Philadelphia, Pa, USA: F. A. Davis Company; 2007. pp. 856–894.

11. Muangpaisan, Hori, Brayne. Systematic review of the prevalence and incidence of Parkinson's disease in Asia. Journal of Epidemiology. 2009;19(6):281–293. doi:10.2188/jea.je20081034. Scopus: 2-s2.0-74249084768

12. Parkinson Disease. http://www.worldlifeexpectancy.com/cause-of-death/parkinson-disease/by-country/

13. Little M. A., McSharry P. E., Hunter E. J., Spielman, Ramig L. O. Suitability of dysphonia measurements for telemonitoring of Parkinson's disease. IEEE Transactions on Biomedical Engineering. 2009;56(4):1015–1022. doi:10.1109/TBME.2008.2005954. Scopus: 2-s2.0-84860547691

14. Sakar B. E., Isenkul M. E., Sakar C. O., Sertbas, Gurgen, Delil, Apaydin, Kursun. Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE Journal of Biomedical and Health Informatics. 2013;17(4):828–834. doi:10.1109/jbhi.2013.2245674. Scopus: 2-s2.0-84885092956

15. Singh A. K., Kumar, Dave, Mohan. Robust and imperceptible dual watermarking for telemedicine applications. Wireless Personal Communications. 2015;80(4):1415–1433. doi:10.1007/s11277-014-2091-6. Scopus: 2-s2.0-84907482318

16. X. W., Kim S. T. Optical 3D watermark based digital image watermarking for telemedicine. Optics and Lasers in Engineering. 2013;51(12):1310–1320. doi:10.1016/j.optlaseng.2013.06.001. Scopus: 2-s2.0-84880294993

17. Bouslimi, Coatrieux, Cozic, Roux. A telemedicine protocol based on watermarking evidence for identification of liabilities in case of litigation. Proceedings of the IEEE 14th International Conference on e-Health Networking, Applications and Services (Healthcom '12); October 2012. pp. 506–509. doi:10.1109/healthcom.2012.6379473. Scopus: 2-s2.0-84872016764

18. Ali, Ahn C. W. An optimized watermarking technique based on self-adaptive DE in DWT–SVD transform domain. Signal Processing. 2014;94(1):545–556. doi:10.1016/j.sigpro.2013.07.024. Scopus: 2-s2.0-84882420855

19. Lei B. Y., Soon I. Y. Blind and robust audio watermarking scheme based on SVD-DCT. Signal Processing. 2011;91(8):1973–1984. doi:10.1016/j.sigpro.2011.03.001. Scopus: 2-s2.0-79955475469

20. Zhao, Wang, Chen, Liu. A robust audio watermarking algorithm based on SVD-DWT. Elektronika Ir Elektrotechnika. 2014;20(1):75–80. doi:10.5755/j01.eee.20.1.3948. Scopus: 2-s2.0-84892390687

21. Muhammad. Automatic speech recognition using interlaced derivative pattern for cloud based healthcare system. Cluster Computing. 2015;18(2):795–780. doi:10.1007/s10586-015-0439-7. Scopus: 2-s2.0-84922511495

22. Muhammad, Masud, Alelaiwi, Rahman M. A., Karime, Alamri, Hossain M. S. Spectro-temporal directional derivative based automatic speech recognition for a serious game scenario. Multimedia Tools and Applications. 2015;74(14):5313–5327. doi:10.1007/s11042-014-1973-7. Scopus: 2-s2.0-84901739338

23. Peng, Wang, Zhang. Audio watermarking scheme robust against desynchronization attacks based on kernel clustering. Multimedia Tools and Applications. 2013;62(3):681–699. doi:10.1007/s11042-011-0868-0. Scopus: 2-s2.0-84880057644

24. Xiang. Audio watermarking robust against D/A and A/D conversions. EURASIP Journal on Advances in Signal Processing. 2011;3. doi:10.1186/1687-6180-2011-3

25. Huang, Shi Y. Q. Efficiently self-synchronized audio watermarking for assured audio data transmission. IEEE Transactions on Broadcasting. 2005;51(1):69–76. doi:10.1109/TBC.2004.838265. Scopus: 2-s2.0-15844424554

26. Al-Haj. A dual transform audio watermarking algorithm. Multimedia Tools and Applications. 2013;73(3):1897–1912. doi:10.1007/s11042-013-1645-z. Scopus: 2-s2.0-84912047024

27. Fallahpour, Megías. High capacity audio watermarking using the high frequency band of the wavelet domain. Multimedia Tools and Applications. 2011;52(2-3):485–498. doi:10.1007/s11042-010-0495-1. Scopus: 2-s2.0-79953026663

28. Erçelebi, Batakçı. Audio watermarking scheme based on embedding strategy in low frequency components with a binary image. Digital Signal Processing. 2009;19(2):265–277. doi:10.1016/j.dsp.2008.11.007. Scopus: 2-s2.0-58549119125

29. Y. J., Shimamoto. A study on DWT-based digital audio watermarking for mobile ad hoc network. Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing; June 2006; Taichung, Taiwan. IEEE; pp. 247–251. doi:10.1109/SUTC.2006.19

30. El-Samie F. E. An efficient singular value decomposition algorithm for digital audio watermarking. International Journal of Speech Technology. 2009;12(1):27–45. doi:10.1007/s10772-009-9056-2. Scopus: 2-s2.0-72149103538

31. Bhat K. V., Sengupta, Das. An adaptive audio watermarking based on the singular value decomposition in the wavelet domain. Digital Signal Processing. 2010;20(6):1547–1558. doi:10.1016/j.dsp.2010.02.006. Scopus: 2-s2.0-77955414340

32. Chang C.-C., Tsai, Lin C.-C. SVD-based digital image watermarking scheme. Pattern Recognition Letters. 2005;26(10):1577–1586. doi:10.1016/j.patrec.2005.01.004. Scopus: 2-s2.0-19744376440

33. Ray A. K., Padhihary, Patra P. K., Mohanty M. N. Development of a new algorithm based on SVD for image watermarking. Computational Vision and Robotics. Vol. 332. New Delhi, India: Springer; 2015. pp. 79–87. (Advances in Intelligent Systems and Computing). doi:10.1007/978-81-322-2196-8_10

34. Barry W. J., Pützer. Saarbrucken Voice Database, Institute of Phonetics, Saarland University, http://www.stimmdatenbank.coli.uni-saarland.de/

35. Gordy J. D., Bruton L. T. Performance evaluation of digital audio watermarking algorithms. Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems; August 2000; Lansing, Mich, USA. IEEE; pp. 456–459. Scopus: 2-s2.0-0034464180

36. Katzenbeisser, Petitcolas. Information Hiding Techniques for Steganography and Digital Watermarking. Norwood, Mass, USA: Artech House; 2000.

37. Wang X.-Y., Zhao. A novel synchronization invariant audio watermarking scheme based on DWT and DCT. IEEE Transactions on Signal Processing. 2006;54(12):4835–4840. doi:10.1109/TSP.2006.881258. Scopus: 2-s2.0-33947129581