Sage Journals: Discover world-class research

Abstract

The development of wireless acoustic sensor networks has driven the use of acoustic signals for target monitoring. Most monitoring applications require continuous network connectivity and data transfers, which can rapidly exhaust nodes’ energy. Consequently, sensors must collaborate in an adequate architecture to perform target recognition and localization tasks and then to send the results to a remote server with a reduced data volume. The design of an energy-efficient scheme that achieves acoustic target recognition and localization remains an open research problem. Accordingly, this article proposes a low-energy acoustic-based sensing scheme for target recognition and localization to be implemented in a cluster-based sensing approach designed to appropriately balance energy consumption and local processing performed by sensor nodes. A reduced set of low-complexity feature extraction methods in the time domain signal are used in the recognition process. The scheme uses the received energy of the acoustic signals for the target localization. This article details the network architecture, the scheme specification, and its implementation. The results show that the scheme can classify targets with 81.34% accuracy. It requires 3.2 mJ of energy when executed in MICAz, achieving 99% energy savings compared to streaming 3 s of an acoustic signal to a remote server.

Keywords

Wireless acoustic sensor networks, feature extraction, recognition, localization, low-energy processing

Introduction

Wireless sensor network (WSN) applications have experienced remarkable successes in recent decades and continue to attract the attention of researchers and industry. The primary factors driving the design of applications based on these systems are self-configuration, self-organization, easy deployment, and low costs. Nevertheless, the demands for multimedia-based applications have caused a shift from the use of traditional scalar sensors to sensors equipped with multimedia data-streaming capabilities. This shift has encouraged the adoption of wireless acoustic sensor networks (WASNs) consisting of smart microphone nodes capable of sensing, processing, and communicating audio streams. Acoustic sensors hold great potential for a wide range of applications due to their capability to provide abundant information. These systems are considered to be a promising infrastructure for many applications, particularly for hearing aids, ambient intelligence, and acoustic monitoring.

Target monitoring is among the most practical applications of WASN, in which sensor nodes are used to detect, identify, and locate acoustic targets in two-dimensional (2D) space. Acoustic source monitoring has great importance in environmental monitoring, home and office controls, and battlefield surveillance, among many other areas. Despite these benefits, most monitoring applications require continuous network connectivity and data transfer, which may hinder the widespread deployment of WASN. In particular, acoustic sensors often sample signals at high frequencies, inevitably resulting in high energy consumption when the whole sensed acoustic signal is transmitted wirelessly. To avoid streaming raw acoustic data to a remote server, which can rapidly exhaust nodes’ energy, an interesting alternative is to detect acoustic events locally and then report them to end users. In general, most acoustic target monitoring systems require the capability to both recognize and localize targets in the environment. In WASN, the success of an approach implementing these tasks depends on a sensor’s capability to perform accurate target recognition and to collaborate with other sensors to determine the target’s location with minimal energy consumption. Whereas some previous works addressed acoustic-based target recognition,^1–9 other works studied approaches for target localization in WSN.^10–22 However, a solution that efficiently combines these capabilities in WASN has not yet been designed. Thus, further research is still required to design a low-complexity, energy-aware recognition and localization scheme for WASN.

This research work is a new contribution in the area of acoustic-based sensing and target localization in WASN. It aims to specify and design a new energy-aware sensing scheme to detect, recognize, and localize a target in a surveillance area using WASN. We extend our previous work²³ that focused on the design of an approach for target recognition intended to be used in habitat monitoring applications. The newly proposed scheme is based on low-complexity processing algorithms and communication techniques that reduce per-node energy consumption to prolong the network’s lifetime while guaranteeing the required application performance. The basic premise is to avoid the data streaming required in classical acoustic-based sensing by detecting the event of interest at the source sensor before data transmission. Energy savings are achieved by local recognition of the target using optimized tasks that extract, from the received audio signal, the features employed for classification. The received acoustic signal power measurements are then used for target localization employing energy-based localization techniques. This work also proposes an efficient distribution of tasks over the sensors in the cluster-based architecture to achieve energy and time efficiency through a collaborative execution of the scheme.

The rest of the article is organized as follows. Section “Related work” reviews the related work, while section “General approach for target recognition and localization” describes the general approach for target recognition and localization in the low-energy acoustic sensing scheme. The article then details the specification and the design of the proposed sensing scheme. Finally, it discusses the results and the performance analysis before giving our conclusion and highlighting future work.

Related work

Environmental monitoring is a fundamental problem that has increasingly attracted the attention of biologists in recent years. To optimize the management of ecosystems, animal recognition and localization has become an important research topic. Animal monitoring systems can produce valuable information that helps in protecting animals, plants, and the overall natural environment. In this context, the use of WASNs for target recognition and localization has become an interesting low-cost approach for animal monitoring. Although these networks offer many advantages, there are several design issues and challenges associated with the deployment of WASNs for monitoring purposes. Specifically, energy constraints, limited hardware processing resources, and the lack of efficient communication protocols represent problems associated with the use of WSNs that have not yet been entirely solved.²⁴

The majority of animal recognition algorithms described in the literature are based on the extraction of relevant features from an acoustic signal and then on the use of suitable classification algorithms to make a recognition decision. Among all existing audio features, mel-frequency cepstral coefficients (MFCCs) have been shown to be some of the most effective features in animal recognition systems. Noda et al.¹ deployed MFCC and linear frequency cepstral coefficient (LFCC) feature extraction methods to classify reptiles using two classification techniques: a support vector machine (SVM) and a k-nearest neighbor (kNN) algorithm, while Xie et al.² developed an intelligent recognition scheme to classify frog species based on six features that were extracted using several acoustic indices and cepstral coefficients. In other studies,^3–5 the authors adopted MFCC and other features such as LFCC and energy entropy, in combination with syllable features, to classify anuran species. These studies showed that a mixture of features from different domains possesses better discrimination ability when used for classification tasks than individual features such as MFCC, LFCC, or both MFCC and LFCC combined. Xie et al.⁶ used a set of MFCC, linear predictive coding (LPC), and syllable features to recognize frog species using five different machine learning algorithms. Colonna et al.⁷ also deployed syllable feature extraction methods to classify anuran species using several time domain features, such as the signal energy and zero crossing rate (ZCR).

The schemes presented for these animal recognition systems were computationally expensive, since they required a large set of different high-complexity features whose extraction involves transforming the signal from the time domain to the frequency domain before applying the extraction technique. In addition, Luque et al.⁸ claimed that syllable extraction is a highly complex task and is not suitable for a noisy environment; they suggested processing a waveform over successive frames as a simpler alternative. However, despite the superior performance of the features proposed in the literature compared to time and frequency domain features, these schemes have seemingly insatiable demands for processing time, memory space, and computational power. In addition, most of the aforementioned animal recognition studies did not sufficiently address the transmission costs or the energy consumption of their proposed solutions to demonstrate the feasibility of their approaches for a low-power sensor mote.

We believe that these schemes may not be adequate for real-time applications running on resource-constrained devices that provide timely data detection and transmission in order to locate and track a target of interest. Hence, it is important to fill this gap in the literature by designing an energy-efficient recognition scheme that best suits the specific requirements of sensor nodes. Applying a reduced set of low-complexity feature extraction methods in the time domain would be an efficient approach. These features may offer a simple and cost-effective solution, as they can be directly extracted from the original signal without any transformation.²⁵ ZCR and root mean square (RMS) are considered to be the most important and influential time domain features; thus, these features have been adopted by numerous applications, including speech–music discrimination,²⁶ sound/object classification,^27–29 and animal classification tasks.^5,9

As mentioned previously, we focused on using the average energy measurement of the detected acoustic signal to estimate a target’s location. Unlike the approach of locating a target using the time difference of arrival (TDOA)¹⁰ and direction of arrival (DOA),¹¹ this approach is not very demanding in terms of processing power and data transmission. TDOA-based approaches typically require strict time synchronization between multiple microphone pairs, which requires high sampling rates, increased computational power, and complicated infrastructure at each sensor.¹² Such approaches also require accurate acquisition of the phases of the signals arriving at different motes.¹³ Similarly, DOA-based localization methods require nodes equipped with multiple microphones, which should be capable of detecting the source signal and producing a DOA bearing estimate.¹⁴ Hence, compared to TDOA and DOA methods, energy-based approaches offer an attractive option since they are applicable to WASNs and do not require microphone pairs at each sensor, high precision hardware, or a synchronization mechanism.

Owing to their ease of implementation and low costs, many energy-based localization methods have been introduced to estimate the target positions using WASNs. These methods have been reviewed in several different studies^13–15 and include least square estimation (LSE),¹⁶ weighted least squares (WLS),¹⁷ projection onto convex sets (POCS),¹⁸ semidefinite programming (SDP),¹⁹ and convex second-order cone programming (SOCP) methods.²⁰ Meng and Xiao¹³ studied the localization accuracy of these methods and proved that LSE methods can have relatively low localization accuracy compared to other methods. Nevertheless, Steen et al.¹² have shown that LSE methods, particularly the quadratic elimination (QE) least square method, can provide better localization accuracy than other localization methods in the presence of noise.

We used the QE least square method¹⁶ in our approach to estimate the target location. The basic idea of QE is to pair the available signal power measurements to form hyperspheres. The circumferences of these hyperspheres intersect at the point in which the potential target may reside. Li and Hu¹⁶ first applied this algorithm in 2003 to localize a single target object using its acoustic signal power^17–21 and then adopted this algorithm to locate multiple target objects.²² A more recent development is the application of the QE method to track target objects in the presence of wind gusts and background noise.¹² To the best of our knowledge, this approach can provide a good trade-off between estimation accuracy and computational complexity compared to other localization techniques published in prior studies. In fact, most of the reported works are algorithm developments and have not examined the performance in terms of energy consumption. Therefore, the primary contribution of this article is the design of an energy-efficient scheme for WASN and the evaluation of its energy efficiency, which can fill the research gap in this area.

General approach for target recognition and localization

In the proposed solution, the network comprises a set of distributed nodes aggregated using a suitable clustering algorithm (Figure 1). The cluster’s membership depends specifically on the adopted clustering technique, with which the distance and energy of the nodes are usually considered. The clustering in the proposed acoustic network could be performed based on a low-energy clustering method such as low-energy adaptive clustering hierarchy or low-energy adaptive clustering hierarchy-centralized algorithms.³⁰ The scope of this article does not cover the clustering process; instead, it focuses on the design of the energy-efficient sensing scheme, and it proposes a distribution of the tasks of this scheme among the sensors of the cluster, to perform a collaborative execution of detection, recognition, and localization.

Figure 1.

Framework for the proposed target recognition and localization scheme in WASN.

The proposed smart edge-sensing strategy helps to restrict the transmission flow in the network and avoid transmitting the whole acoustic signal, which may contain useless information, to the end user. This approach is based on extracting a set of features, from the received signal, which are used for local classification and then transmitted to the cluster head (CH) when the target object is recognized. The target detection is based on a low-cost classification algorithm that matches the similarity between the newly extracted features’ vector and the previously stored reference descriptor.

During the network’s setup phase, sensor nodes are loaded with the target’s signature reference. A configuration packet is forwarded to CH nodes to be broadcast to all member nodes of the same cluster. Figure 1 illustrates a network with three clusters and a bird in the area covered by cluster 1. Each member node should periodically sense a new acoustic signal over a constant time interval (T). The average signal power during that time interval is measured by all the sensors (S1, S2, S3, and S4) receiving its acoustic signal. Each of these sensors compares the detected average powers against a predetermined threshold value to decide the detection of a new acoustic event.

When a sensor detects a new acoustic object, it locally performs the classification. Sensors that subsequently recognize the target notify the event to CH1. These notifications include the features’ vector and the received acoustic signal power (Figure 2(a)). If the received number of notifications is sufficient, CH1 estimates the target location and generates a notification, as requested by the end user. The notification packet can contain one of the following: a few bits’ notification, the features’ vector, or the target location and the features’ vector (Figure 2(b)).

Figure 2.

Flowcharts of the proposed sensing scheme for object recognition and localization at: (a) the sensor and (b) the cluster head level.

At CH1, estimation of the target location using all the received packets may not be the optimal way to save sensor energy. Indeed, the received packets may contain unreliable observations, usually dominated by high noise levels acquired by sensors far from the detected object. When these packets are considered in the localization, they decrease localization accuracy and increase energy consumption. To avoid this issue, CH1 should estimate the target location using a selected subset of the most informative measurements from the sensors closest to the target.

Nevertheless, most localization algorithms require a minimum number of observations for reliable location estimation. Therefore, CH1 should not perform the localization process unless a certain required number of measurements are received. This approach also guarantees that the localization task is not performed based on receiving a very few false detections caused by the target detection scheme’s uncertainty.

To reduce the volume of data transmitted to the remote server, the CH1 must select the features’ vector that comes from the sensor closest to the detected object, based on the received acoustic signals’ powers. The reduction in size of the notification data stream is intended to save further energy at the CH1, contributing to extending the network’s lifetime.
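The cluster-head rules described above (wait for enough notifications, localize with the strongest measurements only, and forward the features’ vector of the closest sensor) can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: the names `Notification` and `select_for_localization` and the values of `min_count` and `max_count` are hypothetical, and the sketch assumes that a higher received power indicates a sensor closer to the target.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Notification:
    """Packet sent by a member sensor that recognized the target (hypothetical layout)."""
    sensor_id: str
    power: float                   # received acoustic signal power y_i(T)
    features: Tuple[float, float]  # features' vector, e.g. (RMS, ZCR)


def select_for_localization(
    notes: List[Notification],
    min_count: int = 3,  # assumed minimum number of measurements for localization
    max_count: int = 4,  # assumed size of the "most informative" subset
) -> Optional[Tuple[List[Notification], Tuple[float, float]]]:
    """Return (measurements to localize with, features' vector to forward), or None."""
    if len(notes) < min_count:
        # Too few notifications: skip localization to avoid acting on false detections.
        return None
    # Assumption: stronger received power means the sensor is closer to the target.
    ranked = sorted(notes, key=lambda n: n.power, reverse=True)
    subset = ranked[:max_count]
    # Forward only the features' vector from the sensor with the strongest signal.
    return subset, ranked[0].features
```

The selected subset would then be passed to the energy-based localization step, which is not shown here.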

In fact, the proposed approach contributes to energy saving at different levels. It reduces computation overhead by deploying low-complexity feature extraction methods, a low-complexity classification method, and an energy-based localization technique. Furthermore, it uses, for the sensing scheme, a cooperative processing approach across the cluster-based network architecture (Figure 2) that balances the processing between the member sensors of the cluster. Also, among the set of packets communicated to CH1, a reduced number of signal power measurements are used to locate the target, which saves energy at the CH1.

During the notification to the end user, only a reduced number of packets that carry relevant data will be transmitted by the CH1, which reduces the radio transceiver activity. The reduced amount of data avoids the need for significant memory and storage space in the sensors. Therefore, considering the previously mentioned benefits, we think that the proposed sensing approach will significantly reduce the per-node energy consumption, extending the network’s lifetime. However, the validity and the efficiency of this approach will depend significantly on the designer’s ability to develop energy-efficient acoustic signal processing algorithms that ensure the objectives of the scheme.

Tasks’ specification of the proposed sensing scheme

As discussed previously, the efficiency of the proposed approach depends on how well the implemented tasks of the scheme match the sensor resources on the one hand and the performance required at the application level on the other hand. The algorithms used for features’ extraction, object classification, and target localization should be energy efficient to ensure extended network lifetimes. The two main factors that should be considered in the design process are as follows:

Network resource constraints: data reduction is an effective approach to increase storage efficiency, to optimize bandwidth utilization, and to maximize energy efficiency. Therefore, it is necessary to define a low-dimensional vector of features that can represent a specific object uniquely using low computation cost extraction techniques. Moreover, the adopted classification model should be capable of producing predictions with the minimum number of predictor variables that achieve high classification accuracy. The localization algorithm should incur, on average, few sensor measurements and transmissions while preserving an acceptable level of accuracy.

Computational complexity: multimedia applications in WSNs often produce a huge volume of data that require deploying in-network data processing techniques in order to reduce the communication burden.²⁴ These methods contribute to decreasing the overall energy cost of wireless communication. However, the adopted algorithms must be computationally light with few mathematical operations that would reduce the number of clock cycles needed by the sensor processor to execute an algorithm and consequently decrease energy demands for data processing.

The structure for the proposed target recognition and localization scheme for acoustic-based sensing is represented in Figure 2. It is composed of several components that achieve the recognition and localization of the target.

Sampling period

In the proposed sensing scheme, the sampling period $T_{S}$ should be larger than $T_{Smin}$, which corresponds to the minimum time required to record the acoustic signal and to execute the scheme comprising detection, classification, localization, and notification to the remote server. For a 3 s record, and given the results presented in the performance analysis at the application level section, $T_{Smin} \approx 3.0016$ s. In practice, the period $T_{S}$ for animal monitoring could be set to 5 or 10 s according to the expected frequency of the animal’s appearance in the surveillance area. This sampling period may be loaded into the sensor during the setup process using the broadcast configuration packet.

Object detection

During the setup phase, each sensor acquires multiple new acoustic signals over a time interval (T). Each sensor computes the RMS time domain feature for every sensed signal and measures, over all these acquired signals, the average signal power $y_{i\_back}(T)$. This average signal power is considered the background noise level and is updated over time. During the monitoring phase, the sensors in each cluster should periodically, every $\Delta t$, acquire a new acoustic signal of duration (T). Every sensor monitors changes in the environment by checking whether the power measurement of the sensed acoustic signal exceeds the background noise by a certain fraction $\alpha$. If it exceeds this threshold, a new object is considered detected (D = 1). The input audio signal is then passed to the following stages for feature extraction and object recognition. If no object is detected, the power of the newly acquired signal becomes the new value of the background noise. The average signal power is thus updated in a quasi-periodic manner $(y_{i\_back}(T) := y_{i}(T))$ to minimize the sensitivity to changes in the environment. We can define the detection function (D) as

D = \begin{cases} 1 & \text{if } y_{i}(T) > y_{i\_back}(T) + \alpha \, y_{i\_back}(T) \\ 0 & \text{if } y_{i}(T) \leq y_{i\_back}(T) + \alpha \, y_{i\_back}(T) \end{cases}, \quad 0 \leq \alpha \leq 1

(1)
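The detection rule of equation (1), together with the background update applied when nothing is detected, can be sketched as below. The function names and the default value of `alpha` are assumptions made for illustration; the article only constrains the fraction to lie between 0 and 1.

```python
def detect(power: float, background: float, alpha: float = 0.5) -> int:
    """Equation (1): return 1 if the power exceeds the background by the fraction alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1 if power > background * (1.0 + alpha) else 0


def monitor_step(power: float, background: float, alpha: float = 0.5):
    """One monitoring period: return (D, updated background power)."""
    d = detect(power, background, alpha)
    # When no object is detected, the new measurement becomes the background level.
    new_background = background if d == 1 else power
    return d, new_background
```

For example, with a background power of 1.0 and `alpha = 0.5`, a measurement of 1.2 is not a detection and becomes the new background, whereas a measurement of 2.0 is a detection and leaves the background unchanged.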

The object recognition phase comprises three main stages, which are signal framing, feature’s extraction, and object classification.

Signal framing

Signal framing is often a crucial pre-processing step applied to the acoustic signal before any further processing. An acoustic signal is normally not stationary; hence, it needs to be split into short-time segments over which it can be treated as stationary. However, it is necessary to avoid any loss of information at the edges of the frames. Hence, the signal is divided into a series of consecutive overlapping frames. A typical choice in signal-processing tasks is an overlap of 50% of the frame size.³¹ The first step in the proposed recognition scheme (Figure 3) is to divide the continuous signal of $(M)$ samples into $(K)$ equal-sized short-term windows; each frame has 1024 samples with 50% overlap between consecutive frames. After this step, each individual frame is passed to the features’ extraction procedures.

Figure 3.

General scheme for features’ extraction process.
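The framing step described in this section can be illustrated with a short sketch. The function name `frame_signal` is hypothetical, and the sketch assumes that trailing samples that do not fill a complete frame are dropped, a detail the article does not specify.

```python
def frame_signal(samples, frame_len=1024, overlap=0.5):
    """Split a sequence of M samples into frames of frame_len samples with the given overlap."""
    hop = int(frame_len * (1.0 - overlap))  # 512 samples for 1024-sample frames at 50% overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than 1")
    frames = []
    # Assumption: an incomplete final frame is discarded rather than zero-padded.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames
```

Under these assumptions, a signal of M = 4096 samples yields K = 7 frames, starting at samples 0, 512, 1024, and so on.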

Features’ extraction

Features’ extraction is the basic processing step in the object recognition scheme as it helps to distinguish the target object from different objects with a relatively similar signal wave structure. Furthermore, in our approach, it is an efficient technique for reducing data to fit memory and bandwidth constraints. Specifically, generating small packets that contain the features extracted from the signal, to be transmitted upon target detection, can reduce the communication overhead in the cluster and contribute to saving energy.

For the features’ selection stage, we examined various types of feature extraction methods that have been adopted by similar works^1–9 in the literature for animal classification purposes. These features’ extraction techniques can be divided into the following four categories:

Time domain features such as ZCR, short-time energy, sound amplitude, and peak detection.

Frequency domain features such as spectral centroid (SC), spectral roll-off (SR), bandwidth, and spectral flux.

Linear predictive cepstral coefficients.

MFCCs.

An extensive study of the algorithmic complexity and execution time of the method extracting these features has been conducted in different works.^9,25 These characteristics are summarized in Table 1.

Table 1.

Feature extraction method computational complexity and execution time.

Feature	Complexity	Execution time (ms)
Time domain features	O(N)	1
Frequency domain features	FFT: O(N log2 N); frequency function: O(N)	15
LPCC features	O(N log2 N)	30
MFCC features	O(N log2 N)	78

FFT: fast Fourier transform; LPCC: linear predictive cepstral coefficients; MFCC: mel-frequency cepstral coefficients.

From this table, we can conclude that time domain features are the least computationally intensive features and can be extracted in a short time. Frequency domain features, in contrast, require first transforming the signal to the frequency domain using the fast Fourier transform (FFT), which has a complexity of $O(N \log_{2} N)$, and then applying an extraction method that generally has a complexity of $O(N)$. Similarly, both LPCC and MFCC features require transforming the signal to the frequency domain before applying an algorithm with a complexity of $O(N \log_{2} N)$. The execution times of these methods are higher than the execution time of feature extraction in the time domain.
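As a rough illustration of the gap between the two complexity classes, consider the 1024-sample frames used in this scheme. Ignoring constant factors (an assumption, since actual operation counts depend on the implementation), the growth terms compare as

N = 1024: \quad N \log_{2} N = 1024 \times 10 = 10240 = 10 N

so the transform-based methods scale with a term about ten times larger per frame than a single linear pass over the samples.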

The deployment of acoustic time domain features is sensitive to environmental noises such as blowing wind.³² Therefore, many works have been proposed to combine the temporal features with more sophisticated frequency domain features in order to provide a robust solution against environmental noises.

We studied the behavior of four feature extraction techniques that belong to the time and frequency domains of the signal: RMS, ZCR, SC, and SR. These features were selected based on similar works⁵ that adopted them to classify the acoustic signal into a specific animal class label. The classification performance of these features is measured using a collection of records belonging to 16 animals: cat, cow, dog, donkey, horse, parrot, sheep, buffalo, elephant, fox, leopard, lion, snake, tiger, vireo, and wolf. In our experiment, we extracted the selected four features from each record, and then we calculated the mean for each feature per class, as illustrated in Figure 4. The chart shows a similar sampling distribution for the mean value of the ZCR, SC, and SR features for different animal classes. Hence, we can infer that the three features can provide the same discrimination capabilities in the classification decision. However, computing the SC and SR requires a higher computational cost and execution time than computing the ZCR, while performing equally well in the classification task. In addition, ZCR and SC can both be used to reflect the spectral shape of the acoustic signal.

Figure 4.

Mean distribution for the: (a) time and (b) frequency domain features.

For this reason, RMS and ZCR³³ were selected to represent the target and to identify the detected object.

RMS: it is an effective feature to capture characteristics of the overall power of the audio signal. The RMS parameter is calculated according to the formula

RMS = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}}

(2)

where $x_{i}$ is the ith sample value of the frame, and $N$ is the frame length.

ZCR: this feature reflects the spectral shape of the audio signal. A zero crossing occurs when adjacent samples have different signs within a time window; the rate can be expressed as

ZCR = \frac{1}{2 (N - 1)} \sum_{m = 1}^{N - 1} | sgn [x (m + 1)] - sgn [x (m)] |

(3)

where $x (m)$ is the mth sampled signal and $sgn [\cdot]$ is the value of the sign function, which is expressed as

sgn [x (n)] = \begin{cases} 1 & x (n) \geq 0 \\ - 1 & x (n) < 0 \end{cases}

(4)
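Equations (2) to (4) translate directly into a few lines of code. The following is a minimal pure-Python sketch; the function names are chosen for illustration, and the guard against frames shorter than two samples is an added assumption.

```python
import math


def sgn(x: float) -> int:
    """Equation (4): sign function, with zero treated as positive."""
    return 1 if x >= 0 else -1


def rms(frame) -> float:
    """Equation (2): root mean square of the N samples of a frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zcr(frame) -> float:
    """Equation (3): zero crossing rate of a frame."""
    n = len(frame)
    if n < 2:
        raise ValueError("a frame needs at least two samples")
    # Each sign change between adjacent samples contributes |sgn - sgn| = 2.
    total = sum(abs(sgn(frame[m + 1]) - sgn(frame[m])) for m in range(n - 1))
    return total / (2 * (n - 1))
```

For the frame `[1.0, 2.0, -1.0, -2.0]`, there is a single sign change among the three adjacent pairs, so the ZCR is 1/3, while a strictly alternating frame such as `[1, -1, 1, -1]` gives a ZCR of 1 and an RMS of 1.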

The extraction of acoustic signal features follows three main steps as shown in Figure 3:

Removing silent frames: the recorded signal may include silent frames that do not contain appropriate features, which affect the quality of the extracted features. Hence, silent frames should be removed in order to improve the accuracy of the target recognition system. In the proposed silence-removing approach, a typical silent frame has an RMS value that is lower than 10% of the whole signal’s overall average power. Thus, the silence threshold value $(S_{thre})$ is calculated using equation (5) based on the measurement of the signal power $y_{i} (T)$ during the object detection task as follows

S_{thre} = y_{i} (T) \times 0.10

(5)

If the $RMS_{i}$ value for a given frame $frame_{i}$ does not exceed $S_{thre}$, the frame is removed and the following frame is processed. The silence detection function (S) is defined as follows

S = \begin{cases} 1 & RMS_{i} > S_{thre} \\ 0 & RMS_{i} \leq S_{thre} \end{cases}

(6)

Extracting features for each frame: in this step, silent frames are discarded, and all remaining frames are passed through the ZCR feature extraction algorithm in order to generate a $ZCR_{i}$ value for each frame.

Deriving feature vector: in the last stage, the vector of features that represents the full acoustic signal is constructed. Basically, the extracted features from all the (K) frames are combined to compute the mean value for each feature, and then concatenated into one feature vector $\{RMS_{i}, ZCR_{i}\}$, where $i = 1, 2, \dots, K$. This vector is then fed into the adopted classifier to generate the recognition decision.
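The three steps above can be combined into one short sketch. It follows one reading of the description, in which the final vector holds the mean RMS and mean ZCR over the non-silent frames; the function names, the `None` return value for an all-silent signal, and the `silence_ratio` parameter name are assumptions made for illustration.

```python
import math


def _rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def _zcr(frame):
    signs = [1 if x >= 0 else -1 for x in frame]
    return sum(abs(b - a) for a, b in zip(signs, signs[1:])) / (2 * (len(frame) - 1))


def feature_vector(frames, signal_power, silence_ratio=0.10):
    """Return (mean RMS, mean ZCR) over non-silent frames, or None if every frame is silent."""
    s_thre = signal_power * silence_ratio  # equation (5)
    rms_vals, zcr_vals = [], []
    for frame in frames:
        r = _rms(frame)
        if r <= s_thre:  # equation (6): S = 0, silent frame is skipped
            continue
        rms_vals.append(r)
        zcr_vals.append(_zcr(frame))
    if not rms_vals:
        return None  # assumption: no features are produced for an all-silent signal
    return (sum(rms_vals) / len(rms_vals), sum(zcr_vals) / len(zcr_vals))
```

Computing the RMS once per frame serves both the silence test and the feature itself, which keeps the per-frame cost to a single linear pass plus the ZCR pass.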

Object classification

In the proposed scheme, we adopted the minimum mean distance (MMD) classifier³⁴ to recognize the target object. The main idea is to find the minimum Euclidean distance between the unknown object features’ vector and the target object signature vector. The advantage of the MMD classification algorithm is its model simplicity and its short execution time. In previous works, the MMD classifier has been commonly and successfully applied in acoustic target recognition systems.³⁵ In the proposed scheme, our emphasis will be on the development of a habitat-monitoring acoustic sensing approach that recognizes and locates a specific animal.

After extracting different features from the test data, we have noticed that there is a significant overlapping in the features’ vectors of different animals for the two features (RMS and ZCR), as shown in Figure 5. In fact, this figure shows that in the distribution of RMS and ZCR for the set of audio tests representing the dog and the cow, the values of ZCR and RMS overlap. The same is also true for the distribution of the ZCR and RMS extracted from the audio signals of the cat, donkey, and horse. This overlapping in the feature values makes it difficult to perform efficient recognition.

Figure 5.

Vector of feature distribution for an animal dataset.

In order to define class boundaries correctly, the learning model needed an extended set of training records to capture the underlying structure of the data. However, with the limited number of records, it was very difficult to assign the detected object to a single target class with high recognition accuracy. To overcome this problem, we developed a multi-label approach for efficient classification in which each detected object is classified as one of the two best-matching classes (Figure 6). The pair of classes output by the classifier is reported to the end user for further, more accurate classification, which could be based on a deep learning process.

Figure 6.

Multi-label classification process.

This method of classification is performed in the following two phases:

Pre-classification phase: this is an offline task implemented at the end user level, in which the classes are defined for each target object as illustrated in Figure 7. Basically, each class of animals is represented by a single vector of features extracted from different training records that belong to the same animal class. First, the feature vectors containing the RMS and ZCR features are extracted from the records. Then, for each feature, we calculate the mean value over all the extracted vectors. The target class signature and all the classes’ signatures are then loaded into the memory of the acoustic sensors during the system setup phase, using a broadcast packet that contains all the classes’ signatures.

Classification phase: the classification process is performed at runtime at each sensor using the MMD classifier. Basically, the classifier uses the Euclidean distance metric⁵ to measure the distance between the newly extracted object feature vector $f_{i}$ and every class feature vector ${f_{i}}^{'}$ , and then stores the results in the distance vector $D = \{d_{1}, \dots, d_{N}\}$ , where $N$ is the number of classes. The two classes with the shortest distances are then selected as the matching target class labels. The Euclidean distance metric can be expressed as

d (f_{i}, {f_{i}}^{'}) = \sum_{i = 1}^{D} {| f_{i} - {f_{i}}^{'} |}^{2}, i = 1, \dots, D

(7)
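The two phases can be sketched as follows. This is an illustrative Python sketch, not the deployed code: the function names and the numeric signatures in the usage note are hypothetical, and the distance follows equation (7) as written (a sum of squared differences), which ranks classes in the same order as its square root.

```python
def class_signature(feature_vectors):
    """Pre-classification: mean of each feature over the training vectors of one class."""
    n = len(feature_vectors)
    dim = len(feature_vectors[0])
    return [sum(v[d] for v in feature_vectors) / n for d in range(dim)]

def squared_distance(f, f_ref):
    """Sum of squared feature differences, following equation (7)."""
    return sum((a - b) ** 2 for a, b in zip(f, f_ref))

def two_closest_classes(f, signatures):
    """Classification: labels of the two class signatures nearest to f.

    signatures maps a class label to its reference vector [mean RMS, mean ZCR].
    """
    ranked = sorted(signatures, key=lambda label: squared_distance(f, signatures[label]))
    return ranked[:2]
```

For example, with made-up signatures `{"cat": [0.1, 0.5], "dog": [0.4, 0.2], "cow": [0.45, 0.25]}`, the vector `[0.42, 0.2]` is labeled with the pair `["dog", "cow"]`. Returning two labels instead of one is what lets overlapping classes be passed on to the end user for a finer decision.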

Figure 7.

Creation of object classes’ process.

Object notification

The data communication in the network may have a major impact on the energy consumption levels in the WSNs. Thus, reducing the communication overhead in the notification process while considering the application requirements could significantly save energy. In the proposed scheme, a sensor node will assemble a notification packet according to its role in the cluster, which can be either a member sensor or a cluster head. In general, we adopted three notification types that might correspond to different application requirements.

To ensure the required exchanges in the proposed sensing scheme, different types of packets were specified (Figure 8). These packets are used for configuration, notification, and acknowledgment purposes. The configuration packet (config_pack()) is used to load the target’s signature as a reference in sensor node memory. The CH notification (detect_pack()) is sent upon detection of the target object in the area of deployment. This notification message can be one of the following: a few-bit notification, a vector of features, or a vector of features and the signal average energy. In contrast, the end user notification (notif_pack()) is sent once the number of received detection packets exceeds a certain threshold. It can be a few-bit notification, a vector of features, or a vector of features and the target position.

Figure 8.

Exchanges in WASN for object recognition and notification.

Object localization

We adopted a lightweight acoustic source localization approach suitable for resource-constrained sensor nodes. The localization algorithm is based on the power of the received signals that is measured using the energy decay model¹⁶ based on the inverse square law principle.³⁶ The main concept in this model is that the signal intensity emitted omni-directionally from an acoustic source is attenuated as the signal propagates toward the destination sensor node.

Consider a network composed of $(N)$ stationary sensors, with known locations, deployed in a field in which an omni-directional source emits an acoustic signal $s (T)$ that propagates through the free air along the ground surface. During each time duration $(Δ t)$ , the ith sensor estimates one signal power reading $y_{i} (T)$ by averaging a number of sample points $(M)$ over a time window $(T)$ . The energy-based approach estimates the source location $\hat{r} (T)$ using multiple signal power observations $y_{i} (T)$ , where $1 \leq i \leq N$ at different known sensor locations. The acoustic signal received by the ith sensor at a fixed time duration $(T)$ , denoted by $y_{i} (T)$ can be modeled as

y_{i} (T) = g_{i} \frac{s (T)}{{| \hat{r} (T) - r_{i} |}^{α}} + ε_{i} (T), i = 1, \dots, N

(8)

where $g_{i}$ is the gain factor of the ith acoustic sensor and $s (T)$ is the intensity of the source signal computed over the observation time period ( $T$ ) at 1 m away from the source. In the measurement model, $g_{i}$ is assumed known from calibration and $s (T)$ is not known; $\hat{r} (T)$ and $r_{i}$ are $d \times 1$ vectors indicating the Cartesian coordinates of the source and of the ith sensor, respectively, during the time interval (T). The factor $α$ is an energy decay factor (path exponent), whose typical values range from 2 to 4 ( $α = 2$ in free space, $α > 2$ in the presence of reflections and reverberations).¹⁶ The term $ε_{i} (T)$ represents the cumulative effects of the modeling error of the $g_{i}$ , $r_{i}$ , and $α$ parameters and the additive observation noise of $y_{i} (T)$ , which can be approximated using a Gaussian distribution.
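As a quick illustration of equation (8), the following Python sketch computes the power a sensor would read for a given source position. The function name `received_power` and the `noise_std` parameter are hypothetical; the defaults ($s = 100$, $g_{i} = 1$, $α = 2$) are the values assumed later in the simulations.

```python
import math
import random

def received_power(source, sensor, s=100.0, g=1.0, alpha=2.0, noise_std=0.0, rng=random):
    """Signal power at one sensor under the energy decay model of equation (8).

    source and sensor are (x, y) tuples; noise_std is the standard deviation of an
    additive zero-mean Gaussian term standing in for epsilon_i (an assumption).
    """
    dist = math.dist(source, sensor)
    noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    return g * s / dist ** alpha + noise
```

With the defaults, a sensor 5 m from the source reads 100 / 25 = 4, and doubling the distance to 10 m divides the reading by four, which is the inverse square behavior the localization method relies on.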

The QE method¹⁶ is formulated as a nonlinear optimization problem to solve for the location of a single acoustic source within a 2D plane. The basic idea of QE is to group the acoustic signal power measurements of pairs of sensors to form hyperspheres. The estimated location of the target can be obtained by taking the intersection of these hyperspheres, on which the potential target $\hat{r} (T)$ may reside. This approach takes ratios of measurements in order to solve the localization problem without knowing the source signal intensity $s (T)$ . By taking the ratio $k_{ij}$ associated with sensors $i$ and $j$ , we can eliminate the $s (T)$ parameter from the energy decay model given in equation (8). Each power ratio defines a circle that constrains the target location $\hat{r} (T)$ . For a cluster of (N) sensor nodes, we can compute $N (N - 1) / 2$ pairs of ratios. The first step of this solution is to rewrite the energy decay model in another form, approximating the noise term $ε_{i} (T)$ by its mean value $μ_{i}$ , as

\frac{y_{i} (T) - μ_{i}}{g_{i}} = \frac{s (T)}{{| \hat{r} (T) - r_{i} |}^{α}}

(9)

then, the energy ratio $k_{ij}$ can be computed as follows

k_{ij} := {\left( \frac{[y_{i} (T) - μ_{i}] / g_{i}}{[y_{j} (T) - μ_{j}] / g_{j}} \right)}^{- 1 / α} = {\left( \frac{s (T) / {| \hat{r} (T) - r_{i} |}^{α}}{s (T) / {| \hat{r} (T) - r_{j} |}^{α}} \right)}^{- 1 / α} = \frac{| \hat{r} (T) - r_{i} |}{| \hat{r} (T) - r_{j} |}

(10)

{| \hat{r} (T) - r_{i} |}^{α} = {k_{ij}}^{α} {| \hat{r} (T) - r_{j} |}^{α}

(11)

The set of all possible points $\hat{r} (T)$ whose coordinates $x$ and $y$ satisfy equation (11) must reside on the hypersphere described by the following equation

{| \hat{r} (T) - c_{ij} |}^{α} = ρ_{ij}^{α}

(12)

where the center $c_{ij}$ and radius $ρ_{ij}$ of this hypersphere are expressed as

c_{ij} = \frac{r_{i} - {(k_{ij})}^{α} r_{j}}{1 - {(k_{ij})}^{α}}

(13)

ρ_{ij} = \frac{k_{ij} | r_{i} - r_{j} |}{1 - {(k_{ij})}^{α}}

(14)

Based on these hyperspheres, a hyperplane will be formed by taking the intersection of two different pairs of circles ( $k_{ij}$ and $k_{kl}$ ). We subtract each side in order to eliminate the term $| r |^{α}$ as follows

\begin{matrix} | r |^{α} = 2 c_{ij}^{T} r + ρ_{ij}^{α} - | c_{ij} |^{α} \\ | r |^{α} = 2 c_{kl}^{T} r + ρ_{kl}^{α} - | c_{kl} |^{α} \end{matrix}

(15)

\begin{matrix} (c_{ij} - c_{kl})^{T} r = \frac{1}{2} [(| c_{ij} |^{α} - ρ_{ij}^{α}) - (| c_{kl} |^{α} - ρ_{kl}^{α})] \\ = θ_{ij} - θ_{kl} \end{matrix}

(16)

where the term $θ_{ij}$ of each hypersphere is defined by

θ_{ij} = \frac{1}{2} (| c_{ij} |^{α} - ρ_{ij}^{α})

(17)

By pairing all hyperspheres, $P$ hyperplanes can be derived. These $P$ hyperplanes are collected into a matrix $a$ and a vector $b$ . Matrix $a$ is a $P \times 2$ matrix $a = [a_{1}, \dots, a_{P}]^{T}$ , such that each row contains a vector $c_{ij} - c_{kl}$ , while vector $b$ is a $P \times 1$ vector $b = [b_{1}, \dots, b_{P}]^{T}$ that contains the corresponding values $θ_{ij} - θ_{kl}$ . The set of hyperplane equations yields a cost function, which is minimized to obtain the source location $\hat{r} (T)$ .

\hat{r} (T) = \arg\min_{r} | a r - b |^{α}

(18)

which can be estimated by solving the following unconstrained QE least squares problem

\hat{r} (T) = (a^{T} a)^{- 1} a^{T} b

(19)

The general scenario considered for object localization using the QE method is depicted in Figure 9. The target is supposed to appear at a position (4.58, 2.66) in a cluster that is covered by a set of wireless acoustic sensors. Once the target is recognized, the surrounding sensors will measure the acoustic signal power $y_{i} (T)$ . If the target is detected, the measurements $y_{i} (T)$ will be transmitted to the CH. Then, the CH will use, at least, four measurements of the signal power readings to calculate the power ratio $k_{ij}$ between every pair of sensors. These ratios will be used to compute the center $c_{ij}$ and radius $ρ_{ij}$ of all circles. Thus, the CH will form three different circles corresponding to three independent equations. These circles intersect at the target position $r (T)$ , as shown in Figure 9.

Figure 9.

Concept of quadratic elimination term localization.

Meng and Xiao¹³ have conducted an extensive study that examines the complexity of different energy-based localization methods.¹⁰–²² The reported computational complexities show that LSE-based methods have a lower complexity than the WLS-based method, which has a complexity of $O (N^{2})$ , and than the SDP and SOCP methods, which both have a complexity of $O (N^{3.5})$ , while the complexity of the POCS method is mainly determined by its convergence speed. The results showed that LSE-based methods provide a low-cost solution to locate the target with an acceptable accuracy. Hence, the adopted LSE technique, particularly the QE least squares method, can provide a lower computational cost compared to the other works introduced in the literature.

Implementation and performance analysis

The proposed scheme was implemented with the MATLAB tool³⁷ to evaluate its efficiency at the application level. For this purpose, several experiments were conducted to test the capability and accuracy of the proposed scheme to recognize and locate the target object successfully. Furthermore, we used the AVRORA tool³⁸ to evaluate the suitability of the scheme for low-energy operation and its effectiveness in reducing per-node energy consumption.

Performance analysis at the application level

Object recognition

To measure the success rate of the recognition process, we conducted several experiments using different audio records for a specific object class, each recorded under differing conditions. We mainly focused on evaluating the capabilities of the classifier to label a specific detected object to a multi-class target, assuming that only one target object is expected to appear in the area covered by the cluster. The classification model was tested using various animal records collected from multiple sources. Most of the records were obtained from Animals Birds Sound Effects CD.³⁹ The remaining records were collected from various open sound libraries. The dataset consisted of 114 animal audio recordings comprising seven animal classes (cat, cow, dog, donkey, horse, parrot, and sheep). These animals were grouped according to their habitat. An example of the records is presented in Figure 10, which represents multiple wildlife sample sounds of a cat and a dog.

Figure 10.

Audio samples for two animals: (a) cat and (b) dog.

These animal calls were plotted before pre-processing the audio signal. Figure 10 shows that the signal contained silent episodes between consecutive calls, which have to be removed to achieve efficient recognition. The collected records have different sampling rates and were saved in various file formats. The records also vary in duration and in sound quality. In fact, some of these records were very noisy due to the environmental noises associated with the outdoor environment, such as wind, rain, and other animals. To address these issues, the records had to be pre-processed prior to the feature extraction stage.

In our solution, the pre-processing stage comprised three steps, as illustrated in Figure 11: data cleaning, record segmentation, and signal re-sampling. These pre-processing steps were necessary to maintain acceptable classification accuracy and to avoid any bias in the recognition task. All records with poor sound quality were removed, and the remaining records were segmented into short samples of 1–2 s. These records were then re-sampled at a unified sampling frequency of 44.1 kHz in a 16-bit wave format.

Figure 11.

Audio record pre-processing steps.

Reference vector extraction and recognition accuracy

This step of the learning model is a basic task in the recognition process. A unique signature (one vector per animal type ( $i$ )) for each class of animal was generated from the training records. This vector of features comprises two time domain features {RMS, ZCR}. As explained previously, we first extracted the vector of features from all the training animal records. Figure 12 shows the distribution of RMS and ZCR features generated from these recordings. We note that there was a large overlap between the feature values of different animals. For example, for RMS feature values, there was an overlap between {cow, dog, parrot, and sheep} and another overlap between {cat, donkey, and horse}. Then, for each class of animal, all the vectors generated from the samples were combined to compute the mean value for each feature in the vector. In other words, each feature in the target descriptor represents the mean value of that feature over the different samples belonging to the same object class, $Ref_{i} = \{μ_{RMS}, μ_{ZCR}\}$ . These descriptors were then loaded into the sensor memory during the setup process. During the classification process, these descriptors were used to recognize newly extracted object vectors.

Figure 12.

Mean value of the time domain features: (a) RMS and (b) ZCR feature values.

The capability of the scheme to successfully recognize the target was evaluated by measuring the number of correct predictions made by the classifier using the following metric

Accuracy = \frac{N_{C}}{N_{S}} \times 100

(20)

where $N_{C}$ is the number of correctly classified sample records, and $N_{S}$ is the total number of samples.
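For a classifier that returns two labels per sample, equation (20) can be computed as in the short Python sketch below. It assumes that a sample counts as correctly classified when its true class is among the returned labels; this is one possible reading of the metric rather than a rule stated in the text, and the name `accuracy` is hypothetical.

```python
def accuracy(predictions, true_labels):
    """Equation (20): percentage of samples counted as correctly classified.

    predictions holds one collection of predicted labels per sample (assumption:
    a sample is correct when its true label is among them).
    """
    n_c = sum(1 for pred, truth in zip(predictions, true_labels) if truth in pred)
    return n_c / len(true_labels) * 100
```

For instance, if three of four samples have their true class among the two predicted labels, the function returns 75.0.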

In our approach, we used MMD as a local classifier at the sensor level to classify the target object into one of the two classes. Table 2 illustrates the recognition ratio with successful classification for different animals. This table shows the seven animal classes along with the overall average accuracy for all animal classes. For comparison purposes, the table also shows the classification results for other commonly used classifiers: the Gaussian mixture model (GMM), SVM, and decision tree (DT).

Table 2.

The recognition performance using four different classifiers.

Animal	MMD	GMM	SVM	DT
Cat	0.59	0.78	0.53	0.75
Cow	0.73	0.46	0.26	0.52
Dog	0.82	0.77	0.53	0.49
Donkey	0.92	0.67	0.37	0.82
Horse	0.79	0.55	0.52	0.47
Parrot	0.89	0.78	0.72	0.43
Sheep	0.86	0.64	0.82	0.76
Overall average (%)	81.34	74.62	59.93	67.43

MMD: minimum mean distance; GMM: Gaussian mixture model; SVM: support vector machine; DT: decision tree.

From these results, it was found that the MMD classifier achieved better recognition results than the other classifiers, obtaining a total classification accuracy of 81.34%. The results presented in the table also showed that the GMM classifier was capable of outperforming the SVM and DT classifiers, gaining 74.62% recognition accuracy.

Object localization

We also investigated the capability of the proposed solution to accurately estimate the target location using the received acoustic signal power measurements. In the proposed architecture, the CH receives signal power measurements from its member sensors in the cluster that will be used to evaluate the location. Sets of experiments using different sensor distributions were conducted to localize one specific target in the surveillance area. For this study, we have adopted Monte Carlo simulations for a single target location using the MATLAB tool.³⁷ We were mainly interested in evaluating the performance of the QE algorithm for three case studies. Due to uncertainties in the energy-decay model, the proposed assumptions and parameter values are subject to error. Thus, in the first case, we explored the impact of the sensitivity of different model’s parameters on the accuracy of the location estimation. In the second case, we analyzed the capabilities of the QE algorithm to estimate the target position with varying numbers of sensor nodes in the cluster. In the third case, we investigated the effectiveness of using a reduced number of signal power measurements to obtain a reliable estimation of the target location.

The performance of the proposed method was verified using the location estimation error metric, which denotes the difference between the true target location and the estimated location. For this purpose, the root mean square error (RMSE)¹² defined by equation (21) was used to evaluate the accuracy of the location estimations obtained over all the Monte Carlo trials

RMSE = \sqrt{\sum_{i = 1}^{K} \frac{{| {\hat{r}}_{i} - r_{i} |}^{2}}{K}}

(21)

where ${\hat{r}}_{i}$ represents the estimated position of the true target location $r_{i}$ during the ith trial, and $K$ is the total number of Monte Carlo trials in one experiment.
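Equation (21) translates directly into code. The Python sketch below is illustrative only; the name `rmse` and the representation of positions as coordinate tuples are assumptions.

```python
import math

def rmse(estimates, truths):
    """Root mean square location error over K Monte Carlo trials (equation (21)).

    estimates and truths are equal-length lists of (x, y) positions, one per trial.
    """
    k = len(estimates)
    total = sum(math.dist(est, tru) ** 2 for est, tru in zip(estimates, truths))
    return math.sqrt(total / k)
```

As a small check, two trials with location errors of 5 m and 0 m give an RMSE of $\sqrt{12.5} \approx 3.54$ m, which shows how a single poor estimate dominates the metric.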

The parameters used in the Monte Carlo simulation to evaluate the performance of the proposed scheme are shown in Table 3. The reduced set of studied simulation scenarios has been adopted by similar research studies,¹⁶ and we believe that these parameters are the most relevant parameters for studying the performance of the energy-based decay model. We have considered several static sensors $(N)$ that are placed randomly and uniformly in a 2D region of interest covering an area of $(100 \times 100) m^{2}$ . We have also considered a static acoustic target, whose location is randomly chosen within this region. The signal power measurement $y_{i} (T)$ of the acoustic signal is generated according to the energy decay model, where we have assumed fixed values for $α = 2$ , $s = 100$ , and $g_{i} = 1$ (where $i = 1, \dots, N$ ). The noises were randomly generated according to a normal Gaussian distribution. In this experiment, we considered $ε_{i} (T) = 0$ . The number of samples used was $M = 100$ , which was enough to achieve sufficient estimation accuracy. We conducted 1000 Monte Carlo simulation runs for each experiment.

Table 3.

Simulation parameters and configuration.

Parameters	Value
Simulation area	100 m × 100 m
Deployment type	Uniform random
Number of sensor nodes	100 and 20
Number of targets	1
Sensors gain factor	1
Energy decay factor	2
Source energy	100
Number of samples	100
SNR level	$\infty$ , 20, 40, 60, and 70 dB
Number of trials	1000

SNR: signal-to-noise ratio.

We investigated the sensitivity of the location estimation to different parameters. We analyzed the robustness of the localization method with respect to the variation in energy-based decay model parameters. The set of parameters considered in this study included the deviation of the real values of sensor location (r), the gain calibration factor (g), and the decay factor (α). Seven experimental configurations were considered in the simulations, as listed in Table 4. In the first configuration, the experiment was conducted with no noise or parameter variation. In configurations 2 and 3, the sensors’ locations (r) were subject to random fluctuations of $Δ r = 0.5$ and $Δ r = 1$ , respectively. In the fourth and fifth configurations, the sensor gain varied from its assumed fixed value $(g = 1)$ with $Δ g = 0.5$ and $Δ g = 1$ , whereas in configurations 6 and 7, the decay exponent (α) varied from 2 with $Δ α = 0.5$ and $Δ α = 1$ .

Table 4.

RMS error for different parameter settings.

Configuration number	$Δ α$	$Δ r$	$Δ g$	SNR (dB)	RMS error (m)
1	0	0	0	$\infty$	0.00
2	0	0.5	0	$\infty$	1.12
3	0	1	0	$\infty$	2.61
4	0	0	0.5	$\infty$	30.05
5	0	0	1	$\infty$	35.79
6	0.5	0	0	$\infty$	1.94
7	1	0	0	$\infty$	5.27

SNR: signal-to-noise ratio; RMS: root mean square.

Due to the large number of Monte Carlo iterations (1000 runs), it was very difficult to show the results for all these iterations in one figure. Thus, for each configuration, the estimation results of only 20 iterations are presented (Figure 13). The impact of variations of 0.5 and 1 m in the sensors’ locations is depicted in Figure 13(a) and (b), respectively. This figure shows that the accuracy of the target location estimation was not heavily affected by the variation in the sensors’ locations for the values $Δ r = 0.5$ and $Δ r = 1$ m. In fact, the RMSE was 1.12 and 2.61 m, respectively, for these two values of $Δ r$ (Table 4), which can be interpreted as a low sensitivity of the localization method to slight errors in the sensors’ locations.

Figure 13.

Impacts of variations in model parameters on localization accuracy: (a) Δr = 0.5, (b) Δr = 1, (c) Δg = 0.5, (d) Δg = 1, (e) Δ $α$ = 0.5, (f) Δ $α$ = 1, (g) SNR = 60, (h) SNR = 40, and (i) SNR = 20.

Figure 13(c) and (d) depicts errors in the location of the target when the gain calibration factor (g) of the energy-based decay model varies by 0.5 and 1, respectively. The results in this figure show that the errors in the target location estimation are very significant when the factor (g) varies. Table 4 shows that the RMSE is very high for these values of variation in the gain calibration factor (30.05 and 35.79 m). The results in Figure 13(e) and (f) show that errors in the location of the target remained limited for small changes in the value of the decay factor (α).

We also investigated the sensitivity of the location estimation in the presence of noise. Our aim in designing configurations 1 to 20 was to study the effects of different signal-to-noise ratios (SNRs) on the location estimate, which may also be affected by inaccurate measurements of parameters, as listed in Table 5. The impact of noise on the accuracy of the target location is illustrated in Figure 13(g)–(i) for the values of 60, 40, and 20 dB, respectively. This figure demonstrates that the accuracy of the QE localization method is very sensitive to noise. In particular, for SNR = 20 dB, the RMSE values are greater than 40 m, reflecting a very low accuracy in the target location.

Table 5.

RMS error for different parameter settings in the presence of noise.

Configuration number	$Δ α$	$Δ r$	$Δ g$	SNR (dB)	RMS error (m)
1	0	0	0	70	0.03
2	0	1	0	70	3.36
3	0	0	1	70	36.75
4	1	0	0	70	5.15
5	1	1	1	70	35.06
6	0	0	0	60	0.62
7	0	1	0	60	1.83
8	0	0	1	60	38.06
9	1	0	0	60	5.16
10	1	1	1	60	35.39
11	0	0	0	40	18.24
12	0	1	0	40	18.07
13	0	0	1	40	34.88
14	1	0	0	40	17.87
15	1	1	1	40	37.14
16	0	0	0	20	44.16
17	0	1	0	20	44.00
18	0	0	1	20	43.84
19	1	0	0	20	43.84
20	1	1	1	20	44.04

SNR: signal-to-noise ratio; RMS: root mean square.

Nonetheless, the configurations of SNR = 20 dB and SNR = 40 dB would correspond to very noisy environments, and in practical cases we expect to get a higher level of SNR. Specifically, for high SNR values (60 dB and above), and without variation in the sensor gain calibration factor, the RMS error is low, which indicates the efficiency of the QE method for target localization. We should note that the noise in the environment affects the power of the received acoustic signal at the sensor level, which not only reduces the localization performance of the energy-based localization methods but also affects the localization performance of the phase-based localization (TDOA¹⁰ and DOA)¹¹ methods.¹⁸ From Table 5, we can also note that the variation in the gain calibration factor severely impacts the accuracy of the target location estimation. Therefore, we can infer that precise gain calibrations and higher SNR values are critical success factors for the localization method in our scheme.

As expected theoretically, the localization error decreased as the number of deployed sensor nodes increased in the area. Figure 14 shows that the QE algorithm performed better when the cluster had more than 10 sensors. This figure also shows that the algorithm consistently performed almost as well when the number of sensors was equal to 20 or above. In fact, despite increasing the number of deployed sensors in the cluster (more than 20), the QE method did not offer significant benefit in terms of improving localization accuracy. However, the high density of nodes in the cluster would increase the classification rate.

Figure 14.

Impacts of different numbers of sensors on localization accuracy.

Optimizing the number of acoustic signal power measurements

Intensive data processing is time- and energy-consuming and can severely reduce the sensor lifetime and the application viability. Most real-time monitoring applications require delay-bounded transmission of data. Besides radio transmissions, the internal data processing represents a significant overhead in time and energy. Thus, reducing data processing at the cluster head level would contribute to saving energy and meeting time constraints.

We have studied the number of signal power measurements that the CH has to consider in order to localize the target accurately with minimal processing. For this purpose, the simulation input parameters presented in the previous section were also assumed in this study, with only 20 sensors deployed in the cluster. The performance of the QE algorithm was evaluated at the CH for a variable number of reported signal power measurements ranging from 4 to 20. The sensors’ reported signal power measurements were selected based on the strength of the measurements. In the simulation, we initially started with four reported power measurements because the QE location estimator needs at least four measurements to obtain unique positions.¹⁶ Figure 15 illustrates the results of target localization using the QE algorithm for a reduced number of sensors’ measurements.

Figure 15.

Impacts of different number of selected signal power measurements on localization accuracy.

Figure 15 also shows that the localization algorithm was capable of performing with almost the same precision starting from six reported signal power measurements and above. Therefore, we can conclude that six signal power measurements are sufficient to perform an accurate localization at the CH. This reduction in the number of signal power measurements significantly reduces the processing overhead at the CH, which consequently reduces its energy consumption. Furthermore, using only the highest received signal power measurements increases the accuracy of target localization, since the CH selects the measurements reported by the nodes closest to the target.
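The selection step at the CH amounts to keeping the strongest readings before running the localization. A minimal Python sketch, assuming the readings arrive as a mapping from a sensor identifier to its reported power (the function name and data layout are hypothetical):

```python
def strongest_measurements(readings, m=6):
    """Keep the m highest signal power readings.

    readings maps a sensor identifier to its reported power; m = 6 follows the
    number of measurements found sufficient in this study.
    """
    ranked = sorted(readings.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:m])
```

Under the decay model, the highest readings come from the sensors nearest the source, so this filter also discards the measurements most exposed to noise.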

Energy efficiency of the proposed scheme for sensor-based implementation

The capability of the proposed scheme in terms of energy efficiency and processing time was studied for the following three tasks: recognition, localization, and notification. We focused on measuring the number of clock cycles, reflecting the processing time, and the energy consumption of sensors in various scenarios. The AVRORA simulator³⁸ was adopted to evaluate these metrics. It is an instruction-level simulator emulating several sensor types, such as MICA, TelosB, and many other platforms. We estimated these metrics for MICAz motes, which have an ATmega128L microcontroller and 4 kB of RAM memory.

Object recognition

The energy efficiency and processing time for the tasks of new object detection, feature extraction, and classification were evaluated in each member sensor of the cluster. In this evaluation, the sensors were assumed to record sample signals at a frequency of 44,100 Hz. The simulation results for processing 1 s (44,100 samples) of the recorded sound are illustrated in Table 6.

Table 6.

Evaluation of the recognition cost on MICAz.

Measured attribute	Clock cycles	Time (ms)	Energy (mJ)
New object detection	352	0.044	0.0009
Feature extraction	1527	0.19	0.004
Object classification	6814	0.85	0.01
Whole scheme	8693	1.084	0.0149

The information presented in this table shows that the classification phase consumes the most time and energy compared to the other tasks. Object classification requires several costly mathematical operations, which are performed while calculating the Euclidean distance between the feature vectors. However, this step is fundamental in the strategy of smart edge sensing, since it detects events of interest and avoids streaming the whole acoustic signal. The reduction in the activity of the sensors’ wireless transceivers would reduce traffic in the network, consequently increasing the whole network lifetime.

The results presented in Table 6 prove that the proposed feature extraction and classification methods are suitable for implementation in resource-constrained sensor nodes. In fact, the proposed recognition scheme has less complexity and better recognition accuracy than other works when used to classify the target animal into multi-label classes. We believe that the small dataset used for testing the recognition performance has a major impact on the classification accuracy. Moreover, the significant overlap between the feature values for different animal classes is another important factor that influenced the recognition performance. Nevertheless, in spite of these limitations, the recognition scheme provided a good trade-off between cost and performance, leading to a prolonged system lifetime.

Object localization

This section evaluates the energy consumption of the CH node while performing the target localization (Table 7). In the first simulation, we studied the cost of the target location estimation in the CH for two different cases: using only six measurements or using all the reported measurements of sensors in the cluster.

Table 7.

Evaluation of the localization cost on MICAz.

Measured attribute	Clock cycles	Time (s)	Energy (mJ)
Target localization (6 measurements)	221,609	0.027	0.62
Target localization (20 measurements)	1,493,941	0.186	4.24

Table 7 illustrates the time and energy consumption of the localization task for the two scenarios. As expected, minimizing the number of processed signal power measurements in the localization task significantly decreases the energy consumption and processing time at the CH. In fact, the results indicate that the CH node can save approximately 85.38% of the localization energy when applying the selected signal power measurements approach, thus extending the CH node lifetime. In addition, it reduces the overall processing time and, hence, reduces the data transmission delay, which is critical for real-time applications, such as tracking tasks. Consequently, we can conclude that the adopted scheme with the reduced number of signal measurements minimizes the overall acoustic sensing cost and prolongs the lifetime of the acoustic-based application in the WSN.
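The 85.38% figure follows from the two energy values in Table 7. Writing $E_{6}$ and $E_{20}$ for the localization energy with 6 and 20 measurements (symbols introduced here only for this illustration), the relative saving is

```latex
\frac{E_{20} - E_{6}}{E_{20}} \times 100 = \frac{4.24 - 0.62}{4.24} \times 100 \approx 85.38\%
```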

Target notification

The processing time and energy consumption for the notification task are evaluated in this section. Table 8 provides the results of the notification cost. We considered several types of notifications that might correspond to different application requirements:

Notification of the target detection.

Notification of detection with the transmission of the feature vector.

Notification of detection with the transmission of the feature vector and the target location.

Transmission of the whole acoustic signal.

Table 8. Evaluation of the notification cost to the remote server using MICAz.

Measured attribute	Time (ms)	Energy (mJ)
Transmit detection notification (1 byte)	0.01	0.33
Transmit 2D feature vector (2 bytes per feature)	0.04	1.32
Transmit target location (2 bytes per coordinate) and 2D feature vector (2 bytes per feature)	0.08	2.64
Transmit raw signal (2 bytes per sample)	0.02	0.66
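The entries of Table 8 scale with the payload size: each row equals the 1-byte cost (0.01 in the time column, 0.33 mJ) multiplied by the number of payload bytes. The sketch below makes this reading explicit. The linear per-byte model, the assumption that the location consists of two coordinates, and all names are inferred from the table for illustration; they are not a model stated in this article.

```python
import math

# Per-byte cost read from the first row of Table 8.
# Assumption: the cost grows linearly with the payload size.
TIME_PER_BYTE = 0.01        # same unit as the time column of Table 8
ENERGY_PER_BYTE_MJ = 0.33   # mJ


def notification_cost(payload_bytes: int) -> tuple[float, float]:
    """Return (time, energy in mJ) for a payload under the linear assumption."""
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    return payload_bytes * TIME_PER_BYTE, payload_bytes * ENERGY_PER_BYTE_MJ


# Payload sizes in bytes; two location coordinates is an assumption
# that matches the 2.64 mJ entry of Table 8.
PAYLOADS = {
    "detection notification": 1,
    "2D feature vector": 2 * 2,
    "location + 2D feature vector": 2 * 2 + 2 * 2,
    "one raw sample": 2,
}

costs = {name: notification_cost(size) for name, size in PAYLOADS.items()}
for name, (time_cost, energy_mj) in costs.items():
    print(f"{name}: time={time_cost:.2f}, energy={energy_mj:.2f} mJ")
```

Under this reading, the raw-signal row describes the cost of a single 2-byte sample rather than of the whole recording.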

The sensor platform used for the energy-efficiency evaluation was MICAz. This platform offers a bit rate of 250 kbps, which could be used to stream the recorded acoustic signal to the remote server. The signal has a sampling rate of 44,100 samples per second. Therefore, the energy consumed by a sensor while streaming the recorded acoustic signal, denoted by $E_{signal}$, can be modeled as

$$E_{signal} = (S_{r} \times L \times E_{t}) \times T_{s} \tag{22}$$

where $S_{r}$ is the sampling rate of the signal; $L$ is the length of one sample in bits; $E_{t}$ is the energy required to transmit one bit; and $T_{s}$ is the sampling period of the recorded sound. Here, we assume $L$ = 16 bits, $E_{t}$ = 0.08 mJ, and $T_{s}$ = 3 s.

Using equation (22), the energy to transmit the full signal would be $E_{signal}$ = 28,224 mJ. With the proposed sensing scheme, however, the sensor performs the recognition, the localization, and the notification, which includes the target location and the feature vector of the detected object. The energy consumed by the sensing scheme, denoted by $E_{sensing}$, is then expressed as

$$E_{sensing} = E_{C} + E_{L} + E_{N} \tag{23}$$

where $E_{C}$ is the energy required during the classification phase; $E_{L}$ is the energy required during the localization phase; and $E_{N}$ is the energy consumed during the notification transmission. Given the results discussed above, $E_{sensing}$ = 3.274 mJ. Thus, the gain in energy obtained with our scheme, denoted by $G$, is given by

$$G = \left(1 - \frac{E_{sensing}}{E_{signal}}\right) \times 100 = 99.9\% \tag{24}$$

Therefore, the gain in energy when using the proposed scheme is $G$ = 99.9%, which attests to its energy efficiency when deployed for animal target monitoring.
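The gain in equation (24) can be checked numerically from the reported totals. The sketch below is a minimal illustration in Python: it takes $E_{signal}$ and $E_{sensing}$ as reported in the text, and $E_{L}$ and $E_{N}$ from Tables 7 and 8. Because the value of $E_{C}$ is not restated in this section, it is backed out as the remainder of equation (23); this implied value is an assumption of the sketch, not a reported measurement.

```python
import math

E_SIGNAL_MJ = 28_224.0   # reported cost of streaming the full signal
E_SENSING_MJ = 3.274     # reported total cost of the sensing scheme
E_L_MJ = 0.62            # localization with 6 measurements (Table 7)
E_N_MJ = 2.64            # location + feature vector notification (Table 8)

# Assumption: E_C is whatever remains once E_L and E_N are subtracted
# from the reported total, following equation (23).
e_c_implied_mj = E_SENSING_MJ - E_L_MJ - E_N_MJ


def energy_gain(e_sensing: float, e_signal: float) -> float:
    """Equation (24): percentage of energy saved relative to streaming."""
    if e_signal <= 0:
        raise ValueError("e_signal must be positive")
    return (1.0 - e_sensing / e_signal) * 100.0


gain = energy_gain(E_SENSING_MJ, E_SIGNAL_MJ)

print(f"implied E_C: {e_c_implied_mj:.3f} mJ")
print(f"gain G:      {gain:.2f}%")
```

With these inputs the gain evaluates to about 99.99%, which is consistent with the 99.9% stated above when truncated to one decimal place.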

Conclusion

In this article, we have proposed an efficient scheme for low-energy acoustic sensing in WASNs. This scheme was designed to be implemented in a cluster-based architecture where tasks are distributed between the sensors of the cluster and the CH. The presented sensing approach is intended to recognize a specific target using lightweight time-domain feature signatures extracted from the received acoustic signal. It was also designed to locate the target using an optimized, energy-based localization technique.

We evaluated the performance of low-energy target recognition, localization, and notification. The results have shown that the adopted feature extraction methods were able to generate unique signatures, which were successfully used to discriminate between different acoustic objects in the recognition process. It was also shown that the energy-based localization method was able to locate a target with acceptable accuracy. Furthermore, the results have shown that the sensing scheme implemented in a cluster-based architecture was able to perform the required tasks with low energy consumption, which contributes to extending the network lifetime.

The approach and results presented in this article provide a foundation for further research on acoustic sensing with WASNs. Further work should extend the capability of the proposed scheme to more applications in acoustic monitoring, such as recognizing a target in a multi-object environment, localizing more than a single target within the sensor field, and tracking single or multiple targets simultaneously. Such extensions would advance the development of acoustic monitoring applications. The implementation of the proposed scheme in a real sensor-based platform would also help to explore the actual performance of this sensing approach.

Footnotes

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group number (RG-1439-023).

Handling Editor: Antonio Lazaro

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research work was funded by the Deanship of Scientific Research at King Saud University through research group number (RG-1439-023).

ORCID iD

Afnan Algobail
