Deep combining of local phase quantization and histogram of oriented gradients for indoor positioning based on smartphone camera

Abstract

To achieve high accuracy in indoor positioning using a smartphone, there are two limitations: (1) limited computational and memory resources of the smartphone and (2) the human walking in large buildings. To address these issues, we propose a new feature descriptor by deeply combining histogram of oriented gradients and local phase quantization. This feature is a local phase quantization of a salient histogram of oriented gradient visualizing image, which is robust in indoor scenarios. Moreover, we introduce a base station–based indoor positioning system for assisting to reduce the image matching at runtime. The experimental results show that accurate and efficient indoor location positioning is achieved.

Keywords

Indoor positioning smartphone salient region detection deep combining of histogram of oriented gradients and local phase quantization histogram of oriented gradient visualization

Introduction

Indoor positioning is considered an enabler for a variety of applications, such as guidance of passengers on airports, conference attendees, visitors in shopping malls, and for many novel context-aware services, which can play a significant role for monetarization. The demand for an indoor positioning service or indoor location-based services (iLBS) has also accelerated given that people spend the majority of their time indoors.¹ Over the last decade, researchers have studied many indoor positioning techniques.² In addition, with the development of the integrated circuit technology, multi-sensors, for example, camera, Earths magnetic field, WiFi, Bluetooth, inertial module, have been integrated in smartphones. Therefore, smartphones are becoming powerful platforms for location awareness.

The traditionally used outdoor localization method, Global Navigation Satellite System (GNSS), is not available in indoor environments, even though navigation tasks on street level are very precise. A catalog of alternative localization techniques has been investigated, such as infrared-,³ sensor-,^3,4 wireless-,^5,6 communication basestation–based technologies,⁷ pseudolite⁸ or visual markers.⁹ However, most of these technologies, relying on wireless technology, face issues in the presence of radio frequency interference (RFI) and interference of non-line of sight (NLOS) caused by dense forests, urban canyons, and terrain.¹ Moreover, some of these technologies work in a limited area such as inertial sensor–based approaches or some need a particular environmental infrastructure and augmentation such as Locata, that is, a pseudolite positioning system.⁸ Therefore, smartphone camera–based indoor positioning is a promising approach for accurate indoor positioning without the need for expensive infrastructure such as access points or beacons.

The key method of camera-based localization is image matching. Images taken by a smartphone camera are matched to previously acquired reference images with known position and orientation. The matching of smartphone recordings with a database of geo-referenced images allows for meter accurate infrastructure-free localization.¹⁰ According to the matched reference image, the location of the smartphone is calculated. In mobile indoor scenarios that are shown by Figure 2, users usually walk during positioning and navigation procedure. Therefore, the captured images by smartphone cameras are scaled, rotated, and even blurred because of hands shaking. Moreover, most of the researchers recently focus on invariant feature extraction. Ravi et al.¹¹ extracted color histograms, wavelet decomposition, and image shape for image matching to locate a user’s position. Kim and Jun¹² proposed a method based on image color histogram feature for positioning using augmented reality tool. However, the positioning accuracy of those two methods would work inefficiently in the varying light and crowded scenarios. In order to extract the invariant features, SIFT and its improved algorithms are widely used for image-based indoor localization. Kawaji et al. used principal component analysis-scale invariant feature transform (PCA-SIFT) feature for railway museum indoor positioning. Werner et al.¹³ proposed a camera-based indoor positioning using speeded up robust features (SURF) feature for speeding up the image matching. Li and Wang¹⁴ introduced affine-scale invariant feature transform (A-SIFT) feature for image matching achieved by random sample consensus (RANSAC), which increased the matching accuracy. Heikkilä et al.¹⁵ proposed a similar method¹⁴ for indoor positioning.

However, those two complex computational methods are not suitable for smartphone-based indoor positioning. This is because the limited computational resources of mobile devices¹⁶ extracted the edge-based features from the visual tag image, and those features are fused with inertial information for indoor navigation. Kim and Jun¹² used the Sobel filter integrating mean structural similarity index for estimating the arrival of angle and height during the indoor localization. However, these two methods need additional visual marks for assisting smartphone camera for detecting features, which increases the indoor positioning cost. Meanwhile, all of these research works mainly focus on improving image-matching accuracy. Some of these algorithms are, however, quite demanding in terms of their computational complexity and therefore not suited to run on mobile devices, which need smartphones with high hardware configuration. Although smartphones are inexpensive, they have even more limited performance than Tablet and PCs. Phones are embedded systems with severe limitations in both the computational facilities and memory bandwidth. Therefore, natural feature extraction and matching on phones have largely been considered prohibitive and have not been successfully demonstrated to date.¹⁷ To address these issues, Van Opdenbosch et al.¹⁰ used the improved vector of locally aggregated descriptors’ (VLAD) image signature and emerging binary feature descriptor binary robust independent elementary features (BRIEF) to achieve the smartphone camera-based indoor positioning. Besides, in order to reduce the overall computational complexity, they proposed a scalable streaming approach for loading the reference images to the phones. Different with their method, this article proposed an efficient feature descriptor named Turbo Fusing Histogram of oriented gradients (HOG) and Local phase quantization (LPQ) Salient feature (TFHLS). The TFHLS features are extracted from the partial image which are salient image regions, and they are invariant to the illumination, scale, rotation, and blur caused by camera shaking. Moreover, a wireless-based indoor positioning system time&code division-orthogonal frequency division multiplexing (TC-OFDM) is introduced to calculate the coarse positions for supporting the floor number to the smartphone, which would reduce the number of images which are downloaded to the smartphones. Using this approach, our camera-based indoor positioning algorithm results in the reduction in computational complexity, hardware requirement, and network latency.

This article is organized as follows to achieve our investigations. First of all, we discuss the related work on HOG and LPQ feature extraction in section “Related work.” Then, we introduce our image feature extraction based on fusing HOG and LPQ in section “Proposed smartphone camera-based indoor positioning.” After that, we test the proposed algorithm on the Technische Universität München (TUM) indoor dataset¹⁸ and the Beijing University of Posts and Telecommunications (BUPT) indoor dataset collected by our lab, and the evolution of our algorithm is also shown in this section. Finally, in Section “Conclusion,” we conclude the article and provide a future work on possible extensions.

Related work

Finding efficient and discriminative descriptors is crucial for indoor complex scenarios. HOG descriptor was proposed by Dalal and Triggs¹⁹ for human detection. The main idea behind HOG is based on the local edge information.¹⁵ Because of its efficient performance, HOG feature are widely used in human detection,^20,21 face recognition,^22,23 and image searching.²⁴ All of these applications show that HOG feature is invariant to the illumination. According to our experiment, HOG feature is not robust when the humans are crowded and the images are blurred. Wang et al.²⁵ combined the HOG and local binary pattern (LBP) features for human detection. However, they concluded that their detector cannot handle the articulated deformation of people. Our visualizations reveal that the world that features see is slightly different from the world that the human eye perceives.

Recently, LPQ is insensitive to image blurring, and it has proven to be a very efficient descriptor in face recognition from blurred and sharp images.^15,26 LPQ was originally designed by Ojansivu and Heikkila similar to the LBP methodology as a texture descriptor.²⁷ In our opinion, robust and efficient image matching requires several different kinds of appearance information to be taken into account, suggesting the use of heterogeneous feature sets. In our proposed algorithm, the HOG features are extracted from the salient regions, and LPQ features are extracted from the HOG visualizing image. Therefore, the HOG and LPQ are integrated for building an efficient feature, that is, TFHLS for indoor image matching.

Proposed smartphone camera-based indoor positioning

The smartphone camera-based indoor positioning procedure using TFHLS feature is shown in Figure 1.

Figure 1.

Flowchart of smartphone camera-based indoor positioning.

Study materials

In order to test and evaluate the proposed algorithm, two databases are used. The first one is supported by TUM.²⁸ In TUM dataset, there are 54,896 reference views, which covers 3431 positions with 1-m accuracy. Another dataset is collected by our lab which captured 1000 indoor images using smartphone cameras in BUPT campus. Different with TUM dataset in calculating the reference positions, a static measurement system based on TC-OFDM and BeiDou real-time kinematic is introduced. The scalable locations with positioning accuracy 0.1–1 m are obtained. The BUPT dataset covers four buildings and results in a total of 2189 positions.

Superpixel-based, sparsifying, high-resolution image

Inspired by the human vision system (HVS), the features extracted from salient regions are invariant to viewpoint change, insensitivity to image perturbations and repeatability under intra-class variation.²⁹ These features are extracted from some regions of the image, but not the whole image. This procedure is called sparsifying image in this article. Therefore, the salient region is introduced for the image matching. In this article, a superpixel-based approach, simple linear iterative clustering (SLIC), proposed by Achanta et al.³⁰ is used to pre-segment an image. SLIC method generates superpixels by clustering pixels based on their combined five-dimensional similarity and proximity in the image plane which is shown by the following functions

d_{lab} = \sqrt{{(l_{k} - l_{i})}^{2} + {(a_{k} - a_{i})}^{2} + {(b_{k} - b_{i})}^{2}}

(1)

d_{xy} = \sqrt{{(x_{k} - x_{i})}^{2} + {(y_{k} - y_{i})}^{2}}

(2)

D_{s} = d_{lab} + \frac{m}{S} d_{xy}

(3)

where $D_{s}$ is the sum of the $d_{lab}$ distance and the $d_{xy}$ plane is normalized by the grid interval S. A variable m is introduced in $D_{s}$ allowing us to control the compactness of a superpixel. Equation (1) is used to calculate the distance between two different pixels in the lab color space. Equation (2) is used to obtain the Euclidean distance between two different pixels. Equation (3) is used to transform different dimensional distances into the same dimensional distance. Based on equation (3), the size of each superpixel can be varied with $D_{s}$ , which makes our proposed segmentation approach robust and accurate. In the SLIC method, the desired number of superpixels should be specified, which increases the computation complexity and is unsuitable for segmenting image sequences. To detect salient regions from superpixel image and not the pixel-level image using equation (4)

ℜ (s_{i}) = α \times C (s_{i}) + β \times T (s_{i})

(4)

where ℜ is the candidate salient region, $s_{i}$ is the superpixel, C is the contrast, T is the superpixel entropy, and $α + β = 1$ . The threshold C used for detecting salient superpixels is calculated using equation (5)

C (s_{i}) = \frac{μ_{s_{i}}}{μ_{f}}

(5)

where $μ_{s_{i}}$ is the mean of the i superpixel, and $μ_{f}$ is the mean of an image. Then,salient superpixel regions are detected. Moreover, in order to extract the HOG features, each salient superpixel regions are extended into relational rectangles which are named salient rectangles. The sizes of those rectangles are calculated using equation (6)

{\begin{matrix} R_{a} & = | x_{\max} - x_{\min} | \\ R_{b} & = | y_{\max} - y_{\min} | \end{matrix}

(6)

where $x_{\max}$ is the position of the far right pixel in the horizontal direction, and $x_{\min}$ is the position of the far left pixel in the horizontal direction. $y_{\max}$ is the position of the topside pixel in the vertical direction, and $y_{\min}$ is the position of the downside pixel in the vertical direction. The center of the salient rectangle is the centroid of the related superpixel.

TFHLS feature extraction approach

HOG feature extraction

HOG descriptors are invariant to two-dimensional (2D) rotation which has been used in many different problems in computer vision, such as pedestrian detection. Compared to the original HOG, the integrated HOG feature proposed by Zhu et al.²¹ without trilinear interpolation is easier and faster to be computed. However, the HOG’s performance would be worse than the original HOG. Therefore, we introduced a constrained trilinear interpolation approach to replace the general trilinear interpolation. Moreover, it should be noted that both Wang et al.²⁵ and Li et al.³¹ proposed a $7 \times 7$ kernel that is shown in equation (7) to convolve the gradients for calculating the gradient orientation at each pixel. However, it is a heavy computation procedure to convolve using the $7 \times 7$ kernel. A novelty $5 \times 5$ convolution kernel is designed to be implemented. For an 8-bit image, the kernel template is shown in equation (8)

Con v_{77} = \frac{1}{256} [\begin{matrix} 1 & 2 & 3 & 4 & 3 & 2 & 1 \\ 2 & 4 & 6 & 8 & 6 & 4 & 2 \\ 3 & 6 & 9 & 12 & 9 & 6 & 3 \\ 4 & 8 & 12 & 16 & 12 & 8 & 4 \\ 3 & 6 & 9 & 12 & 9 & 6 & 3 \\ 2 & 4 & 6 & 8 & 6 & 4 & 2 \\ 1 & 2 & 3 & 4 & 3 & 2 & 1 \end{matrix}]

(7)

Con v_{HOG} = \frac{1}{256} [\begin{matrix} 1 & 3 & 4 & 3 & 1 \\ 3 & 6 & 8 & 6 & 3 \\ 5 & 12 & 16 & 12 & 5 \\ 3 & 6 & 8 & 6 & 3 \\ 1 & 3 & 4 & 3 & 1 \end{matrix}]

(8)

Moreover, in order to reduce the space complexity of the integral image method, the kernel in equation (8) is convoluted with the salient rectangle but not the whole original image, which decreased the computational complexity.

HOG feature visualization

In this article, we introduced a HOG visualizing method proposed by Vondrick et al.³² Different with their complex method, a more simple method based on equation (9) is proposed

ϕ^{- 1} (y) = \underset{x \in R^{D}}{argmin} ∥ ϕ (x) - y ∥^{2}

(9)

where $x \in R^{D}$ is a salient rectangle sub-image and $y = ϕ (x)$ is the corresponding HOG feature descriptor. In this article, HOG feature visualization is posed to be a feature inversion procedure. In order to optimize equation (9), we used gradient-descent strategies by numerically evaluating the derivative in image space with least-squares method.

LPQ feature extraction from HOG visualization image

After inverting HOG features into an image $Y_{H} OG$ , LPQ features are extracted from $Y_{H} OG$ using a simple scalar quantizer equation (10). LPQ feature is based on quantifying the Fourier transform phase by considering the sign of each component in Fourier coefficients $G (x)$ . Different with LBP, LPQ features are calculated in an image frequency transformed by fast Fourier transform (FFT). However, the LPQ feature is extracted in a local region of FFT domain, which is similar to LBP. According to Dhall et al.,³³ the local Fourier coefficients of each pixel are computed around its four frequency points. After that, in order to obtain the phase information of each pixel in superpixel area, a binary scalar quantizer is implemented for quantifying the signs of the real and imaginary part of each coefficient. Finally, the quantization result of each coefficient is coded in an 8-bit binary string

q_{i} (x) = {\begin{matrix} \begin{matrix} 1 & if gi (x) \leq 0 \end{matrix} \\ \begin{matrix} 0 & otherwise \end{matrix} \end{matrix}

(10)

where $g_{i} (x)$ is the ith component of $G (x)$ . Then, the phase information of the 8-bit HOG visualizing image is described using equation (10)

f_{LPQ} (x) = \sum_{n = 1}^{8} q_{n} 2^{n - 1}

(11)

The final LPQ features are used as feature vectors to represent an indoor sub-image.

TFHLS feature matching

The main advantage of the binarization, apart from a reduced memory footprint, is a very fast matching process using the normalized Hamming distance by equation (12)

d = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} P_{M} (i, j) \cap Q_{M} (i, j) \cap (P_{R} (i, j) \otimes Q_{R} (i, j)) + P_{M} (i, j) \cap Q_{M} (i, j) \cap (P_{I} (i, j) \otimes Q_{I} (i, j))}{2 \sum_{i = 1}^{N} \sum_{j = 1}^{N} P_{M} (i, j) \cap Q_{M} (i, j)}

(12)

where $P_{R} (Q_{R})$ , $P_{I} (Q_{I})$ , and $P_{M} (Q_{M})$ are the real part, the imaginary part, and the mask of $P (Q)$ , respectively. The result of the Boolean operator (⊗) is equal to zero if and only if there are two bits. The symbol ∩ represents the AND operator, and the size of the feature matrixes is $N \times N$ .

Experimental results

Query dataset and setup description

We recorded a query set of 128 images captured by an iPhone 6 with manually annotated position information. The images are approximately 5 megapixels in size and are taken using the default settings of the iPhone 6 camera application. Furthermore, the images consist of landscape photos either taken head-on in front of a building or at a slanted angle of approximately $30^{\circ}$ . After obtaining the images, next, we run the remaining query images with successfully retrieved database images through the pose estimation part of the pipeline. In order to characterize pose estimation accuracy, we first manually ground truth for the position and pose of each query image taken. This is done using the computer-aided design (CAD) map of the buildings in BUPT and distance measurements recorded during the query dataset collection. For a detailed evaluation, the query set has been split into classes that is the same with the TUM database: high texture, low texture, hallways, ambiguous objects, and building structure, where each query can be assigned to more than one class (Figures 2 and 3). Meanwhile, the framework of our smartphone camera-based indoor positioning system is shown in Figure 4. It should be known that we ignore the orientation information calculation.

Figure 2.

Exemplary queries for all classes from TUM: (a) low textures, (b) high textures, (c) blurred image, (d) building hall, (e) hallway, and (f) Illumination change.

Figure 3.

Exemplary queries for all classes from BUPT: (a) low textures, (b) high textures, (c) blurred image, (d) building hall, (e) hallway, and (f) illumination change.

Figure 4.

The module of navigation and positioning system.

Our method was implemented using MATLAB 2015a, and this method was programmed by integrating C# and MATLAB code. The hardware configuration of our experimental platform where our method ran is shown in Table 1.

Table 1.

Hardware configuration

Parameter	Value
CPU	Intel
CPU Processor	Core i7 $Core \times 2.5 GHz$
OS	Windows 10
Memory	8 GB DDR4 Dual Channel
SSD	256 GB
Camera	13 MP

It is noted that the camera-based positioning method proposed by Ravi et al.¹¹ is used to compare with our proposed method, and both the test data and the MATLAB code of that method are supported by Opdenbosch.

Evaluation of high-resolution image sparsifying

Figure 5 shows a qualitative result for the image sparsifying by detecting salient regions based on superpixels. From the second row of Figure 5 obtained by the proposed HVS-based approach for a variety of images from the TUM and BUPT database. Preserve the salient regions in each image while remaining compact and uniform in size of objects. Moreover, the salient superpixels that are detected include sparse features, which can be achieved to reduce the computation of indoor positioning.

Figure 5.

Exemplary queries for salient region detection and HOG feature visualization: (a) Indoor of Our Lab, (b) Hall of Our Research Building, (c) Corridor of a Building in TMU, (d) Corridor of Our Research Building, (e) Salient Map of Figure 5(a), (f) Salient Map of Figure 5(b), (g) Salient Map of Figure 5(c), (h) Salient Map of Figure 5(d), (i) HOG Feature Visualization of Figure 5(e), (j) HOG Feature Visualization of Figure 5(f), (k) HOG Feature Visualization of Figure 5(g), and (l) HOG Feature Visualization of Figure 5(h).

According to Figure 5, we can find that salient regions are detected even when the image is blurred, which is shown by three images in the second column of Figure 5.

According to our statistics, the number of TFHLS features in Figure 6(b) is 69% less than that in Figure 6(a). It is noted that features in Figure 6(b) are extracted from the salient regions of an image, which shows that our salient region detection approach is efficient and powerful. Therefore, less features are used for image matching, which speeds up the process of the image matching and remains high matching ration according to Table 2.

Figure 6.

TFHLS features matching for BUPT images: (a) TFHLS feature extraction with high density and (b) TFHLS features sparsing.

Table 2.

Matching result of different image features.

Setup	Matching rate (%)	Running time (ms)
TFHLS detector	93	13.2
TFHLS detector	93	13.2
FAST detector	68	0.98
LBP detector	70	6.37
BRIEF descriptor	73	4.77
SURF detector	82	232.6
BVLAD matching(LSH)	85	53.74
BVLAD matching (LS)	87	100.17

Qualitative evaluation of HOG visualization

The third row of Figure 5 shows the HOG feature visualization results under different indoor scenarios. These result visualizations allow us to analyze object from the view of HOG detector, which is a new approach and gain new insight into the detectors failures, which is different with the human salient vision. From the first and third row of Figure 5, the high-frequency details in original images have high contrast in HOG visualization images. Paired dictionary learning tends to produce the best visualization for HOG descriptors. Although HOG does not explicitly describe the color, we found that the paired dictionary is able to recover color from HOG descriptors. Therefore, by visualizing feature spaces, we can obtain a more intuitive understanding of recognition systems.

Evaluation of TFHLS feature extraction and matching

In order to identify optimal parameters for the approach described above, several experiments are conducted with varying settings. Figure 7 summarizes the performance of comparing the TFHLS feature matching to the method proposed by Van Opdenbosch et al.¹⁰ A smartphone running Andriod OS 4.4 was used to implementing the positioning methods which were used in this article.

Figure 7.

TFHLS features matching for BUPT images: (a) TFHLS features matching for high-texture image, (b) TFHLS features matching for blurred image, (c) TFHLS features matching for low-texture image, and (d) TFHLS features matching for indoor image.

Qualitative results

Figure 7 shows the TFHLS features matching results in four different scenarios. As shown in Figure 7(a), successful retrieval usually involves matching of object textures in both query and database images. According to Figure 7(b), we can find that our proposed TFHLS feature is efficient to match the blurred images.

Quantitative results

Table 2 shows that we successfully match 113 of 128 images to achieve a retrieval rate of 93%, where LS means linear search and LSH means locality sensitive hashing. Moreover, as shown in Table 2, the proposed method achieves to match the images of TUM database with the highest success in 13.2 ms for each image. Figure 8 shows the performance comparison in miss rate between our proposed method and other two LBP-based methods.

Figure 8.

The performance comparison between the proposed human detectors and the state-of-the-art detectors on BUPT database.

Positioning result evaluation

Figure 9 summarizes the performance of the location information estimation and the comparison result. From Figure 9(a) and (b), we can localize the position to within sub-meter level of accuracy for over 56% of the query images. Furthermore, 85% of the query images are successfully localized to within 2 m of the ground’s truth position. As seen in Figure 7(a), when the location error is less than 1 m, the TFHLS features of the corresponding corridor signs present in both query and database images matched together. Moreover, we find that the TFHLS detector extracted more features,¹⁰ even though the images are blurred, which is shown in Figure 9(b). As shown in Figure 10(a) and (b), we plot the estimated and ground-truth locations in the horizontal and vertical directions. Besides, Figure 10(c) shows the comparing locations of the query images onto the New Research Buildings 2D floor plan. As seen from Figure 10, there is a close agreement between the ground truth and TFHLS-based results. The root mean square error (RMSE) between the estimated and the ground-truth positioning results is 1.253 m.

Figure 9.

The module of smartphone camera-based indoor positioning: (a) positioning result based on TUM dataset and (b) positioning result based on BUPT dataset.

Figure 10.

The location comparison result: (a) positioning result in horizontal direction, (b) positioning result in vertical direction, and (c) locations on the 2D floor plan.

Figure 11 shows the indoor positioning comparison performance in RMSE. From this figure, we can find that the proposed approach can achieve high-accuracy indoor locations than VLAD- and TC-OFDM-based methods. Most of the VLAD and TC-OFDM indoor positioning results are more than 3 m, while the positioning results based on our method is less than 1.5 m. Moreover, the proposed method is robust because its RMSE curve is smooth, which shows that our method can get stable results. The performance gap between the ground truth and estimations in both Figures 9 and 11 suggests that the TFHLS-based method can be adaptive to the illumination and the dense multipath indoor environments which result in obtaining a higher indoor positioning accuracy.

Figure 11.

Performance comparison between the proposed indoor positioning and the state-of-the-art positioning methods.

Conclusion

We presented a scalable and efficient mobile camera-based localization system. To this end, we built a modified model of a feature that deeply combined HOG and LPQ, and jointly addressed the problem of limited computational capacity, as well as the required memory footprint. Moreover, we employed TC-OFDM indoor positioning system for supporting the coarse positioning knowledge related to the camera location. According to our test on the TUM and BUPT database, the indoor positioning based on the proposed algorithm is less than 1.5 m. Furthermore, the RMSE between estimated and ground-truth positioning results up to 1.25 m, which shows that our smartphone camera-based indoor positioning algorithm is precise and accuracy. In the future work, we will study the sub-meter indoor positioning algorithms based on the fusion of image and wireless signals.

Footnotes

Academic Editor: Gang Wang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was sponsored by the National Key Research and Development Program (no. 2016YFB0502002), the National High Technology Research and Development Program of China (no. 2015AA124103), the National Natural Science Foundation of China (no. 61401040), and Beijing University of Posts and Telecommunications Young Special Scientific Research Innovation Plan (2016RC13).

References

Gonzalez

Hidalgo

Barabasi

AL.

Understanding individual human mobility patterns. Nature 2008; 453(7196): 779–782.

Liu

Darabi

Banerjee

. Survey of wireless indoor positioning techniques and systems. IEEE T Syst Man Cy C 2007; 37(6): 1067–1080.

Lee

Chang

Park

. Indoor positioning system based on incident angles of infrared emitters. In: Proceedings of the 30th annual conference of IEEE Industrial Electronics Society, Busan, South Korea, 2–6 November 2004, pp.2218–2222. New York: IEEE.

Gallagher

Dempster

. How feasible is the use of magnetic field alone for indoor positioning? In: Proceedings of the international conference on indoor positioning and indoor navigation (IPIN), Sydney, NSW, Australia, 13–15 November 2012, pp.1–9. New York: IEEE.

Zou

Jiang

. An online sequential extreme learning machine approach to wifi based indoor positioning. In: Proceedings of the 2014 IEEE World Forum on Internet of Things (WF-IoT), Seoul, Korea, 6–8 March 2014, pp.111–116. New York: IEEE.

Huang

Lee

. Real-time rfid indoor positioning system based on Kalman-filter drift removal and heron-bilateration location estimation. IEEE T Instrum Meas 2015; 64(3): 728–739.

Zhongliang

Yanpei

Xie

. Situation and development tendency of indoor positioning. China Commun 2013; 10(3): 42–55.

Politi

Yong

Faisal

. Locata: A new technology for high precision positioning. In: European navigation conference, Naples, 3–6 May 2009. Institute of Navigation.

Kalkusch

Lidy

Knapp

. Structured visual markers for indoor pathfinding. In: Proceedings of the 1st IEEE international workshop on augmented reality toolkit, Darmstadt, 29 September 2002, p.8. New York: IEEE.

10.

Van Opdenbosch

Schroth

Huitl

. Camera-based indoor positioning using scalable streaming of compressed binary image signatures. In: Proceedings of the IEEE international conference on image processing (ICIP), Paris, 27–30 October 2014. New York: IEEE.

11.

Ravi

Shankar

Frankel

. Indoor localization using camera phones. In: Proceedings of the 7th IEEE workshop on mobile computing systems and applications (WMCSA’06), Semiahmoo Resort Blaine, WA, USA, 1 August 2005, p.49. New York: IEEE.

12.

Kim

Jun

Vision-based location positioning using augmented reality for indoor navigation. IEEE T Consum Electr 2008; 54(3): 954–962.

13.

Werner

Kessel

Marouane

Indoor positioning using smartphone camera. In: Proceedings of the international conference on indoor positioning and indoor navigation (IPIN), Sydney, NSW, Australia, 21–23 September 2011, pp.1–6. New York: IEEE.

14.

Wang

. Image matching techniques for vision-based indoor navigation systems: performance analysis for 3d map based approach. In: Proceedings of the international conference on indoor positioning and indoor navigation (IPIN), Sydney, NSW, Australia, 13–15 November 2012, pp.1–8. New York: IEEE.

15.

Heikkilä

Rahtu

Ojansivu

. Local phase quantization for blur insensitive texture description. In: Brahnam

Jain

Nanni

. (eds) Local binary patterns: new variants and applications. Berlin: Springer, 2014, pp.49–84.

16.

Zachariah

Jansson

. Fusing visual tags and inertial information for indoor navigation. In: Proceedings of the 2012 IEEE/ION position location and navigation symposium (PLANS), Myrtle Beach, SC, 23–26 April 2012, pp.535–540. New York: IEEE.

17.

Wagner

Reitmayr

Mulloni

. Pose tracking from natural features on mobile phones. In: Proceedings of the 7th IEEE/ACM international symposium on mixed and augmented reality (ISMAR), Cambridge, UK, 15–18 September 2008, pp.125–134. New York: IEEE.

18.

Mainetti

Patrono

Sergi

. A survey on indoor positioning systems. In: Proceedings of the 22nd international conference on software, telecommunications and computer networks (SoftCOM), Split, 17–19 September 2014, pp.111–120. New York: IEEE.

19.

Dalal

Triggs

. Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition (CVPR), San Diego, CA, 20–25 June 2005, pp.886–893. New York: IEEE.

20.

Pang

Yuan

. Efficient HOG human detection. Signal Process 2011; 91(4): 773–781.

21.

Zhu

Yeh

Cheng

. Fast human detection using a cascade of histograms of oriented gradients. In: Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition, New York, 17–22 June 2006, pp.1491–1498. New York: IEEE.

22.

Leibo

Liao

Poggio

. Subtasks of unconstrained face recognition. In: Proceedings of the international joint conference on computer vision, imaging and computer graphics (VISAPP), Lisbon, 5–8 January 2014. Setbal: SCITEPRESS.

23.

Wei

Guo

Wang

. A multiscale method for HOG-based face recognition. In: Liu

Kubota

Zhu

. (eds) Intelligent robotics and applications. Berlin: Springer, 2015, pp.535–545.

24.

Saleh

Dontcheva

Hertzmann

. Learning style similarity for searching infographics. In: Proceedings of the 41st graphics interface conference, Halifax, NS, Canada, 3–5 June 2015, pp.59–64. Toronto, ON, Canada: Canadian Information Processing Society.

25.

Wang

Han

Yan

An HOG-LBP human detector with partial occlusion handling. In: Proceedings of the 12th IEEE international conference on computer vision, Kyoto, Japan, 29 September–2 October 2009, pp.32–39. New York: IEEE.

26.

Pedone

Heikkila

. Local phase quantization descriptors for blur robust and illumination invariant recognition of color textures. In: Proceedings of the 21st international conference on pattern recognition (ICPR), Tsukuba, Japan, 11–15 November 2012, pp.2476–2479. New York: IEEE.

27.

Ojansivu

Rahtu

Heikkila

. Rotation invariant local phase quantization for blur insensitive texture analysis. In: Proceedings of the 19th international conference on pattern recognition (ICPR), Tampa, FL, 8–11 December 2008, pp.1–4. New York: IEEE.

28.

Engel

Sturm

Cremers

. Camera-based navigation of a low-cost quadrocopter. In: Proceedings of the 2012 IEEE/RSJ international conference on intelligent robots and systems, Vilamoura, 7–12 October 2012, pp.2815–2821. New York: IEEE.

29.

Kadir

Zisserman

Brady

An affine invariant salient region detector. In: Pajdla

Matas

(eds) Computer Vision-ECCV 2004. Berlin: Springer, 2004, pp.228–241.

30.

Achanta

Shaji

Smith

. Slic superpixels compared to state-of-the-art superpixel methods. IEEE T Pattern Anal 2012; 34(11): 2274–2282.

31.

Liu

. A robust local sparse coding method for image classification with histogram intersection kernel. Neurocomputing 2016; 184: 36–42.

32.

Vondrick

Khosla

Malisiewicz

. Hoggles: Visualizing object detection features. In: Proceedings of the 2013 IEEE international conference on computer vision (ICCV), Sydney, NSW, Australia, 1–8 December 2013, pp.1–8. New York: IEEE.

33.

Dhall

Asthana

Goecke

. Emotion recognition using PHOG and LPQ features. In: Proceedings of the 2011 IEEE international conference on automatic face gesture recognition and workshops (FG 2011), Santa Barbara, CA, 21–25 March 2011, pp.878–883. New York: IEEE.