Sage Journals: Discover world-class research

Abstract

We present a low cost battery-powered 6-degree-of-freedom wireless wand for 3D modeling in free space by tri-axis Magnetic, Angular Rate, Gravity (MARG) and vision sensor fusion. Our approach has two stages of sensor fusion, each with different algorithms for finding 3D orientation and position. The first stage fusion algorithm, a complementary filter, utilizes MARG sensors to compute 3D orientation relative to the direction of gravity and earth's magnetic field in a quaternion format, which was adjusted with compensations for magnetic distortion. The second stage fusion algorithm, a Kalman filter, utilizes accelerometer data and IR marker velocity to compute 3D position. In order to compute the IR marker linear velocity along the optical axis (the z-axis), we present a simple and efficient image-based technique to find the distance of the object from the camera using blob area pixels in the image. Our fusion (inside-in and outside-in) approach efficiently solves short time occlusion, needs of frequent calibration, and unbounded drift problems involved in numerical integration of inertial sensors data and improves the degrees of freedom at low cost without compromising accuracy. The results are compared with a leading commercial magnetic motion tracking system to demonstrate the performance of the wand.

1. Introduction

There has been increasing research over the last decade in using 6DOF motion tracking devices for 3D spatial sketching and modeling [1–4] in immersive virtual reality (VR) environments. Other efforts include 2D tablet screens used to draw a 2D sketch that is processed into 3D designs [5, 6]. The objective of practical spatial drawing and editing in 3D demands low-cost, precision, small size, and ease of use. Existing professional motion tracking systems that use electromagnetic, ultrasonic, optical, inertial, and multiple-sensor technologies [7] are too expensive for commercial 3D immersive VR and modeling and require a degree of technical knowledge to use them. Outside-in stereo vision has been widely used for 3D modeling, but this system often suffers from occlusion and interference and apparent loss of DOF. Any accidental change in the position of a camera after calibration requires complete recalibration [8].

3D motion-based human computer interaction (HCI) has long been an active research topic in VR, and it has been shown that 3D interfaces can be useful in many consumer-level applications such as home gaming [9] and 3D user input [10–12]. Emerging demands for rich interaction have led to the development of handheld pointing motion interface devices [13, 14]. These commercial devices incorporate micro-electro-mechanical system (MEMS) inertial sensors such as accelerometers and gyroscopes, and their contributions are limited to gesture recognition, rotation, and vision sensing as for 3D position. These devices are aimed to interact with 3D digital media content and motion gaming and are unsuitable for 3D modeling and editing in free space, which requires precise 6DOF motion sensing.

The main technological bottleneck that limits accuracy in computing position and orientation from MEMS inertial sensors is the drift caused by numerical integration of acceleration and angular rate [15–17]. However, inertial sensors are well known for their short term precision, high-frequency data rates, and size. To leverage these advantages, benefits of sensor fusion techniques using additional sensors have been proposed by researchers in the areas of navigation [18, 19] and motion capture [20].

Another recent innovative work, called MEMSEye, uses MEMS-mirror-based optical 3D tracking [21]. Combination of two or more MEMSEye units can track light sources such as IR light-emitting diodes (LED) and corner cube retroreflectors (CCRs). By triangulating the tracked object's relative position from each unit, its 3D position can be computed in relatively large volumes with submillimeter precision at update rates of >20 kHz. However, a fully functional unit using this technology platform costs more than one thousand dollars, making it unaffordable for a wide range of users.

With increasing interest in 3D display devices, a simple and low-cost solution that can provide enough precision and flexibility is not yet available for 3D modeling. Our work aims to advance 6DOF motion sensing by two-stage multisensor (MARG and vision) fusion to make use of their complementary properties, which was inspired by multisensor navigation systems. So it is essential to acquire accurate timestamp information to synchronize MARG measurements and the camera. In order to do that, we use a hardware triggerable IR camera.

This paper is organized as follows. Section 2 gives the description of the proposed system design and the details of the wireless wand architecture. Section 3 explains first stage sensor fusion for 3D orientation using MARG sensors and the magnetic distortion correction technique. In order to find the linear velocity of the IR marker along the camera z-axis, we explain our image-based technique in Section 4, followed by a description of the second stage sensor fusion, and performance comparison results with a leading commercial motion tracking system in Sections 5 and 6, respectively. Section 7 concludes our approach. Throughout the paper, a notation system of leading superscripts and subscripts similar to [22] is used to denote the relative frame of orientations and vectors. For example, ${}_{b}^{e}{q_{t}}$ describes the quaternion of sensor body frame b relative to the earth frame e at time t, and ${{}^{e}A}_{est}$ is an estimated vector described in frame e. The ⊗ operator denotes a quaternion product.

2. System Design

2.1. System Overview

The wireless wand is designed to be used in front of a computer monitor with a camera attached on top of it as shown in Figure 1(a). We use a wide field of view camera which has an IR filter and $640 \times 480$ resolution to capture the spherical IR marker located at the tip of the wand at 50, 75, and 100 frames per second (FPS). The wand is equipped with triaxis MARG sensors, temperature sensor, and a microcontroller communicates with them to gather a full set of sensor data, from which it computes 3D orientation that is sent in data packets to the computer via wireless Bluetooth link. Data packets contain the first stage sensor fusion output, 3D orientation in quaternion format, raw and calibrated individual sensor data, and buttons and LEDs status information as shown in Figure 2(a). A server program running in the computer communicates with both the wand and the camera as shown in Figure 2(b). In server program, an image-based blob tracking module uses thresholding technique for camera images to find the location and area of the IR marker in image in terms of pixels. The data preprocessing module uses our image-based technique which is explained in Section 4 for finding IR marker velocity along the optical axis (z-axis) of the camera using the area of the marker from the blob tracking result. Also, a technique to calculate gravity vector from the quaternion is incorporated into preprocessing module as MEMS triaxis accelerometer senses gravity plus translational acceleration of wand. The sensor fusion algorithm in Figure 2(b) has a 9-state Kalman Filter (KF) algorithm to compute 3D position. The inputs to the KF are triaxis translational acceleration and velocity vectors from preprocessing module. The server program (API updater) implemented in C++ serves 3D position, orientation, and buttons and LEDs statuses to any application for 3D interaction and modeling. The following section presents the architecture of the wand in detail.

Figure 1

System setup to work with wireless wand; (a) working volume of wand in front of camera and different frame of references involved; (b) wand.

Figure 2

Data packet and API: (a) data packet format sent from wand; (b) API to communicate and process data from the wand and camera.

2.2. Wand Architecture

The wand shown in Figure 1(b) has a triaxis digital 16-bit gyroscope, 12-bit accelerometer, and 12-bit magnetometer, each with its own respective selectable ranges of up to ±2000°/s, ±8 g, and $\pm 8.1$ G. Also, a digital 16-bit thermometer is incorporated into the wand for adaptive compensation of time varying temperature biases in the MEMS sensors. All the sensors, buttons, status LEDs, and Bluetooth transceivers are connected to a microcontroller for collecting, controlling, and processing data as shown in Figure 3. Wand's firmware incorporates MARG sensors calibration routines and data for computing 3D orientation in order to compensate for sensors biases.

Figure 3

Wand architecture.

3. MARG Sensor Fusion for 3D Orientation

To compute drift-free measurement of 3D orientation relative to the direction of gravity and earth's magnetic field, researchers proposed several algorithms using MARG sensors [23–26], also known as an attitude heading reference system (AHRS). A complementary filter using low-cost MEMS inertial measurement unit (IMU) with magnetometer was proposed [23] with deep mathematical basis to compute 3D orientation in a direction cosine matrix (DCM) and quaternion form [24]. Though this algorithm showed how a magnetometer can be used along with an IMU (gyroscope and accelerometer) to compute 3D orientation relative to the earth direction of gravity and magnetic field, it was not able to correct drift as was intended due to lack of a compensation technique for magnetic distortions resulting from nearby sources such as metal structures or power supply buses. Several investigations [25, 27] have shown that substantial errors may be introduced by magnetic distortions in orientation estimated from MARG sensors. By adapting a technique proposed in [22] for compensating for magnetic distortions (termed soft iron errors), a complementary filter algorithm in quaternion form has been implemented on a low-power hardware board in the wand system.

The potential advantage of this algorithm is in correcting drift in orientation computed from gyroscope measurements using an additional reference orientation computed from the accelerometer and magnetometer, by successfully incorporating magnetic distortion compensation without any singularity problems. One more key advantage of this technique is that it eliminates the need for the direction of earth's magnetic field to be predefined, which has been a potential disadvantage of other algorithms [25, 26]. The block diagram shown in Figure 4 represents the first stage sensor fusion AHRS algorithm, in which red color box indicates the magnetic distortion compensation technique. More details of the algorithm are given in the Appendix. The compensation technique is described as follows:

\begin{matrix} {}^{b}{M_{t}} = [\begin{bmatrix} 0 & m_{x} & m_{y} & m_{z} \end{bmatrix}], {}_{e}^{b}{q_{t}} = [\begin{bmatrix} q_{1} & q_{2} & q_{3} & q_{4} \end{bmatrix}], \end{matrix}

(1)

\begin{matrix} {}^{e}{H_{t}} = [\begin{bmatrix} 0 & h_{x} & h_{y} & h_{z} \end{bmatrix}] = {}_{e}^{b}{q_{t - 1}} \otimes {}^{b}{M_{t}} \otimes {}_{e}^{b}{q_{t - 1}^{*}}, \end{matrix}

(2)

\begin{matrix} {}^{e}{B_{t}} = [\begin{bmatrix} 0 & \sqrt{h_{x}^{2} + h_{y}^{2}} & 0 & h_{z} \end{bmatrix}], \end{matrix}

(3)

where ${}^{b}{M_{t}}$ is the normalized magnetic field vector, computed from the output of magnetometer of the wand body frame at time t. ${}_{e}^{b}{q_{t - 1}}$ and ${}_{e}^{b}{q_{t - 1}^{*}}$ are the normalized quaternion output of the algorithm and its conjugate at previous time step $t - 1$ , respectively. In (2), the direction of earth's magnetic field ${}^{e}{H_{t}}$ is computed, which may represent erroneous inclination and can be corrected if algorithm's reference direction of earth's magnetic field ${}^{e}{B_{t}}$ is of the same inclination. Equation (3) ensures that any magnetic disturbances are limited to only affect the estimated heading component of orientation.

Figure 4

Complementary filter block diagram.

4. IR Marker Velocity Tracking

In order to compensate for the numerical drift that results from double integration of acceleration measurements of triaxis accelerometer, we compute 3D velocity of the IR marker from the output of a blob tracking module, which provides the width, height, area, and 2D position $({}^{c}{p_{x}}, {}^{c}{p_{y}})$ of the IR marker in pixels after calibrating the image for distortion correction. Now, finding velocities ${}^{c}{v_{x}}$ and ${}^{c}{v_{y}}$ using position is straightforward in x- and y-axis directions of the camera frame of reference $O_{c}$ , as follows:

\begin{matrix} {}^{c}{v_{x}} = \frac{{}^{c}{p_{x | t - 1}} - {}^{c}{p_{x | t}}}{Δ t}, {}^{c}{v_{y}} = \frac{{}^{c}{p_{y | t - 1}} - {}^{c}{p_{y | t}}}{Δ t}, \end{matrix}

(4)

where $({}^{c}{p_{x | t - 1}}, {}^{c}{p_{y | t - 1}})$ and $({}^{c}{p_{x | t}}, {}^{c}{p_{y | t}})$ are the 2D positions in camera frame of reference at the previous and present time step, respectively, and $Δ t = 1 / FPS$ . There are two methods to find object position in the z-axis using images: stereovision and monovision. We propose a simple and efficient experimental method to find object distance using an object's area in a single image, and using this distance we find the velocity of a marker along z-axis.

Finding the distance of a specific-shaped object using a single image has been proposed in [28, 29]. Object's height in a thresholded binary image has been used to determine its distance from the camera using rectangular, triangular, cylindrical, and spherical shaped objects. However, object's size was relatively large, with minimum diameter of 0.65 m for a spherical object. For our experiment, we used an industry-standard IR camera with uniform radiation capability to illuminate a retroreflective spherical marker with 0.01 m diameter. Figure 5 shows the experimental setup used to determine how the IR marker object height and area pixels in the image change with varying distance from the optical center of the camera in the z- and x- axes. Initially, the marker is positioned exactly at the optical center of the camera and is moved away from the camera on a 2-axis linear rail system.

Figure 5

Experimental setup to observe object height and area.

It is found that the object height and area decrease exponentially with increasing distance (0.36 m to 1.5 m) from the camera along the z-axis, as shown in Figures 6(a) and 6(b), respectively. But, from Figure 6(a), it is clear that the object height does not change continuously with increasing distance when compared to the area of the object; this is the main reason for not using methods presented in other previous schemes [29]. It is also observed that the area measurements are repeatable at any particular intensity, exposure, and threshold settings of the camera. However, in order to calculate linear velocity, it is essential to linearize

\begin{matrix} d = - 0.0232577929 \log (a) + 6.7653875804, \end{matrix}

(5)

\begin{matrix} {}^{c}{v_{z}} = \frac{(d_{t - 1} - d_{t}) k}{Δ t} . \end{matrix}

(6)

Figure 6(b) is done by taking the logarithm. Analysis of Figure 6(c) leads to the fact that the object depth has a direct relationship with its logarithmic pixel area. Now, we can find the best fitting linear polynomial by linear regression to find the object distance to the camera, which is given in (5), where d is object distance and a is the object area in pixels. Having object distance, we can find the velocity along z-axis using (6), where k is a constant (scaling factor) determined by observation to get velocity in m/s units.

Figure 6

Object height and area with increasing distance from camera.

However, at a particular distance from the camera, when the marker moves laterally to the camera (perpendicular to optical axis) from camera center in either x- or y-axis direction, the object shape in image loses circularity, and the object area is not constant; it varies as shown in Figure 7. This significantly affects the velocity ${}^{c}{v_{z}}$ even if the marker moves only in the x- and y-axis. For this, instead of finding an analytical method based on blob tracking results to solve this problem, we exploit sensor fusion to compensate for the error introduced in the measured z-axis velocity.

Figure 7

Object area at different depth when object moved laterally to camera from its center.

The inertial navigation module shown in Figure 10 computes position and velocity estimates using translational accelerations and corrections obtained from a KF algorithm. The estimated z-axis velocity from this module, $v_{z est}$ , can be taken as a corrective reference for two reasons: estimated velocity does not have significant integral drift because correction from the KF is used to compute it; and next, the frame rate of camera is less than the update rate of the sensor fusion algorithm (equal to the sampling rate of inertial sensors). Based on $v_{z e s t}$ , lower and upper thresholds $t_{l}$ and $t_{u}$ , a discriminating window is applied for ${}^{c}{v_{z}}$ to compute valid z-axis velocity ${}^{c}{v_{z}^{'}}$ using (7). Where thresholds $t_{l}$ and $t_{u}$ are determined by

\begin{matrix} {}^{c}{v_{z}^{'}} = {\begin{cases} 0 & if v_{z est} > t_{l}, v_{z est} < t_{u}, \\ \frac{v_{z est} + {}^{c}{v_{z}}}{2} & otherwise, \end{cases} \end{matrix}

(7)

\begin{matrix} {}^{c}V = [\begin{bmatrix} {}^{c}{v_{x}} & {}^{c}{v_{y}} & {}^{c}{v_{z}^{'}} \end{bmatrix}] \end{matrix}

(8)

observing $v_{z est}$ , which implies that below these thresholds $v_{z est}$ has only integral drift resulting from noise in the translational acceleration. Figure 8 shows the velocity graphs for the marker moving back and forth in the x, y, and z directions, before and after applying (7). If $v_{z est}$ falls below both thresholds, ${}^{c}{v_{z}}$ is treated as error and is nullified; otherwise we take the mean of both, which is better than trusting noisy ${}^{c}{v_{z}}$ . Figures 8(a) and 8(b) show the affected ${}^{c}{v_{z}}$ and estimated velocity $v_{z est}$ when the marker is moved back and forth along the x- and y-axes, and their corrected counterparts in Figures 8(d) and 8(e). Similarly, Figure 8(c) shows velocities when the marker is moved along z-axis and its corrected counterpart in Figure 8(f), which shows effectiveness of (7) in reducing noise.

Figure 8

Before and after correction of ^cv_z: (a) error introduced in ^cv_z when marker moved along x-axis; (d) after correcting error in ^cv_z; (c) error introduced in ^cv_z when marker moved along y-axis; (e) after correcting error in ^cv_z.

5. Sensor Fusion for 3D Position

The MEMS triaxis accelerometer measures acceleration of the wand in the body (moving) frame of reference $O_{b}$ and has two components, translational acceleration (actual acceleration), and gravity, which is a function of the 3D orientation of the sensor. There are different ways to remove gravity from acceleration [14, 30]. In this section we first explain gravity removal from acceleration which is a part of the data preprocessing module of Figure 2. The next step is a KF sensor fusion algorithm to find 3D position from the translational acceleration vector and velocity vector ${}^{c}V$ .

5.1. Gravity Removal in Acceleration

A conditional offset filter [14] to remove gravity from acceleration may be sufficient when its output is used only for gesture recognition, but is not optimal in terms of responsiveness (depends on past data) and accuracy (gravity still persists in transition regions). This affects accuracy when our interest is precise position computation from translational acceleration. The following method, using a quaternion ${}_{b}^{e}{q_{t}}$ , computes gravity according to (9) and (10). Now, using (11), we can remove gravity from the acceleration vector ${}^{b}A$ to find the translational acceleration vector ${}^{b}{A_{r}}$ of moving $O_{b}$ . Figure 9(a) shows the acceleration of the wand in a stationary state A and arbitrary complex rotation regions B and C. Figure 9(b) indicates how close the computed gravity is to the accelerations. The result after gravity removal in the respective regions is shown in Figure 9(c), where region A contains only noise, and other parts with magnitude greater than region A contain translational accelerations occurring during rotation of wand. Now, body frame translational accelerations have to be transformed to $O_{e}$ according to strap-down kinematics theory using (12). Equation (13) converts ${}^{e}A$ to units of $m / s^{2}$ .

Figure 9

Gravity removal in acceleration: (a) acceleration; (b) gravity vector computed; (c) resulting translational acceleration after gravity removed.

Figure 10

Combined tracking.

Consider the following:

\begin{matrix} {}_{e}^{b}{q_{t}^{- 1}} = [\begin{bmatrix} q_{1}^{'} & q_{2}^{'} & q_{3}^{'} & q_{4}^{'} \end{bmatrix}] = \frac{{}_{b}^{e}{q_{t}^{*}}}{{| {}_{b}^{e}{q_{t}} |}^{2}}, \end{matrix}

(9)

\begin{matrix} G = [\begin{bmatrix} 2 (q_{2}^{'} q_{4}^{'} - q_{1}^{'} q_{3}^{'}) \\ 2 (q_{1}^{'} q_{2}^{'} + q_{3}^{'} q_{4}^{'}) \\ {(q_{1}^{'})}^{2} - {(q_{2}^{'})}^{2} - {(q_{3}^{'})}^{2} + {(q_{4}^{'})}^{2} \end{bmatrix}], \end{matrix}

(10)

\begin{matrix} [\begin{bmatrix} a_{t x} & a_{t y} & a_{t z} \end{bmatrix}] = {}^{b}A - G^{T}, {}^{b}A = [\begin{bmatrix} a_{x} & a_{y} & a_{z} \end{bmatrix}], \end{matrix}

(11)

\begin{matrix} [\begin{bmatrix} 0 & a_{t x}^{'} & a_{t y}^{'} & a_{t z}^{'} \end{bmatrix}] = {}_{b}^{e}{q^{*}} \otimes [\begin{bmatrix} 0 & a_{t x} & a_{t y} & a_{t z} \end{bmatrix}] \otimes {}_{b}^{e}q, \end{matrix}

(12)

\begin{matrix} {}^{e}A = [{}^{e}A] 9.81, {}^{e}A = [\begin{bmatrix} a_{t x}^{'} & a_{t y}^{'} & a_{t z}^{'} \end{bmatrix}] . \end{matrix}

(13)

5.2. Combined Tracking

A block diagram of combined tracking is shown in Figure 10 which serves as second stage sensor fusion algorithm. A 9-state Kalman filter incorporated is the heart of this algorithm. The KF is an efficient recursive filter algorithm that provides optimal estimates of system states from noisy observation data given the underlying model of the system and assumes all the errors and measurements have zero mean white Gaussian noise. It is also well known that to compensate for errors of inertial navigation systems, inertial sensors have to be assisted by other sensors, and use of a KF is common for fusing data from different sensors. Also KF is used in different applications, for example, moving target tracking in a video [31]. We propose fusion of high sampling rate MARG sensors and low frame rate vision sensor in second stage sensor fusion for 3D position tracking, which benefits from their complementary characteristics. The state variables of the KF are position, velocity, and translational acceleration.

The inertial navigation computing module estimates position and velocity from translational acceleration vector ${}^{e}A$ and corrections obtained from the filter at a previous time step. The KF takes these estimates and velocity vector ${}^{c}V$ to find the optimal estimates for position, velocity, and acceleration. The filter architecture has the following system dynamics and measurement model:

\begin{matrix} X_{k} = F_{k - 1} X_{k - 1} + w_{k - 1}, w_{k} ~ N (0, Q_{k}), \\ Z_{k} = H_{k} X_{k} + v_{k}, v_{k} ~ N (0, R_{k}) . \end{matrix}

(14)

Time Update. The state estimate and error covariance are propagated based on the optimal estimation at previous time step $k - 1$ .

Consider the following:

\begin{matrix} {\hat{x}}_{k} = F x_{k - 1}, \end{matrix}

(15)

\begin{matrix} {\hat{P}}_{k} = F P_{k - 1} F^{T} + Q . \end{matrix}

(16)

For (15) we use estimates of inertial navigation computing task and optimal states at the previous time step $k - 1$ . The symbol (^∧) represents predictions, $x_{k}$ is state vector, and F and Q are state transition and process noise covariance:

\begin{matrix} x_{k} = {[\begin{bmatrix} p_{x} & v_{x} & a_{x} & p_{y} & v_{y} & a_{y} & p_{z} & v_{z} & a_{z} \end{bmatrix}]}^{T}, \\ F = [\begin{bmatrix} F_{i} & 0_{3} & 0_{3} \\ 0_{3} & F_{i} & 0_{3} \\ 0_{3} & 0_{3} & F_{i} \end{bmatrix}], F_{i} = [\begin{bmatrix} 1 & Δ t & \frac{1}{2} Δ t^{2} \\ 0 & 1 & Δ t \\ 0 & 0 & 1 \end{bmatrix}], i = x, y, z, \\ Q = [\begin{bmatrix} Q_{i} & 0_{3} & 0_{3} \\ 0_{3} & Q_{i} & 0_{3} \\ 0_{3} & 0_{3} & Q_{i} \end{bmatrix}], Q_{i} = [\begin{bmatrix} \frac{Δ t^{5}}{20} & \frac{Δ t^{4}}{8} & \frac{Δ t^{3}}{6} \\ \frac{Δ t^{4}}{8} & \frac{Δ t^{3}}{6} & \frac{Δ t^{2}}{2} \\ \frac{Δ t^{3}}{6} & \frac{Δ t^{2}}{2} & Δ t \end{bmatrix}] q_{c} . \end{matrix}

(17)

Matrices of size $9 \times 9$ , $0_{3}$ represent a $3 \times 3$ zero matrix, $Δ t$ is the sampling period of the MARG sensors, and $q_{c}$ is the process noise covariance in continuous time.

Measurement Update. If measurements are available, this step incorporates those measurements in vector $z_{k}$ , by adjusting the state variables, generating an optimal estimate and uncertainty $P_{k}$ , using (18)–(20). $K_{k}$ is the Kalman gain ( $9 \times 9$ matrix), often called acceptability vector of $z_{k}$

\begin{matrix} K_{k} = {\hat{P}}_{k} H_{k} {(H_{k} {\hat{P}}_{k} H_{k}^{T} + R_{k})}^{- 1}, \end{matrix}

(18)

\begin{matrix} x_{k} = {\hat{x}}_{k} + K_{k} (z_{k} - {\hat{x}}_{k}), \end{matrix}

(19)

\begin{matrix} P_{k} = (I - K_{k} H_{k}) {\hat{P}}_{k} . \end{matrix}

(20)

Measures of $z_{k}$ , $H_{k}$ , and R are $9 \times 9$ measurement sensitivity and noise covariance matrices, respectively. Covariance calculated using (20) will be used for the next iteration. Matrices $z_{k}$ , $H_{k}$ , and $R_{k}$ are given below. Consider the following:

\begin{matrix} z_{k} = {[\begin{bmatrix} 0 & 0 & 0 & {}^{c}V & 0 & 0 & 0 \end{bmatrix}]}^{T}, \\ R_{k} = [\begin{bmatrix} R_{x} & 0_{3} & 0_{3} \\ 0_{3} & R_{y} & 0_{3} \\ 0_{3} & 0_{3} & R_{z} \end{bmatrix}], R_{i} = I_{3 \times 3} σ_{i}^{2}, σ_{i}^{2} = variance, \\ H_{k} = [\begin{bmatrix} H_{i} & 0_{3} & 0_{3} \\ 0_{3} & H_{i} & 0_{3} \\ 0_{3} & 0_{3} & H_{i} \end{bmatrix}], H_{i} = [\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}] . \end{matrix}

(21)

Since MARG sensors are sampled at 120 Hz and the camera is sampled at 75 FPS (<MARG sensors data rate), during each time step, we check the inertial navigation computing module for estimates and marker velocity vector ${}^{c}V$ from the camera if they are available. If both are available, both time and measurement update steps are updated to get the current optimal estimate. If both are not available, a previous optimal estimate becomes our current optimal estimate. If estimates are available and ${}^{c}V$ is not ready, only predictions are updated and these become the current optimal estimates. If estimates are not available but ${}^{c}V$ is ready, predictions and measurement updates are both calculated, but the prediction equations use previous optimal estimates since new estimates are not available from the inertial navigation module.

6. Results

The proposed tracking system was tested against a leading commercial DC magnetic tracking system at 120 Hz. To do so, we set the update rate of the proposed tracking system at 120 Hz and the camera FPS at 75 and fixed them to a rigid platform to be moved by hand. The trajectories of translational motion obtained from both the systems along x-, y-, and z-axes are plotted in Figures 11(a), 11(b), and 11(c), respectively. Comparison shows the potential of our system in both static and dynamic arbitrary movements. A close inspection of Figure 11(c) also reveals the linearity and accuracy of measurements obtained from (5), (6), and (7). This shows that our simple and efficient idea presented in Section 4 can find the marker velocity along the camera z-axis using area pixels of IR marker in thresholded image, without the need for another camera or additional markers to track the 3D position of the object.

Figure 11

Comparison of trajectories of translational motion obtained from commercial tracker with our wand.

In order to compare the quaternion orientation data of the two systems, orientation with respect to their fixed, steady state quaternions were measured and then decoupled to Euler parameters describing the pitch φ, roll θ, and heading ψ corresponding to rotations around the body frame x-, y-, and z-axes, respectively. Figure 12 shows plots of 3D orientation obtained from the wands complementary filter for which magnetic distortion compensation incorporated. To show the performance of our wand, comparison of the 3D trajectories for helical motion in the earth frame of reference is presented in Figure 13.

Figure 12

Comparison of Euler angles computed from quaternion orientations obtained from commercial tracker with our wand.

Figure 13

Comparison of 3D trajectories obtained from wand with commercial tracker.

7. Conclusion

Motion tracking using MARG sensors with additional sensors is a mature field of research. Modern techniques [14–16] have focused on simpler fusion approaches on low power hardware to reach a wide range of users. We presented a simple and accurate approach for a wand system with two stage sensor fusion: the first stage of fusion offloads the 3D orientation computation from the computer, allowing focus on only 3D position computation in the computer as the camera is connected to it. The basic idea is to utilize low cost and wide field of view USB camera with IR filter to prevent numerical drift in the position computed from the acceleration of a MEMS accelerometer. Thus the overall system benefits from the complementary properties of inertial and vision sensing. Key advantages of the proposed system are (1) the working area of device, which allows user to interact with a computer or 3D TV at a comfortable distance by changing size of IR marker; (2) the small size and higher update rate; (3) the magnetic distortion compensation that helps to use the wand in challenging environments; and (4) another potential application of this device that includes air digital writing and signature verification.

Footnotes

Appendix

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported by the Chung-Ang University Research Scholarship Grants in 2011.

References

Keefe

D. F.

Zeleznik

R. C.

Laidlaw

D. H.

Drawing on air: input techniques for controlled 3D line illustration

IEEE Transactions on Visualization and Computer Graphics 2007 13 5 1067 1081

2-s2.0-34548538801

10.1109/TVCG.2007.1060

Schkolne

Pruett

Schröder

Surface drawing: creating organic 3D shapes with the hand and tangible tools

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ′01)

April 2001

261 268

2-s2.0-0035040275

Nam

Chai

SPACESKETCH: shape modeling with 3D meshes and control curves in stereoscopic environments

Computers and Graphics 2012 36 5 526 533

2-s2.0-84860740284

10.1016/j.cag.2012.03.012

Kim

Lee

Chai

Real-time spatial surface modeling system usingwand traversal patterns of grid edges

IEICE Transactions on Information and Systems 2011 E94-D 8 1620 1627

2-s2.0-79961078080

10.1587/transinf.E94.D.1620

Kara

L. B.

Shimada

Sketch-based 3D-shape creation for industrial styling design

IEEE Computer Graphics and Applications 2007 27 1 60 71

2-s2.0-33846875453

10.1109/MCG.2007.18

Kim

D. H.

Kim

M.-J.

A new modeling interface for the pen-input displays

CAD Computer Aided Design 2006 38 3 210 223

2-s2.0-31344457517

10.1016/j.cad.2005.10.007

Welch

Foxlin

Motion tracking: no silver bullet, but a respectable arsenal

IEEE Computer Graphics and Applications 2002 22 6 24 38

2-s2.0-0036857027

10.1109/MCG.2002.1046626

Zhang

Flexible camera calibration by viewing a plane from unknown orientations

Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV ′99)

September 1999

666 673

2-s2.0-0033284445

McMahan

R. P.

Gorton

Gresock

McConnell

Bowman

D. A.

Separating the effects of level of immersion and 3D interaction techniques

Proceedings of the 13th ACM Symposium Virtual Reality Software and Technology (VRST ′06)

November 2006

108 111

2-s2.0-34547404750

10.1145/1180495.1180518

10.

Komlódi

Józsa

Hercegfi

Kucsora

Borics

Empirical usability evaluation of the Wii controller as an input device for the VirCA immersive virtual space

Proceedings of the 2nd International Conference on Cognitive Infocommunications (CogInfoCom ′11)

July 2011

1 6

2-s2.0-80052880974

11.

Chun

Lee

A vision-based 3D hand interaction for marker-based AR

International Journal of Multimedia and Ubiquitous Engineering 2012 7 3 51 58

2-s2.0-84863655586

12.

Rougier

Meunier

3D head trajectory using a single camera

International Journal of Future Generation Communication & Networking 2010 3 4 43 54

13.

Wingrave

C. A.

Williamson

Varcholik

P. D.

Rose

Miller

Charbonneau

Bott

Laviola

J. J.

Jr.

The wiimote and beyond: spatially convenient devices for 3D user interfaces

IEEE Computer Graphics and Applications 2010 30 2 71 85

2-s2.0-77649295545

10.1109/MCG.2009.109

14.

Kim

Park

Yim

Choi

Jeon

Han

Gesture-recognizing hand-held interface with vibrotactile feedback for 3D interaction

IEEE Transactions on Consumer Electronics 2009 55 3 1169 1177

2-s2.0-70350300969

10.1109/TCE.2009.5277972

15.

Wang

J.-S.

Hsu

Y.-L.

Liu

J.-N.

An inertial-measurement-unit-based pen with a trajectory reconstruction algorithm and its applications

IEEE Transactions on Industrial Electronics 2010 57 10 3508 3521

2-s2.0-77956609153

10.1109/TIE.2009.2038339

16.

Zhu

Zhou

A small low-cost hybrid orientation system and its error analysis

IEEE Sensors Journal 2009 9 3 223 230

2-s2.0-60449096766

10.1109/JSEN.2008.2012196

17.

Bang

W.-C.

Chang

Kang

K.-H.

Choi

E.-S.

Potanin

Kim

D.-Y.

Self-contained spatial input device for wearable computers

Proceedings of the 7th IEEE International Symposium on Wearable Computers (ISWC ′05)

October 2005

26 34

2-s2.0-34047110392

18.

A. D.

Johnson

E. N.

Proctor

A. A.

Vision-aided inertial navigation for flight control

AIAA Journal of Aerospace Computing, Information and Communication 2005 2 9 348 360

2-s2.0-27644530932

19.

Parnian

Golnaraghi

Integration of a multi-camera vision system and strapdown inertial navigation system (SDINS) with a modified Kalman filter

Sensors Journal 2010 10 6 5378 5394

2-s2.0-77954780517

10.3390/s100605378

20.

Pons-Moll

Baak

Helten

Müller

Seidel

H.-P.

Rosenhahn

Multisensor-fusion for 3D full-body human motion capture

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ′10)

June 2010

663 670

2-s2.0-77955986209

10.1109/CVPR.2010.5540153

21.

Milanović

Siu

Kasturi

Radojičić

MEMSEye for optical 3D position and orientation measurement

7930

MOEMS and Miniaturized Systems X

2011

Proceedings of SPIE

22.

Madgwick

S. O. H.

Harrison

A. J. L.

Vaidyanathan

Estimation of IMU and MARG orientation using a gradient descent algorithm

Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR ′11)

July 2011

1 7

2-s2.0-80055059186

10.1109/ICORR.2011.5975346

23.

Mahony

Hamel

Pflimlin

J.-M.

Nonlinear complementary filters on the special orthogonal group

IEEE Transactions on Automatic Control 2008 53 5 1203 1218

2-s2.0-51749112903

10.1109/TAC.2008.923738

24.

Euston

Coote

Mahony

Kim

Hamel

A complementary filter for attitude estimation of a fixed-wing UAV

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS ′08)

September 2008

340 345

2-s2.0-69549107755

10.1109/IROS.2008.4650766

25.

Sabatini

A. M.

Quaternion-based extended Kalman filter for determining orientation by inertial and magnetic sensing

IEEE Transactions on Biomedical Engineering 2006 53 7 1346 1356

2-s2.0-33746855962

10.1109/TBME.2006.875664

26.

Marins

J. L.

Yun

Bachmann

E. R.

McGhee

R. B.

Zyda

M. J.

An extended Kalman filter for quaternion-based orientation estimation using MARG sensors

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems

November 2001

2003 2011

2-s2.0-0035558199

27.

Bachmann

E. R.

Yun

Peterson

C. W.

An investigation of the effects of magnetic variations on inertial/magnetic orientation sensors

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ′04)

May 2004

1115 1122

2-s2.0-3042529736

28.

Rahman

Salam

Islam

Sarker

An image based approach to compute object distance

International Journal of Computational Intelligence Systems 2008 1 4 304 312

2-s2.0-77952827984

10.2991/ijcis.2008.1.4.3

29.

Hasan

S. F.

Sadat

R. M. N.

Rahman

M. A.

Kabir

M. H.

A precise and low complexity distance and size measurement of circular objects from camera position using still images

Proceedings of the 4th International Conference on Electrical and Computer Engineering (ICECE ′06)

December 2006

439 442

2-s2.0-46249118399

10.1109/ICECE.2006.355664

30.

Tsang

C. C.

Chow

G. C. T.

Leong

P. H. W.

Zhang

Luo

Dong

Shi

Kwok

S. Y.

Wong

H. Y. Y.

W. J.

Wong

M. Y.

A novel real-time error compensation methodology for μIMU-based digital writing instrument

Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO ′06)

December 2006

Kunming, China

678 681

2-s2.0-46249111069

10.1109/ROBIO.2006.340288

31.

Suliman

Cruceru

Moldoveanu

Kalman filter based tracking in an video surveillance system

Advances in Electrical and Computer Engineering 2010 10 2 30 34

2-s2.0-77954639594

10.4316/aece.2010.02005