Abstract
Intensity-based 2D/3D registration is a key technique using digitally reconstructed radiographs (DRRs) to register the preoperative volume to the patient setup during the operation. Although DRR-based method provides a high accuracy, the small capture range hinders its clinical use. In this paper, such problem was addressed by a robust and fast initialization method using a two-level scheme including automatic tracking-based initialization (Level I) and multiresolution estimation based on central-slice theorem and phase correlation (Level II). It provided almost optimal transformation parameters for intensity-based registration. Experiments using a public gold standard data set and a spinal phantom have been conducted. The mean target registration error (mTRE) was limited in the range from 2.12 mm to 22.57 mm after tracking-based initialization. The capture range based on level II only was 20.1 mm and the mTRE in this capture range was 2.92 ± 2.21 mm. The intensity-based 2D/3D registration using proposed two-level initialization achieved the successful rate of 84.8% with the average error of 2.36 mm. The experimental results showed that the proposed method yielded the robust and fast initialization for intensity-based registration methods. In a similar way, it can be applied to other registration methods to enable a larger capture range and robust implementation.
1. Introduction
Image guided surgery (IGS) is a medical procedure, where 3D preoperative volume (e.g., CT/MRI) and 2D intraoperative X-ray images usually generated by a mobile C-arm are involved. Preoperative volume data (moving image) is used for diagnosis and surgical planning, while intraoperative images (fixed images) provide surgeon with updated information relative to the current pose of patient. By aligning these data, the planned trajectory of tools and ultimate target will be visualized with the latest information about region of anatomy, so that surgeons can confirm the surgical outcome in time [1]. Additionally, preoperative data provides the 3D information that 2D X-ray images lack. Therefore, 2D/3D registration is performed as a key technique in IGS to estimate the transformation between the coordinates of the preoperatively acquired data and the patient setup during the operation. Various algorithms were presented for 2D/3D registration [2–12], where intensity-based 2D/3D method is extensively investigated using information contained in voxels and pixels of 3D and 2D images, respectively. However, these methods suffer the misalignment caused by the initial transformation before the optimization of 2D/3D registration. Therefore, a good initialization is necessarily required to guarantee the successful rate and decrease the number of iteration (running time) of registration.
Currently, the field of intensity-based method is dominated by digitally reconstructed radiograph (DRR) based method, where multiple fixed images are captured and corresponding simulated X-ray projection images (DRRs) [13] are generated in different directions. Although DRR-based method was implemented with the high accuracy, a major issue is the small capture range which is defined as the 95% successful rate in the order of 2 mm mean target registration error (mTRE) [14]. Initial guess lying outside the capture range leads to the failure of registration. It is essentially a problem of initialization for 2D/3D registration. In this case, it is obviously impractical to make sure that initial parameters always lie in the capture range. Alternatively, another solution is to enlarge the capture range to increase the probability of successful initialization.
To provide a large capture range, Van der Bom et al. employed a coarse registration method in Fourier domain [4, 15] as a robust initialization method for intensity-based registration using the central-slice theorem [16] and phase correlation [17] to estimate the rotational pose of volume and translational offsets, respectively. In the remainder of this paper, this technique is referred to as the frequency estimation. It does not require the DRR generation, and the output provides a set of parameters around the final optimum; that is, less iterations are required to maximize the similarity measurement in intensity-based registration. However, the initial parameters of rotation estimation have to be set manually. Also according to the reported result in [18], the proposed method took 7.5 min per initialization and the successful rate of proposed method followed by intensity-based registration was only 68.6%, although it has been improved from 28.6%.
Inspired by Bom's work, the objective of this paper is to develop a more robust and faster initialization method in frequency domain in an automatic manner for the practical use of 2D/3D intensity-based registration. To achieve this, it is a great challenge to estimate the relative pose of intraoperative 2D images due to the instability of mobile C-arm commonly used in the scenario of IGS. Additionally, the initialization of frequency estimation becomes more important to improve the robustness and faster estimation is also required to reduce the processing time of whole registration procedure.
In this paper, a two-level initialization method is proposed including tracking-based initialization (Level I) and finer estimation based on multiresolution strategy in frequency domain (Level II). Level I yields the relative pose between 2D X-ray images and provides good initial parameters in an automatic way for Level II. There is no requirement of manual initialization for frequency estimation. In the second level, multiresolution strategy is used in frequency estimation based on wavelet decomposition. It allows the fast implementation and improves the robustness of estimation by starting registration from the coarse level. The main contributions of this paper include.
Automatic estimation of 2D/3D intensity-based registration is proposed based on tracking. Surgeons just take the intraoperative images at the distinct time; the following tasks can be done automatically to link the pre- and intraoperative data.
More robust and faster frequency estimation for intensity-based registration is achieved based on the wavelet pyramid in the Fourier domain.
The reminder of this paper is organized as follows. Section 3 introduces the required materials in experiments. In Section 4, a two-level initialization method is presented, including tracking-based initialization and frequency estimation. Experimental results are shown and discussed in Section 5. A public golden standard data set and spinal phantom are used for the verification of proposed method. Related work including C-arm pose tracking and pyramid representations is summarized in Section 2. Section 6 concludes this paper.
2. Related Work
A robust and fast initialization for intensity-based 2D/3D registration was performed in two steps, including tracking-based initialization and multiresolution rotation estimation. In this section, C-arm pose tracking and pyramid methods are discussed as two key techniques which are related to the contributions of this paper.
2.1. C-Arm Pose Tracking
In the image guided surgery, the mobile C-arm is commonly used as its compact structure and flexible functions. For 2D/3D registration, to estimate the relative pose between intraoperative X-ray images and position the C-arm at a desired pose, mobile C-arm has to be tracked, especially the X-ray tube.
The approaches to estimate the relative pose between X-ray images were mainly classified into two categories. One was using an X-ray opaque fiducial marker which was imaged together with the anatomy of interest. Several special structures were designed to estimate the pose of C-arm from the X-ray images [19–23]. In [19], a fiducial composed of a set of coplanar ellipses was used to track the C-arm since a 3D ellipse projects to an ellipse in the image. Chintalapani et al. [20] used the similar method and showed that a single ellipse and a point correspondence can achieve the estimation of C-arm pose. In [21, 22], a single-image-based fluoroscope tracking (FTRAC) method was presented using an external fiducial consisting of a set of ellipses, lines, and points, which were mathematically optimized. A hybrid fiducial was designed by Otake et al. [23]. It contains two parts, including one fiducial, that is, visible to the optical tracking, and the other one is FTRAC, which can be seen in the X-ray image. In [24], Reaungamornrat et al. presented an on-board surgical tracking system within which a video-based tracker was mounted on the gantry of a mobile C-arm next to the flat-panel detector. It allowed the tracking of object from various C-arm angulations. In the same way, the pose of mobile C-arm can be monitored in turn.
The other category for the C-arm tracking was using an external tracker. Electromagnetic and optical trackers are frequently used in the most of IGS applications [1, 25, 26]. However, sensitivity of electromagnetic trackers to the metal objects in the operating room can distort the electromagnetic field and reduce the accuracy. In this paper, optical tracker (Vicon motion tracking system) was used. The calibrated tracking volume is large enough to cover the whole working space, even though the mobile C-arm is moving.
2.2. Pyramid Representations
In order to avoid the false local optima, improve the robustness of convergence, and reduce the running time, a hierarchical multiresolution strategy (image pyramid) is usually used. Image pyramid is typically represented based on the two steps consisting of filtering and downsampling, for example, Gaussian pyramid. Another way to construct the pyramid is using wavelet transform, which is implemented in one step leading to less computing cost. Several methods of 2D/3D registration were reported using multiresolution strategy for optimization. In [27], Gaussian pyramid was used for an extended global optimization strategy by combining with a multiscale method. Xu and Chen[28] proposed a wavelet-based multiresolution strategy. It allowed a hybrid metric by combining the mutual information and spatial information from the low-frequency coefficient pair and three high-frequency coefficient pairs, respectively. Recently a multiresolution registration method was presented using firefly algorithm and Powell's method [10]. The multiresolution strategy was used based on wavelet transformation. The firefly algorithm was performed in the low level, while, in the higher level, Powell's method was adopted to obtain the finer results. Besides, Rohlfing et al. [7] proposed a progressive attenuation field (PAF) to generate DRR images, and multiresolution DRR image was obtained by sampling PAF parameters. However, all these methods were implemented in the intensity domain based on different pyramid representations.
In [9], a part of work was achieved in frequency domain to find a frequency band in which the information of bone anatomy was better expressed than nonbony anatomy. But the following processing was still performed in the intensity domain based on Gaussian pyramids for the optimization. In this paper, all processes were performed in Fourier domain. That is, the wavelet-based pyramids were frequency representations. Considering the convolution theorem in Fourier domain, the wavelet operator was more efficient than the implementation in the intensity domain.
3. Hardware Platform
In this paper, a mobile C-arm, Vicon motion tracking system, and spinal phantom are used. The mobile C-arm captured the intraoperative X-ray images at different projection angles. The relative pose of these images was tracked by Vicon motion tracking system, which also was used to estimate the relative pose of C-arm with respect to a spinal phantom. A spinal phantom was chosen in experiments to verify the performance of proposed method.
3.1. Mobile C-Arm
ARCADIS Varic (Siemens, Germany) was used as a nonisocentric mobile C-arm. It can produce X-ray image with the resolution of 1024 × 1024. The image spacing is 1 × 1 mm and the stored pixel type is unsigned short. Several infrared (IR) reflective markers are attached on the C-arm, so that the source of mobile C-arm can be tracked by motion tracking system, as shown in Figure 1. In this paper, only four markers on the source of mobile C-arm were used.

Infrared reflective markers on the mobile C-arm.
3.2. Motion Tracking System
Vicon motion tracking system (Vicon Inc., UK) consists of Vicon T-Series cameras that are the world's next generation motion capture devices. Each camera provides a 16 megapixels resolution (4704 × 3456) at the frame rate of 120 fps. The tracking system works with IR reflective markers that are different size of small plastic balls covered with reflective tape from 3M Inc., USA. When the infrared is projected to the surface of markers, the light with the same wavelength returns to the camera, so that each camera can get the 2D position of markers, then the 3D position can be obtained with submillimeter accuracy by the controller of Vicon. Proper size of makers is chosen in practice, according to the requirement of applications. The diameter of makers used in our experiments is 14 mm.
We have six cameras used for motion tracking in the experiment. To capture the markers’ position correctly, at least three cameras should see the same marker. Since we used six cameras, it is still possible to capture the required information of markers, even if there is occlusion. The sampling rate of Vicon can reach 200 fps. It is enough for the real applications.
3.3. Spinal Phantom
The spinal phantom (3B Scientific Inc., Germany) offers an illustration of lumbar vertebrae from L1 to L5 with intervertebral discs. The finest bone structures are accurately depicted. The model can be disassembled into vertebrae and intervertebral discs. The model is fixed in the pearl cotton, as shown in Figure 2. 11 IR reflective markers are attached to facilitate the process of calibration between the Vicon motion tracking system and preoperative CT data. The CT data is collected by a CT scanner (Aquilion ONE, Toshiba, Japan) with a 320-row detector. The resolution of each slice is 512 × 512 pixels and there are 227 slices in total. The CT volume covers the region of 0.559 × 0.559 mm with a 1 mm slice thickness.

Spinal phantom with IR markers.
4. Methods
The typical intensity-based 2D/3D registration is implemented with a small capture range and a low successful rate. In this paper, given two X-ray images of an anatomical structure and the corresponding CT data, the objective was to provide a robust and fast initialization for the practical use of intensity-based registration in an automatic way.
A two-level initialization scheme was proposed in this paper, including tracking-based initialization (Level I) and multiresolution frequency estimation (Level II). Level I yielded the relative pose of two X-ray images and provided coarse transformation parameters for Level II. In the second level, a finer estimation was obtained in the Fourier domain based on central-slice theorem and phase correlation. Especially, wavelet pyramids were used for the rotation estimation to enable the fast implementation and robust estimation.
4.1. Level I: Tracking-Based Initialization
In this section, the relative pose of X-ray images and a coarse initialization were estimated based on Vicon motion tracking system. The configuration of tracking-based initialization was presented in Figure 3, referred to as V-Tracker. Vicon motion tracking system was mounted on the ceiling of room. To enable that the pose of mobile C-arm can be tracked by the motion tracking system, several infrared reflective markers were attached on the X-ray source of C-arm. A spinal phantom was used in the experiments and the markers were also attached to calculate the coarse initial transformation. Based on the markers attached on the C-arm, it was easy to estimate the relative pose of X-ray images by tracking the focal points where X-ray images were captured.

Configuration of tracking-based initialization.
To estimate the coarse transformation of volume T in the CT coordinate system, the problem is formulated as
where
As shown in Figure 4, CCT, CVicon, and Csp are the coordinate systems of CT volume, Vicon motion tracking system, respectively. An off-the-shelf calibration method was employed to set CVicon using a 240 mm wand (5-Marker wand). The x-y plane is determined by these markers and the normal vector of this place is defined as z-axis. A local coordinate system Csp on spinal phantom is defined as the same as the CT frame. The origin is located at marker 8. The vector from marker 8 to marker 4 defines z-axis; the y-axis is defined by the normal vector of the place in the middle where markers 1, 2, 5, 6, 7, 10, 11 were placed. The cross product of y-axis and z-axis results in x-axis.

Illustration of the tracking-based initialization.
In our experiment, the transformation T is equal to identity matrix, when three conditions were satisfied.
The z-axis of Csp is parallel with the y-axis of CVicon.
x-axes of Csp and CVicon are in the opposite direction.
When the mobile C-arm is in the vertical position (detector-over-table [24]), the line connecting the X-ray source and center of image intensifier is perpendicular to the middle plane of spinal phantom and crosses through the center of plane.
Then
4.2. Level II: Multiresolution Rotation Estimation and Translation Estimation in Fourier Domain
Following the tracking-based initialization, a frequency method based on the multiresolution strategy is proposed to obtain more accurate and robust estimation in a faster way. As shown in Figure 5, Fourier transform is applied to the preoperative volume and X-ray images, followed by the wavelet-based pyramids, which allows a rough estimation using low-resolution image in the following optimization, and subsequently the parameters are refined when the resolution is increased gradually. The estimation at high resolution is initialized by a reasonable estimate from the coarse resolution. Rotation and translation estimation can be separated in Fourier domain. It will be explained in detail later. Firstly, the rotation estimation (procedures in the red dashed rectangle) is implemented iteratively based on the initial parameters from V-Tracker by starting from the coarse level of wavelet-based pyramid. In each iteration, the dimension of search space is reduced to 3, while it is 6 in the traditional intensity-based 2D/3D registration. Once the optimal rotation is obtained, translation vector is estimated in a noniterative way using phase correlation method.

Multiresolution estimation in Fourier domain.
To explain the separation of rotation and translation estimation, let Φ2 be a representation of Φ1 in 3D after a rigid transformation (R ∣ T) is applied. The relationship between Φ1 and Φ2 can be written as
Applying Fourier shift theorem to (3),
where Φ∘ is the result after the Fourier transform. (u,v,w) is a point in Fourier domain, and (t x ,t y ,t z ) are the elements in T. The modulus is expressed in term of rotation and the phase part is related to translation information. Therefore, we can estimate the rotation and translation separately. In this paper, the versor transformation in Insight Segmentation and Registration Toolkit (ITK) [30] is used to model the rotation. To simplify the translation estimation, X-ray images were captured in the anteroposterior (AP) view and lateral (LAT) view using the mobile C-arm in the experiments, and the desired views were achieved based on the Vicon motion tracking system.
4.2.1. Wavelet Pyramid in Fourier Domain
Multiresolution strategy in Fourier domain was the key concept in Level II and used to optimize the rotation parameters. In this paper, pyramids for 2D and 3D data were constructed based on wavelet transform (WT). Compared with Gaussian and Laplacian pyramid, wavelet pyramid is achieved in one step for less computing cost by merging filtering and downsampling steps together [28]. It is more efficient in Fourier domain especially [31]. According to the convolution theorem, wavelet transform is the multiplication of signals and the mother wavelet with a set of scales and translational values in frequency domain. Unlike the convolution operation, it allows the simultaneous computation.
In this paper, quadrature mirror filter (QMF) [32, 33] as a discrete orthogonal form of wavelet was used. Three levels of resolution were used, while the level for WT was set to 2. Figure 6 shows the pyramid of fixed image in Fourier domain. The right column is the origin image with the resolution of 410 × 410.

Wavelet pyramid of the image in golden standard data set [34].
4.2.2. Rotation Estimation
Central-slice theorem was applied to generate the projection images which were used to optimize the rotation parameters. It defines a rotational relationship between the Fourier spectrum of an N-dimensional function and Fourier spectra of its (N − 1) dimensional projections. In this paper, we will get 2D projections, when the preoperative data is in 3D. As shown in Figure 7, the parallel projection pθ(u′,v′) of an object Φ(x,y,z) on a plane Σ through the origin with angle θ relative to the x-axis evaluated at a distance r from the origin and angle φ relative to z-axis can be written as
where δ is the Dirac delta function. According to central-slice theorem,
where

Illustration of central-slice theorem in 3D.
In this paper, two X-ray images were used as fixed images for 2D/3D registration. To resample the slice in a specific direction through the origin of volume in the Fourier domain during the optimization, rotation matrix has to be constructed. Inspired by the camera transformation in OpenGL, rotation matrix is calculated as
where f is the direction of projection, u is the up vector, and s is the cross product of f and u. As shown in Figure 8, the rotation matrices in AP and LAT views can be constructed. In the AP view, the projection plane is parallel with x-z plane in the volume coordinate, while the image in LAT view is generated by projection along axis x. The relation between both matrices is

Geometry of AP and LAT views.
For each iteration of rotation optimization, two 2D images in Fourier domain were extracted from a specific level of wavelet pyramid to optimize the rotation in an iteration. Consider:
where
The similarity measure between projection slices and intraoperative X-ray images I can be expressed by combining both measures in AP and LAT views
Please note that Fourier transform is applied to I, since SliceAp and SliceLAT are presented in Fourier domain. In this paper, normalized gradient correlation was employed to calculate the similarity measure, and optimization was implemented by Powell's method.
4.2.3. Translation Estimation
Once the rotation is determined, we can estimate the translation according to (4) using the phase correlation method (PCM), which is a registration technique based on the Fourier shift theorem. Let SliceAP and SliceLAT be the resultant projection slices of rotation estimation in x-z and y-z plane, respectively. Substitute the results into (4),
Let us take the AP view, for example, the phase difference between SliceAP(u,w) and ÎAP(u,w) can be calculated using the cross-power spectrum C(u,v), defined as
where ÎAP*(u,w) is the complex conjugate of ÎAP(u,w). The inverse Fourier transform c(x,z) of C(u,v) is a Dirac delta function that indicates the relative shift (t x ,t z ). Consider:
A similar process can be carried out to estimate the difference (t y ,t z ) in the lateral view.
5. Experiments and Discussions
In this section, three sets of experiments were performed to validate the proposed two-level automatic initialization method. Multiresolution rotation estimation in Level II was firstly tested using DRR images as the fixed images to determine the improvement in speed and robustness using the wavelet pyramid. The second experiment was to evaluate the complete initialization combining both levels of initialization methods. In the last experiment, the proposed initialization method followed by DRR-based 2D/3D registration was tested to verify whether it can improve the successful rate of registration compared with the results in [18].
The experiments used the golden standard data set [34] and the CT data of spinal phantom acquired by the CT scanner (Aquilion ONE, Toshiba, Japan). Table 1 summarized the preoperative CT, fixed images, and ground truth in experiments. The specifications including the resolution and spacing of the data sets were listed in Table 2.
Summary of the data sets used in the experiments.
Specification of the data sets used in experiments.
5.1. Performance Measurements
To estimate the performance of the proposed method, we used mean target registration error (mTRE), capture range, running time, and successful rate as the performance measurements. The mTRE was defined as the Euclidean distance between fiducial markers in the preoperative volume transformed with the golden standard transformation (ground truth) and estimated transformation
where N is the total number of fiducial markers p n , T is the result from the initial guess or registration algorithm, and T g is the gold standard transformation. In the preoperative CT summarized in Table 1, there are seven and eleven markers, respectively, which can be used to calculate the mTRE. Particularly, with the target points in the golden standard data set, a rotation of 1.0° around one of the three axes will lead to an mTRE of 1.2–1.6 mm. Following Bom's work, the registration error before the multiresolution frequency estimation and the error after the estimation process are referred to as mTREstart and mTREestimate, respectively. mTREfinal is the final result after intensity-based registration.
The estimation with an mTRE smaller than 7.5 mm was considered successful, and the capture range was defined where the successful rate can reach 95% [14]. The running time was mainly used to quantify the efficiency of initialization method compared with Bom's method, which required 7.5 min per initialization.
In all experiments, Powell optimizer was used for the initial estimation with the same parameters for comparison of different methods. The tolerance of similarity measure was set to 0.001 and the maximum of iterations was 100. For multiresolution strategy, the tolerance of search step and step length were set to 0.08 and 8.0, respectively. When the resolution increased, both step parameters were decreased by 2. All experiments were implemented within ITK.
5.2. Test of Multiresolution Rotation Estimation
In this experiment, the method of multiresolution rotation estimation based on central-slice theorem was evaluated using the spinal phantom and golden standard data set, respectively. The projection images obtained with central-slice theorem are shown in Table 3. The spinal phantom was mainly used for rotation estimation using the projection images obtained with central-slice theorem and comparison of results with and without multiresolution strategy. The performance of rotation and translation estimation was evaluated using the golden standard data set. For both data sets, DRRs were generated with known various rotation and translation parameters as the ground truth to register to the corresponding CT data. With these experiments, mTRE, capture range, and running time of multiresolution rotation estimation were quantified.
Projection images obtained with central-slice theorem.
In the experiment using the spinal phantom, the ground truth of rotational parameters is R = (R x ,R y ,R z ) = (90°, − 90°, 0°), while the translation parameters were fixed at 0 mm. The initial parameters for rotation estimation vary in the ranges of [92.82°, 104.28°], [− 89.95°, − 78.5°], and [0°, 11.46°] for x-, y-, and z-axis, respectively. The same set of parameters was used to evaluate the performance without and with the multiresolution strategy. There were 250 trials performed in total in this experiment.
The results are shown in Figure 9. The red line in Figures 9(a) and 9(c) indicates the threshold for the successful estimation (7.5 mm) and the boundary of improvement is indicated by the blue line. Compared with the result using the method in Bom's work, multiresolution rotation estimation was more robust, especially when mTRE was less than 17.5 mm. Also, the running time was faster with the average time of 464.8 s for each estimation, while the 529.9 s was taken without using multiresolution strategy.

Rotation estimation based on multiresolution strategy using spinal phantom. (a) and (b) represent the mTRE and running time without using multiresolution strategy, respectively. (c) and (d) are the corresponding performances using multiresolution strategy.
Using the golden standard data set, multiresolution frequency estimation was performed, including rotation and translation estimation. Figure 10 shows the results with the rotation varying from − 15° to 15° and translation offsets in the range from −10 to 10 mm for all three axes. The results above the blue line indicate no improvement and the ones under the red line (7.5 mm) denote the successful estimation. The capture range of the frequency estimation was 20.1 mm. The mTRE in this capture range was 3.3 ± 1.2 mm, while the standard deviation of mTRE in Bom's work was 1.6 mm. Thus, the frequency estimation using wavelet pyramid is more robust.

Frequency estimation using golden standard data set.
5.3. Test of Complete Initialization
In this experiment, the combination of tracking-based initialization and multiresolution frequency estimation was evaluated using the spinal phantom. The pose of phantom was tracked by Vicon motion tracking system and changed with the rotation in the range of [− 20°, 20°] and translation offsets from −10 to 10 mm along three axes, respectively. Two X-ray images were acquired from the AP view and later view, respectively. The ground truth of transformation parameters was determined by 2D/3D intensity-based registration with elastix toolbox [35]. The mask images were used to match the circle shape of the actual X-ray images and reduce the influence of sharp edges.
Figure 11 shows the results using the complete initialization. mTREstart was located in the range from 2.12 to 22.57 mm provided by the tracking-based initialization. This uncertainty was mainly caused by the inaccurate measurements of markers’ positions in the CT data and placement of spinal phantom when estimating the transformation CTTVicon. The method of complete initialization provided the improvement for all experiments. In 85.95% of cases, the mTRE after estimation was less than 7.5 mm. And the mean and standard deviation of mTREestimate for the successful estimation were 2.38 mm and 2.16 mm, respectively.

Results of the complete initialization test.
5.4. Test of Intensity-Based Registration Using Complete Initialization
2D/3D intensity-based registration using the complete initialization was tested using the spinal phantom. There were three sets of experiments performed in this section. The traditional registration (IBR) was performed using the normalized gradient correlation as the similarity measure and optimization was implemented by Powell's method. The ranges of rotation and translation in Section 5.3 were applied to provide the initial parameters in this experiment. The registration following frequency estimation (F-IBR) and complete initialization including Level I and Level II (TF-IBR) were also implemented, respectively.
Figure 12 shows the mTREfinal of IBR (blue triangles), F-IBR (green triangles), and TR-IBR (red circles). The red line (mTREfinal = 2 mm) indicates the successful registration. Only 1 to 2 iterations were required using TR-IBR, while the registration using IBR took 30 iterations on average with the maximum of 75. And the successful rate of TF-IBR was improved greatly from 72.8% to 84.8%, when tracking-based initialization was used. The average mTREfinal was 2.36 mm, while 3.1 mm was reported on mTREfinal in Bom's work.

Results of 2D/3D intensity-based registration.
6. Conclusion
Intensity-based 2D/3D registration is a critical problem to register the preoperative volume to the patient setup in the image guided surgery. Currently the registration is dominated by DRR-based method with a high accuracy. However, the small capture range limits its use in the clinical application. In this paper, a two-level scheme was proposed to address such problem by a hardware-software integrated approach. In the first level, automatic tracking-based initialization based on Vicon motion tracking system estimated the relative pose of X-ray images and generated coarse transformation parameters for the next level. A frequency estimation using multiresolution strategy provides almost optimal transformation parameters for intensity-based 2D/3D registration. A public golden standard data set and a spinal phantom were used in experiments. The capture range using frequency estimation only was enlarged to 20.1 mm and the mTRE in this capture range was 2.92 ± 2.21 mm. Two-level estimation has the successful rate of 84.8% with the average mTREfinal of 2.36 mm.
Compared with the results in Bom's work, the performances of initialization for intensity-based registration were improved, including the mTRE, capture range, successful rate, and running time. Especially for the running time, the multiresolution strategy speeded up the frequency estimation by 65.1 s. Although it was not improved greatly, more robust implementation was achieved. Furthermore, the implementation of proposed initialization method can be accelerated by the parallel computing based on GPU [23, 36].
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Footnotes
Acknowledgments
This work was supported by Scientific Research Base Project of Beijing Municipal Commission of Education (TJSHG201310028014). The authors also gratefully acknowledge the contribution of Yang Li and Guoli Song from Shenyang Institute of Automation, Chinese Academy of Sciences, for the discussion and suggestion on the experiment.
