
Abstract

This article proposes quantitative analysis tools and digital phantoms to quantify the intrinsic errors of deformable image registration (DIR) systems and to establish quality assurance (QA) procedures for their clinical use, using local and global error analysis methods with clinically realistic digital image phantoms. Landmark-based image registration verifications are suitable only for images with significant feature points. To address this shortfall, we adapted a deformation vector field (DVF) comparison approach with new analysis techniques to quantify the results. Digital image phantoms are derived from data sets of actual patient images (a reference image set, R, and a test image set, T). Image sets from the same patient taken at different times are registered with deformable methods, producing a reference DVF_ref. Applying DVF_ref to the test image set T deforms it into a new image R′. The data set (R′, T, and DVF_ref) forms a realistic truth set and can therefore be used to analyze any DIR system and expose its intrinsic errors by comparing DVF_ref with the DVF_test that the system produces. For quantitative error analysis, that is, calculating and delineating differences between DVFs, 2 methods were used: (1) a local error analysis tool that displays deformation error magnitudes with color mapping on each image slice and (2) a global error analysis tool that calculates a deformation error histogram, which describes a cumulative probability function of errors for each anatomical structure. Three digital image phantoms were generated from 3 patients with head and neck, lung, and liver cancer, respectively. The DIR QA was evaluated using the head and neck case.

Keywords

deformable image registration, quality assurance

Introduction

Several commercial software packages provide deformable image registration (DIR) tools to enhance target delineation in the era of intensity-modulated radiation therapy, image-guided radiation therapy, and image-guided adaptive radiation therapy.^1–3 Moreover, researchers continue to develop new DIR algorithms—diffeomorphic demons,^4–9 diffeomorphic morphons,¹⁰ optical flow,^11,12 finite element model (FEM),^13–15 small deformation inverse consistent linear elastic,¹⁶ thin plate spline, free form deformation,^17–21 and Markov random field²²—reflecting the growing interest in deformable contours for adaptive radiation therapy and composite dose visualization of multiple treatment plans.

In order to introduce these DIR tools into clinical practice, established quality assurance (QA) and acceptance test procedures are essential. Although there have been many research efforts^15,23–36 to devise a quantitative evaluation method for DIR, robust QA and acceptance test procedures are still lacking. In this article, we implemented a localized error analysis tool proposed by Wang et al,³⁵ which displays a color-keyed map of deformation registration errors on each image slice. In addition, a global error analysis tool is presented that calculates deformation errors per anatomical structure. This new tool is called a deformation error histogram (DEH). These tools are useful for quantifying intrinsic errors of DIR systems, but a truth set consisting of a reference image set, a test image set, and a deformation vector field (DVF) is needed as a benchmark. Previous efforts have not translated into realistic clinical use. In this regard, a novel truth set reconstruction method is proposed. A truth set created from the proposed method is called a “digital image phantom” and consists of a reference image set, a test image set, and a DVF. A “phantom” refers to a known object (or set of files—thus “digital”) which is used to benchmark a system. In this application, the digital phantom is created from actual patients for QA of multiple anatomical sites so that a generated truth set may be used in clinics and made available to the community for analyzing DIR errors.

Overview of DIR

Deformable image registration is a process to find the best-estimated DVF, which forms the voxel correspondence between 2 different image sets. In other words, DIR finds a matrix that represents how individual voxels of 1 image are “deformed” (moved, etc), so they optimally line up with corresponding voxels from another image. Figure 1 shows a schematic of the DIR process. For a given spatial transformation (a DVF), the interpolator applies the transformation to a test image and compares the transformed test image with the reference image. The metric evaluates the degree of similarity between the reference image and the transformed test image. The optimizer can now estimate the best candidate for the DVF. A newly estimated DVF is used by the interpolator for the next iteration. Iteration continues until the metric of similarity satisfies the given criteria or threshold. The performance and accuracy of the DIR depend on the configuration settings of each component, similarity measure, interpolator, and optimizer.
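The iterative loop described above can be illustrated with a deliberately small sketch. It is not the in-house system: it assumes a 1D signal, a single uniform shift standing in for a full DVF, mean squared error as the similarity metric, and a simple regular-step search as the optimizer. All function names are hypothetical.

```python
import math

def linear_interp(signal, x):
    """Interpolator: sample `signal` at a real-valued position x (edge-clamped)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    lo = min(int(math.floor(x)), len(signal) - 2)
    frac = x - lo
    return (1.0 - frac) * signal[lo] + frac * signal[lo + 1]

def warp(test, shift):
    """Apply a uniform 1D 'DVF' (the same shift at every sample) to the test signal."""
    return [linear_interp(test, i + shift) for i in range(len(test))]

def mse(a, b):
    """Metric: mean squared intensity difference (lower means more similar)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def register(reference, test, step=1.0, min_step=1e-3, max_iter=200):
    """Optimizer: regular-step search that halves the step when no move improves the metric."""
    shift = 0.0
    best = mse(reference, warp(test, shift))
    for _ in range(max_iter):
        if step < min_step:  # stopping criterion
            break
        improved = False
        for candidate in (shift + step, shift - step):
            score = mse(reference, warp(test, candidate))
            if score < best:
                shift, best, improved = candidate, score, True
                break
        if not improved:
            step *= 0.5
    return shift, best

# The test signal is the reference bump moved by 3 samples.
reference = [math.exp(-((i - 20) / 4.0) ** 2) for i in range(50)]
test = [math.exp(-((i - 23) / 4.0) ** 2) for i in range(50)]
shift, score = register(reference, test)
print(f"estimated shift = {shift:.3f} samples, final metric = {score:.2e}")
```

Each pass through the loop mirrors Figure 1: the interpolator resamples the test signal under the current transformation, the metric scores it against the reference, and the optimizer proposes the next transformation. A clinical DIR replaces the single shift with a dense or B-spline-parameterized DVF.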

Figure 1.

Diagram of deformable image registration process.

There are multiple DIR algorithms available such as B-spline, demon, and FEM registration. Although the different methods utilize the same general iterative process described in Figure 1, they can arrive at different results, given the distinct similarity measures such as mutual information (MI)³⁷ or statistical approaches.^38–40 Additionally, the algorithms’ decision on when an image is successfully deformed to another image is central to how the algorithm arrives at the final iteration. Furthermore, different methods for image interpolation after the transformation (linear interpolation, cubic spline interpolation, and sinc interpolation) result in different end points. The selection of the interpolation method can affect the calculation time and accuracy of the registration.^37,41 Regular step gradient descent, stochastic gradient descent (SGD), and limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) are some of the available choices for optimizers. Regular step gradient descent is a generally well-known optimizer.⁴² The L-BFGS can find a good registration for most DIRs but consumes a lot of calculation time.⁴² Stochastic gradient descent can find a comparable result in reduced calculation time.⁴³

Components outside the core DIR process can also affect the characteristics and/or effectiveness of a registration. Image modality, allowed duration of the optimization process (iteration time), tissue type focus (soft tissue vs. bone, for example), and registration’s purpose (image registration vs. transformed contour) can affect the results. Moreover, most DIR implementations utilize randomly sampled pixels to reduce calculation time. An accurate evaluation of systematic and random errors in a DIR system is essential before utilizing a DIR package for clinical applications.

Previous Approaches to DIR QA

Image subtraction approach

The traditional method of DIR system evaluation involves a paired image set composed of a reference image, R, and that same reference image deformed using an artificial DVF (DVF_artificial) to produce the test image, T_artificial.³⁵ To test the DIR, T_artificial is registered to R. The DVF created (DVF_test) should match DVF_artificial. There are 2 analysis methods that can be used to quantify the accuracy of the registration. The image subtraction method compares the intensity difference between the reference image and the registered test image after DIR. A shortfall of image difference analysis is that it cannot measure registration errors in regions where neighboring pixels have the same intensity values. It is possible that 2 voxels have the same value but should not be aligned to each other. A difference image would show no differences in such a region, although the voxels are not registered to the correct location.
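The blind spot of image subtraction can be shown with a toy 1D example. This is a sketch under assumed conventions: `warp_nearest` is a hypothetical helper that resamples with integer displacements, and the intensity values are invented for illustration.

```python
def warp_nearest(image, dvf):
    """Resample a 1D image: output[i] = image[i + dvf[i]], with indices clamped."""
    last = len(image) - 1
    return [image[min(max(i + d, 0), last)] for i, d in enumerate(dvf)]

# Six samples of uniform intensity followed by an edge.
reference = [100, 100, 100, 100, 100, 100, 200, 200]
dvf_true = [0, 0, 0, 0, 0, 0, 0, 0]
dvf_wrong = [2, 1, -1, 2, -2, -1, 0, 0]  # wrong only inside the uniform region

registered = warp_nearest(reference, dvf_wrong)
difference_image = [a - b for a, b in zip(registered, reference)]
dvf_error = [abs(w - t) for w, t in zip(dvf_wrong, dvf_true)]

print("difference image:", difference_image)  # all zeros
print("DVF error       :", dvf_error)         # up to 2 samples
```

The difference image is zero everywhere even though the deformation is wrong by up to 2 samples, which is why a direct DVF comparison is needed in regions of uniform intensity.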

Artificial DVF comparison

Artificial DVF comparison is the second approach,^34,44 which compares the given DVF_artificial and the new DVF_test. Although a DVF comparison can overcome the shortfall of image subtraction analysis, both traditional approaches utilize an artificial DVF. Comparing artificial DVFs may be unrealistic if the results are assumed to apply to images of the full range of human anatomy. The unique characteristics of deformations at different anatomical sites may be unsuitable for a single DIR setting. For example, head and neck, lung, and liver anatomies may deform in different ways, demanding unique solutions and QA. Although some DIR software packages do provide anatomy-specific parameters, optimal settings are unknown since quantifiable QA of the results is missing.

Landmark-based approach

Previous efforts^24,25,28,29,45 to evaluate image registration errors utilize patient landmarks that appear within the image and quantify their deformation and registration. Researchers typically employ visual checks that compare manually designated landmarks, such as points of interest or organ contours, between 2 image sets. Brock and Consortium²⁴ and Castillo et al.²⁵ developed a software tool to generate large landmark point sets automatically. They produced a large number (>1000) of matching landmark point sets for lung image sets. Similarly, Brock and Consortium²⁴ utilized manually chosen landmark point sets to compare DIR results among multiple institutions.

In these landmark-based approaches, 2 different image sets are utilized from the same patient. An expert physician finds matching landmarks between 2 image sets; after DIR, the landmarks on the test image set should be matched with the landmarks corresponding to the reference image set. By measuring the distance between the reference landmark point and the registered landmark point on the test image set, the error magnitude of the image registration system can be measured as maximum and an average distance to agreement. These approaches quantify the correlation between the computerized image registration and human visual judgment.
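The landmark metric described above reduces to Euclidean distances between matched point pairs. The sketch below assumes landmarks are given as (x, y, z) coordinates in millimeters; the function name and the sample coordinates are hypothetical.

```python
import math

def landmark_errors(reference_points, registered_points):
    """Return (average, maximum) Euclidean distance between matched landmark pairs."""
    distances = [math.dist(p, q) for p, q in zip(reference_points, registered_points)]
    return sum(distances) / len(distances), max(distances)

# Illustrative landmark coordinates in mm (not patient data).
reference_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 5.0)]
registered_points = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 6.0)]

average_error, maximum_error = landmark_errors(reference_points, registered_points)
print(f"average = {average_error:.2f} mm, maximum = {maximum_error:.2f} mm")
```

The result only describes the registration at the chosen points, so it says nothing about regions where no landmark can be placed.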

Vaman et al.³³ devised a way to reduce the number of landmarks, since entering a large number of landmark points is clinically tedious and impractical. To quantify the errors in a DIR for 4-dimensional computed tomography (CT) image sets, they applied principal component analysis (PCA) to the motion of patient landmarks due to respiration. PCA can estimate the fundamental eigenmodes of human respiration, and the difference between the eigenmodes from the landmarks and the eigenmodes from the deformation vectors was measured. They showed the efficacy of the method by comparing eigenmodes from randomly selected subsets of landmarks. They also found that validation through a small selected number of landmarks can lead to unrepresentative results.

Murphy et al.³⁰ recently utilized a similar scheme using a small number of landmarks to estimate the uncertainty in daily dose mapping due to DIR error. Although their efforts made significant progress, these landmark-based approaches have significant limitations. Visual verification of landmarks can only be performed where significant image features exist (ie, at limited sites). As well, intrinsic uncertainties of DIR in regions with no image features cannot be measured with these techniques. Figure 2 illustrates the shortfall of landmark-based approaches. The left graphic is an initial planning CT image, and the right graphic is a follow-up CT image. The clinical target volume (CTV) contour is shown in red and 2 corresponding points are marked on both images. The points A and A′ have significant image features, so we can confirm their correspondence visually. However, the points B and B′ have no significant image features near the point, and therefore, their correlation may be incorrect. Without a method to evaluate the deformation magnitude of error in regions of similar image features, quantitative analysis of deformation accuracy remains limited.

Figure 2.

An illustration demonstrating the shortfall of landmark-based approaches. The points A and A′ have significant image features, so we can visually confirm correspondence. However, the points B and B′ have no significant image features and lack methods to evaluate the robustness of the deformation.

Unbalanced energy approach

Zhong et al.³⁶ utilized unbalanced energy analysis that compares the DVF between DIRs using FEM and B-spline registrations and demon registrations. The group found deformation vectors calculated by the FEM and the B-spline methods showed a 2-mm average difference near organ edges. This result is in accordance with previous landmark-based approaches. However, in regions with no significant gradient features, deformation vectors from various DIR methods demonstrated much larger differences up to 10 mm.³⁶ Although this method provides the DVF comparison between DIRs, 2 numerical phantoms were used, in which one was fabricated using artificial bladder, prostate, rectum, and femoral head structures and another one was created from a patient with lung cancer using a known DVF. This indicates that the utilized numerical phantoms do not reflect a realistic environment in a clinic.

Other approaches

Physical deformable phantoms were proposed to validate the accuracy of DIR.^46–48 These efforts have a strong advantage in exact matching of voxels after performing DIR. However, it is very difficult, and not realistic, to mimic all anatomical structures encountered in a clinical environment. Varadhan et al³⁴ proposed a framework for DIR validation using ImSimQA (Oncology Systems Limited, Shrewsbury, Shropshire, United Kingdom) and 3DSlicer (Open Source Software Package, http://www.slicer.org) tools. Two image sets were created as DIR validation data by applying a deformation using ImSimQA. After performing DIR between the 2 created image sets, the 2 deformation fields, anatomical correspondence, and image quality were analyzed using 3DSlicer. Although the validation scheme of this research is reasonable, this technique also used artificial image sets rather than clinical ones.

New proposed approach

As an alternative to the DIR QA approaches described in the previous sections, we propose a new DIR QA procedure for practical clinical use. This work was partially introduced in our previous research.⁴⁹ The approach was also shown to be an indicator of DIR accuracy in patients with liver and lung cancer.⁵⁰ Pukala et al used a similar concept of the digital image phantom for kVCT volumetric image sets of head and neck.⁵¹ Nie et al utilized the same histogram to quantify deformation errors of DIR systems, but they did not consider the error histogram for each anatomical structure.⁴⁴ Digital image phantoms generated from deidentified clinical cases (each consisting of a reference image set (R′), a test image set (T), and a reference DVF for various anatomical sites) are made available to clinics via the Web as downloadable content. Further, a set of QA tools composed of local and global error analysis systems analyzes clinically registered images. The local error analysis tool displays deformation errors via color maps on each image slice, and the global error analysis tool can review error per voxel and/or structure of interest (SOI) using a DEH. Tools and digital image phantoms are utilized locally to evaluate individual clinical systems.

Materials and Methods

Theoretical Process of Generating a Truth Image Set

Our proposed QA process requires 2 image sets and a true DVF_ref. We start with 2 image sets (image sets R and T) for the same patient taken at different times; the initial image set R and the later image set T are set as the reference image set and the test image set, respectively. Figure 3 shows the processes to generate a digital image phantom set. At completion of the DIR, the algorithm finds a DVF, which is a map of deformation vectors from pixels or grids in the test image to those in the reference image. We assign this DVF as DVF_ref and apply it to the test image T (that is, T is warped by DVF_ref) to generate the image R′. Therefore, the DVF_ref between the image set R′ and the image set T becomes the “true” deformation. In summary, we applied a realistic DVF to an image set to generate a deformed image set for creating QA image data instead of using an artificial DVF.

Figure 3.

A, Process to generate truth data consisting of R′, T, and DVF_ref. B, Process to evaluate deformation errors using the truth data. The deformation vector field DVF_test should be equal to DVF_ref when there is no error in the deformable image registration (DIR) system. Intrinsic errors are measured by calculating vector differences between DVF_ref and DVF_test.

When performing DIR between the image sets R′ and T, the DIR should generate a DVF identical to the truth deformation DVF_ref if the DIR system has no intrinsic errors, as shown in Figure 3B. The deformation errors are composed of random errors and systematic errors. The random errors are from noise in the image sets and sampling processes in the DIR systems. The systematic errors are from limitations of optimizers or the characteristics of the similarity metrics. By comparing DVF_ref and DVF_test using the local and global error analysis tools, we can characterize the errors in a DIR system.
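The construction of a truth set can be sketched in 1D as follows. This is an illustration only: it assumes a pull-back convention in which R′ at grid position i takes the value of T at position i + DVF_ref[i], uses linear interpolation rather than the sinc interpolator, and the names `warp_linear` and `make_phantom` are hypothetical.

```python
import math

def warp_linear(image, dvf):
    """Return R' with R'[i] = T(i + dvf[i]), using edge-clamped linear interpolation."""
    last = len(image) - 1
    warped = []
    for i, d in enumerate(dvf):
        x = min(max(i + d, 0.0), float(last))
        lo = min(int(math.floor(x)), last - 1)
        frac = x - lo
        warped.append((1.0 - frac) * image[lo] + frac * image[lo + 1])
    return warped

def make_phantom(test_image, dvf_ref):
    """Bundle the truth set: R' is generated from T, so DVF_ref is exact by construction."""
    return {
        "R_prime": warp_linear(test_image, dvf_ref),
        "T": list(test_image),
        "DVF_ref": list(dvf_ref),
    }

T = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
dvf_ref = [0.0, 0.5, 1.0, 1.5, 1.0, 0.0]  # displacements in samples
phantom = make_phantom(T, dvf_ref)
print(phantom["R_prime"])
```

Because R′ is produced from T by DVF_ref itself, any DIR system that registers T to R′ can be scored against DVF_ref without relying on landmarks.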

Theory for Quantitative Error Analysis Tools

Local error analysis tool: Visualization of errors on each image slice with color mapping

In order to compare DVF_ref and DVF_test, the vector difference was taken at each voxel with the same coordinate. There is a vector ${\vec{R V}}_{(x, y, z)} = (R_{x}, R_{y}, R_{z})$ at a voxel coordinate (x, y, z) in the DVF_ref and a vector ${\vec{T V}}_{(x, y, z)} = (T_{x}, T_{y}, T_{z})$ at voxel coordinate (x, y, z) in the DVF_test. The vector difference between the 2 vectors is ${\vec{R V}}_{(x, y, z)} - {\vec{T V}}_{(x, y, z)}$ . This resulting vector difference is assigned to the same coordinate. In this article, a colored map is used to visualize the magnitude of vector differences from the deformation errors. Equation 1 calculates the difference between the vectors, and Equation 2 calculates the magnitude of the vector difference found in Equation 1.

$${\vec{R V}}_{(x, y, z)} - {\vec{T V}}_{(x, y, z)} = {\vec{V D}}_{(x, y, z)} = (R_{x} - T_{x}, R_{y} - T_{y}, R_{z} - T_{z}) \qquad (1)$$

$$|{\vec{R V}}_{(x, y, z)} - {\vec{T V}}_{(x, y, z)}| = \sqrt{(R_{x} - T_{x})^{2} + (R_{y} - T_{y})^{2} + (R_{z} - T_{z})^{2}} \qquad (2)$$
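Equations 1 and 2 can be evaluated voxel by voxel as in the following sketch. Storing each DVF as a dictionary keyed by voxel coordinate is an assumption made for brevity, and the function name and sample vectors are hypothetical.

```python
import math

def error_magnitude_map(dvf_ref, dvf_test):
    """Map each voxel coordinate to |RV - TV| (Equation 2) in the units of the DVFs."""
    return {
        coord: math.sqrt(sum((r - t) ** 2 for r, t in zip(dvf_ref[coord], dvf_test[coord])))
        for coord in dvf_ref
    }

# Illustrative vectors in mm at two voxels (not patient data).
dvf_ref = {(0, 0, 0): (1.0, 2.0, 2.0), (1, 0, 0): (0.0, 0.0, 0.0)}
dvf_test = {(0, 0, 0): (0.0, 0.0, 0.0), (1, 0, 0): (3.0, 4.0, 0.0)}

error_map = error_magnitude_map(dvf_ref, dvf_test)
for coord, magnitude in sorted(error_map.items()):
    print(coord, f"{magnitude:.2f} mm")
```

The resulting magnitudes are the scalar values that a color map would display on each image slice.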

Global error analysis tool: DEH

For a quantitative analysis of global deformation errors, a DEH was developed from vector differences between DVF_test and DVF_ref. The DEH produces a quantification of the deformation errors per anatomical structure that can be graphically displayed. The DEH graph indicates a cumulative distribution of errors per organ or SOI. The DEH concept utilizes an approach similar to the dose–volume histogram.⁵² A frequency analysis was applied to the magnitudes of the vector differences $D_{1}, D_{2}, \ldots, D_{N}$ measured at voxels with the same coordinate in DVF_ref and DVF_test. When a reference value $D_{r}$ is a target histogram bin, the cumulative frequency $M_{D_{r}}$ of $D_{r}$ is the number of measured distances that are greater than or equal to $D_{r}$, as shown in Equation 3.

$$M_{D_{r}} = \#\{k \mid D_{k} \geq D_{r}\} = \sum_{k=1}^{N} I(D_{k} \geq D_{r}), \qquad (3)$$

where N is the number of vectors, and I is the characteristic function that takes the value 1 if $D_{k}$ is greater than or equal to $D_{r}$ and the value 0 otherwise. A histogram of the raw cumulative frequency is difficult to interpret because the distribution of vector differences is not normalized. For simplification, the relative probability of the cumulative frequency is used, as shown in Equation 4. The relative probability ranges from 0% to 100% and can be estimated for each anatomical site. By using Equation 4, a DEH analysis generates a meaningful distribution for each SOI that is measured.

$$P_{c}(D_{r}) = \frac{M_{D_{r}}}{N} \times 100\ (\%) \qquad (4)$$
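A minimal sketch of Equations 3 and 4 is given below. The `confidence_range` helper is an assumption about how a 95% or 68% range could be read from the sorted errors; the article does not state the exact rule used. All names and sample values are hypothetical.

```python
def deformation_error_histogram(errors, bin_values):
    """P_c(D_r): percentage of errors greater than or equal to each bin value D_r."""
    n = len(errors)
    return [100.0 * sum(1 for d in errors if d >= d_r) / n for d_r in bin_values]

def confidence_range(errors, percent):
    """Smallest observed error e such that at least `percent`% of errors are <= e.

    `percent` is an integer percentage; this reading of the DEH is an assumption.
    """
    ordered = sorted(errors)
    count = (percent * len(ordered) + 99) // 100  # integer ceiling
    return ordered[max(count, 1) - 1]

# Illustrative error magnitudes in mm for one structure: 0.1, 0.2, ..., 2.0.
errors = [k / 10 for k in range(1, 21)]
deh = deformation_error_histogram(errors, [0.0, 1.0, 2.0])
range_95 = confidence_range(errors, 95)
range_68 = confidence_range(errors, 68)
print(deh, range_95, range_68)
```

Running the histogram separately on the voxels inside each contoured structure gives one DEH curve per SOI.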

Case Study: Head and Neck DIR

The proposed QA approach was applied to our in-house DIR system using a head and neck case. To provide the image sets R′ and T for the test registration, we exported the image sets in DICOM format. The in-house DIR system utilizes a B-spline image registration algorithm. In Figure 4, the testing process is summarized in a schematic diagram. After 2 image sets (a reference and a test) are imported, the DIR software under test performs the registration process as explained earlier. As a result of the registration, a DVF_test is produced. This DVF_test is compared to the DVF_ref, which is the truth DVF between the R′ and T image sets. The vector difference between DVF_test and DVF_ref is utilized to quantify deformation errors in the DIR. In cases where DVF_test has a grid resolution different from that of DVF_ref, it should be interpolated to match the DVF_ref grid resolution.
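The grid-matching step mentioned above can be sketched along one axis as follows. Linear interpolation of the vector components is an assumption here, since the article does not specify the interpolation used, and `resample_dvf_1d` is a hypothetical helper.

```python
def resample_dvf_1d(vectors, src_spacing, dst_spacing, dst_count):
    """Linearly interpolate (dx, dy, dz) vectors from one grid spacing to another.

    `vectors` are sampled every `src_spacing` mm starting at 0 and must contain
    at least 2 entries; the output is sampled every `dst_spacing` mm.
    """
    last = len(vectors) - 1
    resampled = []
    for k in range(dst_count):
        pos = min(k * dst_spacing / src_spacing, float(last))
        lo = min(int(pos), last - 1)
        frac = pos - lo
        resampled.append(tuple(
            (1 - frac) * a + frac * b for a, b in zip(vectors[lo], vectors[lo + 1])
        ))
    return resampled

# A test DVF on a 4-mm grid resampled onto a 2-mm reference grid.
dvf_test_coarse = [(0.0, 0.0, 0.0), (2.0, 0.0, 4.0), (4.0, 0.0, 0.0)]
dvf_test_fine = resample_dvf_1d(dvf_test_coarse, src_spacing=4.0, dst_spacing=2.0, dst_count=5)
print(dvf_test_fine)
```

A 3D implementation would apply the same idea along all three axes (trilinear interpolation) before the voxel-by-voxel comparison.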

Figure 4.

Quality assurance (QA) evaluation of a DIR system. The user can select among various anatomical image sets to simulate the clinical situation. After running the deformable image registration using selected image sets, a deformation vector field (DVF) is exported and compared to the truth DVF.

Generating a head and neck QA image data set

This section describes the generation of a head and neck digital QA phantom using the in-house DIR software.⁵³ A patient with head and neck cancer with 2 radiation therapy planning CT data sets in a single month is used to create the truth set. After the initial CT simulation, the patient had a tracheotomy and required a new CT scan and immobilization mask. In addition to the 2 data sets, manual contours were also delineated by board-certified radiation oncologists. The first treatment planning image set was selected as reference image set R. The second treatment planning image set was selected as test image set T. The reference image set was 133 image slices of 512 × 512 pixels. The pixel resolution was 1.17 × 1.17 mm and the slice thickness was 3 mm. The test image set was 131 image slices of 512 × 512 pixels and the slice thickness and pixel resolution are identical to the reference image set.

In order to generate the initial DVF_ref (see Figure 4), a DIR was performed using our in-house DIR software. B-spline DIR using the MI metric was applied instead of the mean square error (MSE) metric, because the MSE would produce a large error due to the endotracheal tube being present only in the test image set T. The L-BFGS optimizer was used, with randomly sampled points for comparing pixels between the 2 image sets. The sinc interpolator was utilized to generate the final R′ image set from the test image set T using the DVF_ref. The sinc interpolator⁵⁴ is slow but provides the least image distortion compared to other methods.

Test of deformable registration systems using the head and neck digital phantom

We applied our test procedure to evaluate DIR system errors from the head and neck image registrations using our in-house DIR software. The configuration of our in-house DIR system is as follows: B-spline transform, linear interpolator, SGD optimizer, and MI similarity metric. A thousand pixels were randomly sampled per iteration for similarity calculation, and the maximum iteration was set to 700. We utilized a multi-resolution approach⁵⁵ where the 512 × 512 × 133 and 512 × 512 × 131 image resolutions were initially downsampled to 128 × 128 × 34 and 128 × 128 × 33. The second pass resolutions were raised to 256 × 256 × 67 and 256 × 256 × 66, respectively. The B-spline grid size was set to 21 mm. The number of histogram bins for the MI calculation was 50.
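The downsampled sizes quoted above are consistent with dividing each dimension by 4 and then by 2 and rounding up. The sketch below checks that arithmetic; the rounding rule is inferred from the quoted numbers and is not stated by the authors, and `pyramid_shape` is a hypothetical helper.

```python
def pyramid_shape(shape, factor):
    """Downsample each dimension by `factor`, rounding up (integer ceiling)."""
    return tuple(-(-n // factor) for n in shape)

reference_shape = (512, 512, 133)
test_shape = (512, 512, 131)

for factor in (4, 2, 1):
    print(factor, pyramid_shape(reference_shape, factor), pyramid_shape(test_shape, factor))
```

Coarse levels let the optimizer capture large deformations cheaply before the finer levels refine local detail.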

Results

Creation of a Head and Neck Digital Image Phantom

The DIR between the original reference image and the test image was performed using a random sample of 50% of the pixels from the reference image. The DIR’s execution using the L-BFGS optimizer took 3 hours using an AMD Opteron 6136 (2.4 GHz) processor (Advanced Micro Devices, Semiconductor Company, Sunnyvale, CA, United States). Visual checks such as the “checkerboard visualization” were performed to verify that the DIR was accurate and acceptable. Once it was approved, a DVF_ref was generated from the DIR result. After that, we created a new reference image set R′ transformed from the test image T using the DVF_ref. Figure 5 illustrates that DVF_ref generates a clinically realistic deformation by comparing R′ to R without erroneous image distortion. Figure 5A visualizes the alignment using the checkerboard test. Figure 5B shows the intensity difference following image subtraction. R′ is similar to R, showing small differences near the high-gradient edges of bone, skin, and anatomical structures.

Figure 5.

A, The rigid registration and the intensity difference between R′ and R image sets to show realistic DVF_ref. The R′ transformed from the image set T does not have erroneous distortion compared to the original R. Sagittal image highlights an endotracheal tube only present in image T. B, In the intensity difference, the gray scale means the difference value. The intensity difference along edge lines of bone, skin, and anatomical structures is higher than others.

The DIR between the image set R′ and the image set T produced DVF_test. This registration was assessed by comparing DVF_ref and DVF_test. By repeating the steps mentioned earlier, we created 3 truth digital image phantoms, head and neck, liver, and lung. Since the DIR QA of all the test cases follows the same process, we only present the experimental results of the case with head and neck.

Deformable Image Registration System Test Results

Local deformation error analyses

Figure 6 shows the results of DIR using the in-house system as applied to the digital phantom. Figure 6A illustrates visual registration checks showing reasonable matching between the reference image R′ (gray image) and the transformed test image (green image). Figure 6B illustrates the 2 to 3 mm deformation errors around the skull surface, jaw, and posterior neck using the in-house DIR software. Vector differences are calculated over a 2-mm grid resolution. The largest deformation error was found at the shoulders.

Figure 6.

The visualization of the magnitude of deformation errors for the head-and-neck case using the in-house deformable image registration (DIR) system. A, Gray and green images are reference images of the reference image set R′ and a transformed test image, respectively. B, Local error analysis performed by DVF comparison showing significant errors at various locations although the visual check was satisfactory. The magnitude of vector differences between the DVF_ref and the DVF_test was calculated with a 2-mm grid resolution and displayed as a color map on the image set R′.

Global deformation error analyses

In Figure 7, to analyze the global deformation errors, we generated DEHs for the primary CTV, brain stem, shoulders, and normal tissues. It is important to note that this histogram is generated from the registered DVFs and not the image differences. The DEH demonstrates the confidence range of deformation errors for each selected SOI. The DEH for the primary CTV shows that 95% of deformation errors were less than 0.72 mm. Those for the shoulders, spinal cord, and brain stem were less than 3.32, 1.25, and 1.87 mm, respectively.

Figure 7.

Deformation error histogram (DEH) for the head and neck example. The cumulative histogram of deformation errors visually shows the confidence range of errors.

Deformation errors are also analyzed using conventional statistical methods, taking the average and standard deviation of the errors. We summarized the statistical as well as the DEH analyses for the selected SOIs in Table 1. The average error for the partial brain in Table 1 is 1.23 mm and the standard deviation is 0.46 mm. Therefore, the 2σ range is up to 2.15 mm. However, the measured 95% confidence range from DEH is 1.97 mm. The analysis using the average and the standard deviation may not accurately convey the magnitude of deformation errors as shown in Table 1. In addition, the DEH graph shows the confidence range of the error in DVF for each organ.

Table 1.

After Performing DIR Using the In-House DIR system, the Confidence Ranges of Deformation Errors Were Calculated Using Traditional Statistical and DEH Analyses for the Case With Head and Neck.^a

Structure of Interest	Average (mm), Traditional Analysis	Standard Deviation σ (mm), Traditional Analysis	Confidence Range 95% (mm), DEH Analysis	Confidence Range 68% (mm), DEH Analysis
CTV (Primary)	0.38	0.18	0.72	0.45
CTV (Lymph node)	0.85	0.41	1.60	1.03
Left parotid	0.93	0.45	1.74	1.16
Mandible	1.17	0.46	2.05	1.33
Esophagus	0.71	0.28	1.12	0.89
Oral structure	1.27	0.89	2.92	1.70
Partial brain	1.23	0.46	1.97	1.49
Posterior neck	1.10	0.64	2.36	1.24
Brain stem	1.38	0.31	1.87	1.56
Spinal cord	0.79	0.27	1.25	0.90
Normal tissue	0.72	0.37	1.50	0.84
Shoulders	1.60	0.86	3.32	1.83

Abbreviations: CTV, clinical target volume; DEH, deformation error histogram; DIR, deformable image registration.

^a All ranges are in millimeter (mm) scale.
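The comparison between the 2σ estimate and the DEH range can be reproduced from Table 1 with simple arithmetic. The sketch below copies a subset of rows from the table; the helper name is hypothetical.

```python
# (average, standard deviation, DEH 95% confidence range) in mm, copied from Table 1.
table_1 = {
    "CTV (Primary)": (0.38, 0.18, 0.72),
    "Oral structure": (1.27, 0.89, 2.92),
    "Partial brain": (1.23, 0.46, 1.97),
    "Brain stem": (1.38, 0.31, 1.87),
    "Shoulders": (1.60, 0.86, 3.32),
}

def two_sigma_upper(mean, sd):
    """Upper end of the mean + 2*sd range, rounded to the table's precision."""
    return round(mean + 2.0 * sd, 2)

comparison = {
    name: (two_sigma_upper(mean, sd), deh95)
    for name, (mean, sd, deh95) in table_1.items()
}
for name, (upper, deh95) in comparison.items():
    print(f"{name:15s} mean+2sd = {upper:.2f} mm   DEH 95% = {deh95:.2f} mm")
```

For the partial brain, the mean plus 2σ gives 2.15 mm while the DEH gives 1.97 mm, which is the discrepancy discussed in the text; for the shoulders the two estimates happen to agree at 3.32 mm.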

Discussions

These digital image phantoms and quantitative tools can be used to measure local and global magnitudes of errors during commissioning of a DIR system for clinical applications. An analogy can be found in the use of gamma analysis for evaluating the performance of clinical dose delivery systems in place of simple dose differences. In the same manner, digital phantoms and the DEH can be used for QA of DIR systems for clinical procedures. Furthermore, if a specific DIR system allows a user to select a set of parameters, then our process can be used to identify the optimal parameter set for a specific anatomical site, that is, the set that produces minimal errors. Composite doses constructed from DIRs are routinely used in making medical treatment decisions. For assessing the radiation toxicity in particular organs at risk, we need to generate an accurate composite dose, which requires DIR. Furthermore, the successes and failures of these registrations can be delineated by locality. Park et al.⁵⁶ investigated a fuzzy composite dose representation to deal with the uncertainty in DIR. It can generate composite dose plans displaying locality-based uncertainties. By utilizing our proposed test procedure along with the fuzzy composite dose representation, we will be able to collect the data for modeling the deformation vector errors for a specific SOI, which reflects the DIR uncertainty in the composite dose at specific anatomical locations.

To utilize our proposed QA procedure, the user should retrieve the deformation vector data from their image registration systems. Most in-house systems are able to export their deformation vector data since the users have the program source code. However, many commercial systems utilize proprietary data formats to store the deformation vectors, although the DICOM RT format is recommended for DVFs. For example, only the newer version of MIM Maestro (MIM Software, Cleveland, Ohio) supports the DICOM format to store the deformation vectors. Otherwise, commercial system users can retrieve the deformation vector data if the vendor provides adequate technical support. In addition, most DIR systems have adjustable parameters to optimize the DIR algorithm, which may affect quality and performance. An optimization process for finding a set of parameters for a specific anatomical site may be required. In this research, we presented multiple digital image phantoms using only CT image sets. Further work is needed for DIR between different diagnostic modalities such as magnetic resonance and CT or ultrasound and CT. Many clinics are utilizing positron emission tomography (PET)/CT, diagnostic PET, or single-photon emission computed tomography images to delineate gross tumor volume (GTV) and/or CTV. Based on the preliminary results of the QA test proposed in this research, adding a margin for DIR uncertainty may be necessary when a GTV and/or CTV is targeted on deformed data sets.

Conclusions

In this research, we implemented multiple digital image phantoms based on real patients, together with a local and a global error analysis tool, for QA of DIR systems. We built a DVF comparison software tool and downloadable digital image phantoms for DIR QA procedures. Each digital image phantom consists of a reference image set, a test image set, and a truth DVF created through DIR between 2 image sets of a real patient (for clinical relevance). The local error analysis tool displays the magnitude of deformation errors on each 2D image slice, and the global error analysis tool generates a histogram giving a deformation confidence range per anatomical site. The DEH proved to be a useful analysis tool and should be included in future QA and commissioning of DIR systems.
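
The local error display summarized above can be sketched as follows. This is a hedged illustration rather than the published tool. It assumes one image slice of each DVF is available as a 2D grid (rows of (dx, dy, dz) tuples in millimeters). The color bands and their 1 mm and 3 mm limits are hypothetical values chosen for this example, not thresholds taken from the article.

```python
import math

# Hypothetical color bands: (upper limit in mm, label). Not from the article.
COLOR_BANDS = [(1.0, "green"), (3.0, "yellow"), (math.inf, "red")]

def slice_error_map(ref_slice, test_slice):
    """2D grid of error magnitudes (mm) for one image slice."""
    return [
        [math.dist(r, t) for r, t in zip(ref_row, test_row)]
        for ref_row, test_row in zip(ref_slice, test_slice)
    ]

def colorize(error_map, bands=COLOR_BANDS):
    """Replace each magnitude with the label of the first band containing it."""
    return [
        [next(label for limit, label in bands if e <= limit) for e in row]
        for row in error_map
    ]

# Toy 2 x 2 slice with a truth displacement of zero everywhere.
zero = (0.0, 0.0, 0.0)
ref_slice = [[zero, zero], [zero, zero]]
test_slice = [[(0.5, 0.0, 0.0), (0.0, 2.0, 0.0)],
              [(3.0, 4.0, 0.0), zero]]

error_map = slice_error_map(ref_slice, test_slice)
color_map = colorize(error_map)
```

In practice the labels would be replaced by a continuous color scale overlaid on the CT slice; discrete bands are used here only to keep the example short.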

Three digital image phantoms (head and neck, lung, and liver), each consisting of a reference image set, a test image set, and a deformation vector field, are publicly available at http://rophys.case.edu/dip/. The DEH analysis software is also available. Using the proposed QA procedure, an in-house plan review system was shown to have an acceptable range of deformation vector errors.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Kessler. Image registration and data fusion in radiation therapy. Br J Radiol. 2006;79(spec no 1):S99–S108. doi:10.1259/Bjr/70617164

2. Yan. Developing quality assurance processes for image-guided adaptive radiation therapy. Int J Radiat Oncol Biol Phys. 2008;71(1 suppl):S28–S32. doi:10.1016/j.ijrobp.2007.08.082

3. Sarrut. Deformable registration for image-guided radiation therapy. Z Med Phys. 2006;16(4):285–297.

4. Bricault, Ferretti, Cinquin. Registration of real and CT-derived virtual bronchoscopic images to assist transbronchial biopsy. IEEE Trans Med Imaging. 1998;17(5):703–714. doi:10.1109/42.736022

5. Guimond, Roche, Ayache, Meunier. Three-dimensional multimodal brain warping using the demons algorithm and adaptive intensity corrections. IEEE Trans Med Imaging. 2001;20(1):58–69. doi:10.1109/42.906425

6. Pennec, Cachier, Ayache. Understanding the “Demon’s algorithm”: 3D non-rigid registration by gradient descent. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI); 1999 September 19–22;1679:597–605; Cambridge, UK. doi:10.1007/10704282_64

7. Thirion. Image matching as a diffusion process: an analogy with Maxwell’s demons. Med Image Anal. 1998;2(3):243–260. doi:10.1016/S1361-8415(98)80022-4

8. Vercauteren, Pennec, Perchant, Ayache. Diffeomorphic demons: Efficient non-parametric image registration. Neuroimage. 2009;45(1 suppl):S61–S72. doi:10.1016/j.neuroimage.2008.10.040

9. Vercauteren, Pennec, Perchant, Ayache. Non-parametric diffeomorphic image registration with the demons algorithm. Med Image Comput Comput Assist Interv. 2007;10(pt 2):319–326.

10. Wrangsjo, Pettersson, Knutsson. Non-rigid registration using morphons. Image Analysis, Lecture Notes in Computer Science. 2005;3540:501–510. doi:10.1007/11499145_51

11. Zhang, Huang, Guerrero. Use of three-dimensional (3D) optical flow method in mapping 3D anatomic structure and tumor contours across four-dimensional computed tomography data. J Appl Clin Med Phys. 2008;9(1):2738. doi:10.1120/jacmp.v9i1.2738

12. Zhang, Huang, Forster. Dose mapping: validation in 4D dosimetry with measurements and application in radiotherapy follow-up evaluation. Comput Methods Programs Biomed. 2008;90(1):25–37. doi:10.1016/j.cmpb.2007.11.015

13. Ferrant, Nabavi, Macq, Jolesz, Kikinis, Warfield. Registration of 3-D intraoperative MR images of the brain using a finite-element biomechanical model. IEEE Trans Med Imaging. 2001;20(12):1384–1397. doi:10.1109/42.974933

14. Xuan, Wang, Freedman, Adali, Shields. Nonrigid medical image registration by finite-element deformable sheet-curve models. Int J Biomed Imaging. 2006;2006:73430. doi:10.1155/IJBI/2006/73430

15. Zhong, Peters, Siebers. FEM-based evaluation of deformable image registration for radiation therapy. Phys Med Biol. 2007;52(16):4721–4738. doi:10.1088/0031-9155/52/16/001

16. Christensen, Song, El Naqa, Low. Tracking lung tissue motion and expansion/compression with inverse consistent image registration and spirometry. Med Phys. 2007;34(6):2155–2163. doi:10.1118/1.2731029

17. Abolhassani, Samani. Non-rigid registration using free-form deformation for prostate images. In: Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS); 2005 June 26–28;51–54; Detroit, United States. doi:10.1109/Nafips.2005.1548506

18. Jacobson, Murphy. Optimized knot placement for B-splines in deformable image registration. Med Phys. 2011;38(8):4579–4582. doi:10.1118/1.3609416

19. Kybic, Unser. Fast parametric elastic image registration. IEEE Trans Image Process. 2003;12(11):1427–1442. doi:10.1109/Tip.2003.813139

20. Loeckx, Maes, Vandermeulen, Suetens. Nonrigid image registration using free-form deformations with a local rigidity constraint. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI); 2004 September 26–29;3216:639–646; Saint-Malo, France. doi:10.1007/978-3-540-30135-6_78

21. Rueckert, Sonoda, Hayes, Hill DLG, Leach, Hawkes. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans Med Imaging. 1999;18(8):712–721. doi:10.1109/42.796284

22. Glocker, Sotiras, Komodakis, Paragios. Deformable medical image registration: setting the state of the art with discrete methods. Annu Rev Biomed Eng. 2011;13:219–244. doi:10.1146/annurev-bioeng-071910-124649

23. Boldea, Sharp, Jiang, Sarrut D. 4D-CT lung motion estimation with deformable registration: quantification of motion nonlinearity and hysteresis. Med Phys. 2008;35(3):1008–1018. doi:10.1118/1.2839103

24. Brock; Deformable Registration Accuracy Consortium. Results of a multi-institution deformable registration accuracy study (MIDRAS). Int J Radiat Oncol Biol Phys. 2010;76(2):583–596. doi:10.1016/j.ijrobp.2009.06.031

25. Castillo, Guerra. A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets. Phys Med Biol. 2009;54(7):1849–1870. doi:10.1088/0031-9155/54/7/001

26. Dalah, Nisbet, Reise, Bradley. Evaluating commercial image registration packages for radiotherapy treatment planning. Appl Radiat Isotopes. 2008;66(12):1948–1953. doi:10.1016/j.apradiso.2008.06.003

27. Pan, Liang. Implementation and evaluation of various demons deformable image registration algorithms on a GPU. Phys Med Biol. 2010;55(1):207–219. doi:10.1088/0031-9155/55/1/012

28. Latifi, Zhang, Stawicki, van Elmpt, Dekker, Forster. Validation of three deformable image registration algorithms for the thorax. J Appl Clin Med Phys. 2013;14(1):3834. doi:10.1120/jacmp.v14i1.3834

29. Loi, Dominietto, Manfredda. Acceptance test of a commercially available software for automatic image registration of computed tomography (CT), magnetic resonance imaging (MRI) and (99m)Tc-methoxyisobutylisonitrile (MIBI) single-photon emission computed tomography (SPECT) brain images. J Digit Imaging. 2008;21(3):329–337. doi:10.1007/s10278-007-9042-7

30. Murphy, Salguero, Siebers, Staub, Vaman. A method to estimate the effect of deformable image registration uncertainties on daily dose mapping. Med Phys. 2012;39(2):573–580. doi:10.1118/1.3673772

31. Shen, Matuszewski, Shark, Skalski, Zielinski, Moore CJ. Deformable Image Registration - A Critical Evaluation: Demons, B-Spline FFD and Spring Mass System. In: Fifth International Conference BioMedical in Medical and Biomedical Informatics; 2008 July 9–11;77–82; London, UK. doi:10.1109/MediVis.2008.11

32. Skerl, Likar, Pernus. A protocol for evaluation of similarity measures for non-rigid registration. Med Image Anal. 2008;12(1):42–54. doi:10.1016/j.media.2007.06.001

33. Vaman, Staub, Williamson, Murphy. A method to map errors in the deformable registration of 4DCT images. Med Phys. 2010;37(11):5765–5776. doi:10.1118/1.3488983

34. Varadhan, Karangelis, Krishnan, Hui. A framework for deformable image registration validation in radiotherapy clinical applications. J Appl Clin Med Phys. 2013;14(1):4066. doi:10.1120/jacmp.v14i1.4066

35. Wang, Dong, O’Daniel. Validation of an accelerated “demons” algorithm for deformable image registration in radiation therapy. Phys Med Biol. 2005;50(12):2887–2905. doi:10.1088/0031-9155/50/12/011

36. Zhong, Kim, Chetty. Analysis of deformable image registration accuracy using computational modeling. Med Phys. 2010;37(3):970–979. doi:10.1118/1.3302141

37. Pluim JPW, Maintz JBA, Viergever. Mutual-information-based registration of medical images: a survey. IEEE Trans Med Imaging. 2003;22(8):986–1004. doi:10.1109/Tmi.2003.815867

38. Pluim JPW, Maintz JBA, Viergever MA. f-information measures in medical image registration. IEEE Trans Med Imaging. 2004;23(12):1508–1516. doi:10.1109/Tmi.2004.836872

39. Chiang, Dutton, Hayashi. Fluid registration of medical images using Jensen-Renyi Divergence reveals 3D profile of brain atrophy in HIV/AIDS. In: 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro; 2006 April 6–9;193–196; Arlington, VA, USA. doi:10.1109/ISBI.2006.1624885

40. Friston, Ashburner, Frith, Poline, Heather, Frackowiak RSJ. Spatial registration and normalization of images. Hum Brain Mapp. 1995;3(3):165–189. doi:10.1002/hbm.460030303

41. Thevenaz, Unser. Optimization of mutual information for multiresolution image registration. IEEE Trans Image Process. 2000;9(12):2083–2099. doi:10.1109/83.887976

42. Ibáñez; Insight Software Consortium. The ITK Software Guide: Updated for ITK Version 2.4. 2nd edn. New York, NY: Kitware; 2005.

43. Klein, Staring, Pluim JPW. Evaluation of optimization methods for nonrigid medical image registration using mutual information and B-splines. IEEE Trans Image Process. 2007;16(12):2879–2890. doi:10.1109/Tip.2007.909412

44. Nie, Chuang, Kirby, Braunstein, Pouliot. Site-specific deformable imaging registration algorithm selection using patient-based simulated deformations. Med Phys. 2013;40(4):041911. doi:10.1118/1.4793723

45. Kadoya, Fujita, Katsuta. Evaluation of various deformable image registration algorithms for thoracic images. J Radiat Res. 2014;55(1):175–182. doi:10.1093/jrr/rrt093

46. Kashani, Hub, Kessler, Balter. Technical note: a physical phantom for assessment of accuracy of deformable alignment algorithms. Med Phys. 2007;34(7):2785–2788. doi:10.1118/1.2739812

47. Kirby, Chuang, Pouliot. A two-dimensional deformable phantom for quantitatively verifying deformation algorithms. Med Phys. 2011;38(8):4583–4586. doi:10.1118/1.3597881

48. Serban, Heath, Stroian, Collins, Seuntjens. A deformable phantom for 4D radiotherapy verification: design and image registration evaluation. Med Phys. 2008;35(3):1094–1102. doi:10.1118/1.2836417

49. Park, Kim, Yao, Ellis, Machtay, Sohn. Building deformation error histogram and quality assurance of deformable image registration. Med Phys. 2012;39(6):3672. doi:10.1118/1.4734922

50. Kim, Monroe, Yao. Use of deformation error histogram as an accuracy indicator for deformable image registration. Med Phys. 2013;40(6):169. doi:10.1118/1.4814296

51. Pukala, Meeks, Staton, Bova, Mañon, Langen. A virtual phantom library for the quantification of deformable image registration uncertainties in patients with cancers of the head and neck. Med Phys. 2013;40(11):111703. doi:10.1118/1.4823467

52. Drzymala, Mohan, Brewster. Dose-volume histograms. Int J Radiat Oncol Biol Phys. 1991;21(1):71–78. doi:10.1016/0360-3016(91)90168-4

53. Park, Monroe, Brindle, Sohn. Developing a Universal Treatment Plan Review System. Med Phys. 2009;36(6):2663–2664. doi:10.1118/1.3182103

54. Meijering EHW, Niessen, Pluim JPW, Viergever MA. Quantitative comparison of sinc-approximating kernels for medical image interpolation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI); 1999 September 19–22;1679:210–217; Cambridge, UK. doi:10.1007/10704282_23

55. Pluim JPW, Maintz JBA, Viergever. Mutual information matching in multiresolution contexts. Image Vision Comput. 2001;19(1):45–52. doi:10.1016/S0262-8856(00)00054-8

56. Park, Monroe, Yao, Machtay, Sohn. Composite radiation dose representation using Fuzzy Set theory. Inform Sci. 2012;187:204–215. doi:10.1016/j.ins.2011.10.025

Quantitative Analysis Tools and Digital Phantoms for Deformable Image Registration Quality Assurance
