Abstract
To address the need for a clinically applicable intravital optical imaging system, we developed a new hardware and software framework. We demonstrate its utility by applying it to an endoscope-based white light and near-infrared fluorescence imaging system. The capabilities include acquisition and visualization algorithms that perform registration, segmentation, and histogram-based autoexposure of two imaging channels (full-spectrum white light and near-infrared fluorescence) in real time. Data are processed and saved as 12-bit files, matching the standards of clinical imaging. Dynamic range is further improved by evaluating flux as a quantitative parameter. These features are demonstrated in a series of in vitro experiments, and in vivo application is shown by visualizing the fluorescently labeled vasculature of a mouse peritoneum. The approach may be applied to diverse systems, including handheld devices, fixed-geometry intraoperative devices, catheter-based imaging, and multimodal systems.
OPTICAL IMAGING, with its interventional and surgical applications1–4 and its real-time, independent, multitarget capabilities combined with high spatial resolution, has great potential to enhance the ability to diagnose molecular events in vivo. However, current small animal whole-body near-infrared (NIR) reflectance imaging systems have limited direct clinical translation, given their fixed geometry and maximum signal depth penetration of approximately 1 cm from exterior surfaces.5 The introduction of endoscope- and catheter-based imaging systems has resolved many of the limitations of reflectance imaging by allowing intravital access to deeper pathologies,6 such as transitional cell bladder carcinomas, ovarian carcinomas, lung carcinomas,7 and colonic adenocarcinomas.8–10
Unfortunately, the current state of the art for real-time optical imaging hardware and software is limited in several ways: low dynamic range compared with other imaging modalities, such as computed tomography (CT) and magnetic resonance imaging (MRI); lack of real-time signal processing; and poor fluorescence signal quantification. In the realm of optical imaging, data acquisition from multiple imaging channels can be rapid, in some cases markedly exceeding video-rate acquisition. Moreover, there is a distinct benefit to real-time processing of the image data to provide feedback to operators11 and to evaluate the distribution of signal intensities for automatic correction. The goal of this study was to develop and test an optical imaging system with integrated software and hardware capable of acquiring multiple independent 12-bit optical imaging data streams in real time and processing the images to allow frame-by-frame exposure time adjustment and image alignment. Real-time execution was achieved by sharing the computational burden across two computers and optimizing the image processing for the specific computer platforms. By using previously published segmentation and registration algorithms and generating pixel intensity histograms for each image, we were able to align images across two imaging channels on the fly and to increase the dynamic range by more than six orders of magnitude compared with previous implementations. We hypothesized that this device could be applied toward fluorescence endoscopy with white light (WL) and NIR imaging channels. Catheter-based systems and handheld devices, in which the reference anatomic and molecularly reporting NIR images move frame by frame relative to the imaging target, are optically analogous and may thus directly use the developed framework.
The techniques presented here can ultimately be generalized to several other data stream permutations, including WL optical imaging combined with two or more NIR channels5 or WL optical and NIR imaging combined with x-ray fluoroscopy or ultrasound imaging.
Materials and Methods
Optics
The optical acquisition system was designed to accept standard fiberoptic catheters and was conceptually similar to previous designs.8,12 A 1.6 mm outer diameter catheter (Edwards Lifesciences, Irvine, CA) with a 0.9 mm working channel received excitation photons from a 300 W xenon lamp (Sunoptics, Jacksonville, FL) filtered through a 680 nm long-pass filter to decrease false-positive NIR signal. The collected photons were segregated via dichroic beamsplitters and bandpass filters (Omega Optical, Brattleboro, VT) into two channels: WL and NIR. In this experiment, the NIR dye used was cyanine 5.5 (Cy5.5) (Amersham, UK). The photons were then focused onto two charge-coupled device (CCD) cameras using 50 mm achromatic lenses.
Hardware
The imaging hardware included CCD cameras with high sensitivity, high dynamic range, high spatial resolution, and digitally tunable exposure times. A Pixelfly SVGA with a color CCD was chosen for WL image capture, and a Pixelfly QE with a grayscale CCD was chosen for NIR signal detection (PCO, Germany); both cameras provide 12-bit acquisition with 16-bit data transfer. An open-source driver and programming interface for the cameras enabled in-house development of custom software to implement the specific goals of this study.
Image acquisition and camera control were performed by a 3 GHz Intel Pentium 4 computer with 3 GB of RAM running the Linux operating system, kernel version 2.4. The images were then piped to an Apple Quad G5 computer running Mac OS X version 10.4 for image processing, display, and archiving; the graphical user interface was also run on the Apple computer. The code was written in C, C++, and Cocoa.
Software Design
Figure 1 presents a schematic of the in-house software (more than 3,000 lines of code, excluding open-source libraries) written to control the optical measurement system. Once the cameras have been initialized, they are registered using the Automated Image Registration software package.13 Specifically, we use a six-parameter affine model to calculate the spatial translation, rotation, and scaling necessary to align camera 2 (NIR) to camera 1 (WL), as shown in Figure 2. We then use nearest-neighbor interpolation to reslice every subsequent image captured by camera 2 according to the calculated transformation parameters.
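The reslicing step can be sketched as follows. This is a minimal illustration, not the paper's implementation (which uses the Automated Image Registration package): it inverse-maps each output pixel through an assumed rotation/scale/translation transform and picks the nearest source pixel.

```python
import math

def reslice_nearest(img, rot_deg, scale, tx, ty, fill=0):
    """Remap img (a list of rows) through a rotation/scale/translation
    affine using nearest-neighbor interpolation. Illustrative sketch only;
    the actual system derives the six affine parameters via AIR."""
    h, w = len(img), len(img[0])
    th = math.radians(rot_deg)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel into source coordinates.
            sx = (math.cos(th) * (x - tx) + math.sin(th) * (y - ty)) / scale
            sy = (-math.sin(th) * (x - tx) + math.cos(th) * (y - ty)) / scale
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]  # nearest source pixel
    return out
```

With the identity transform (rotation 0, scale 1, no translation), the output equals the input; a one-pixel translation shifts the image and fills uncovered pixels with the fill value.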

Schematic of the software pipeline. The top arrow diverges into two parallel threads executed simultaneously. Note that the two threads are independent, allowing different frame rates for the two cameras. Left and right arrows loop for each camera image. Expanded bubbles show histogram-based calculations. ROI = region of interest.

Spatial correlation and orientation of imaging channels. A, Nonaligned images demonstrate significant ghosting when overlaid. B, Registration results in a diminished ghosting effect.
Once the two data streams have been aligned, a region of interest (ROI) is defined around the catheter image using the segmentation capabilities of the Insight Segmentation and Registration Toolkit (National Library of Medicine, Bethesda, MD).14 A segmentation mask is generated using a simple threshold-based algorithm that determines whether a pixel lies within the catheter image: if a pixel's value falls below the threshold, it is designated as outside the catheter image and mapped to a white pixel in the segmentation mask; if it falls above the threshold, it is designated as within the catheter image and mapped to a black pixel.
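The thresholding rule can be written in a few lines. This sketch follows the mapping described above (below threshold → white, 255; above → black, 0); the actual system uses the Insight Toolkit's segmentation filters.

```python
def catheter_mask(img, threshold):
    """Threshold-based segmentation mask per the convention in the text:
    pixels at or below `threshold` are outside the catheter image (white,
    255); pixels above it are inside (black, 0). Illustrative only."""
    return [[0 if px > threshold else 255 for px in row] for row in img]
```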
Once registration and segmentation have been performed, the cameras enter a continuously looping subroutine that handles their image acquisition, exposure time calculation, image display, and image saving. The individual camera loops run as separate threads; this multithreaded architecture allows the two cameras to run independently yet still within the same program. Importantly, the two cameras' acquisition times may differ, markedly expanding the dynamic range of both channels.
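The per-camera threading can be sketched as below. This is a simplified stand-in (the actual software is written in C/C++); the loop bodies, frame counts, and timings are placeholders for the acquire/process/display/save steps.

```python
import threading, time

def camera_loop(name, n_frames, frame_time, log):
    """Simplified per-camera acquisition loop: each camera runs in its own
    thread, so the WL and NIR channels can proceed at different frame
    rates within one program."""
    for i in range(n_frames):
        time.sleep(frame_time)   # stand-in for exposure + readout
        log.append((name, i))    # stand-in for process/display/save

# Two independent threads at different (illustrative) frame rates.
wl_log, nir_log = [], []
wl = threading.Thread(target=camera_loop, args=("WL", 5, 0.001, wl_log))
nir = threading.Thread(target=camera_loop, args=("NIR", 3, 0.002, nir_log))
wl.start(); nir.start()
wl.join(); nir.join()
```

Because the two loops never wait on each other, a slow NIR exposure does not stall the WL channel, which is the behavior the multithreaded design is meant to provide.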
Once an image has been captured, a histogram of the catheter data, defined as the pixels within the ROI, is generated for each frame in real time; this histogram is used to calculate a new exposure time. The exposure time is adjusted such that the image intensity approaches a previously defined setpoint, subject to the following constraints:
Limits are established to ensure that the exposure time never falls below a minimum of 100 μs or exceeds 550 ms. Moreover, the new exposure time is constrained to be within 30% of the previous exposure time to prevent excessive fluctuations; importantly, this constraint is rarely invoked, as most exposure time adjustments, except in the most extreme (and artificially created) cases, are within 20% of the previous exposure time, even with very rapid catheter movement in biologic systems. It is important to note that because each image has a unique exposure time, it is no longer possible to compare raw pixel values between different images: as exposure times vary, so too do the individual pixel counts. To account for this, we divide each image's pixel values by that image's exposure time; thus, we use counts/second, instead of raw counts, as our data values.
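The constraints above can be combined into a single update function. The proportional form of the correction here is an assumption (the paper's exact formula is not reproduced); the absolute limits (100 μs to 550 ms) and the 30% per-frame cap come directly from the text.

```python
MIN_EXP, MAX_EXP = 100e-6, 550e-3   # absolute limits from the text
MAX_STEP = 0.30                     # per-frame change capped at 30%

def next_exposure(exp, measured, setpoint):
    """One plausible exposure update (assumed form, not the paper's exact
    algorithm): scale the exposure so the measured histogram statistic
    approaches the setpoint, then apply both clamps."""
    proposed = exp * setpoint / max(measured, 1)      # proportional correction
    lo, hi = exp * (1 - MAX_STEP), exp * (1 + MAX_STEP)
    proposed = min(max(proposed, lo), hi)             # limit frame-to-frame swing
    return min(max(proposed, MIN_EXP), MAX_EXP)       # absolute exposure limits
```

For example, if the measured value is twice the setpoint, the proportional correction would halve the exposure, but the 30% cap limits the change to a 0.7× step.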
The images are then scaled down from 12 bits to 8 bits for display on a computer monitor. For WL images, the 12-bit images are rescaled to 8-bit color images with user-definable brightness. Additionally, the red, green, and blue intensities of the color images can be adjusted by the user. This feature accounts for the fact that excitation filters that prevent false-positive results in the NIR channel may reduce the number of red photons in the illumination light, leading to less natural, greener WL images; these lost photons can be “recreated” by adjusting the color balance to produce more natural WL images. For NIR images, the 12-bit grayscale images are rescaled to 8-bit images using a pseudocoloring algorithm that colorizes the images based on each pixel's percent change from a user-definable baseline. After the images are displayed, the original 12-bit images are saved to file. Following this, the camera is turned off, and the subroutine loops back to the top, as shown in Figure 1.
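The 12-bit-to-8-bit display rescaling can be sketched as below. This is an illustrative version with an assumed linear mapping and a brightness multiplier; the actual WL and NIR display paths also apply color balance and pseudocoloring, which are omitted here.

```python
def to_8bit(img12, brightness=1.0):
    """Rescale 12-bit pixel values (0-4095) to 8-bit (0-255) for display,
    with a user-adjustable brightness factor. A linear mapping is assumed;
    values pushed past 255 by the brightness factor are clipped."""
    return [[min(255, int(px / 4095 * 255 * brightness)) for px in row]
            for row in img12]
```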
In Vitro Testing
The automatic exposure time adjustment experiments were conducted by first spreading NIR fluorescent dye (100 μL of 44.3 μM Cy5.5 fluorochrome) onto either glossy photo paper or nonglossy stationery and allowing the dye to dry. A Massachusetts General Hospital (MGH) logo, which provided contrast, was then printed onto the paper and imaged with the software described above.
To measure how the WL camera's pixel values vary with exposure time, a blank sheet of white paper was imaged at a constant height over different exposure times. To measure this relationship for the NIR camera, a well filled with NIR fluorescent dye (44.3 μM Cy5.5 fluorochrome) was imaged at a constant height at different exposure times.
In Vivo Testing
The study was approved by the institutional animal care committee. To demonstrate application of real-time computational methods to in vivo imaging, athymic mice (nu/nu; Taconic, Germantown, NY) were anesthetized under isoflurane for minimally invasive peritoneal imaging. A sheath (1.9 mm outer diameter) was introduced into the lower right quadrant of the abdomen, and 1 to 3 mL of air was gently introduced into the peritoneal cavity. The endoscope was introduced through the sheath, and the peritoneum was imaged. Once blood vessels were recognized in the WL channel, the NIR blood-pool imaging agent Angiosense-680 (Visen Medical, Woburn, MA) was administered intravenously to create detectable intravascular fluorescent signal.15
Statistical Analysis
All data are presented as means plus or minus the standard error of the mean. Image colocalization for the registration data was quantified by calculating the Pearson correlation coefficient using the software program ImageJ (National Institutes of Health, Bethesda, MD). Linear regression was performed using the nonlinear least-squares Marquardt-Levenberg algorithm for the counts/second data; a correlation coefficient of r greater than .99 was considered to be significant.
Results
Registration
As Figure 2 qualitatively demonstrates, the degree of overlap between the two cameras' images is very poor without registration; after performing the appropriate translation, rotation, and scaling transformations, however, the images exhibit much stronger similarity. This effect can be expressed statistically: the correlation between camera 1's image and the unregistered camera 2 image is r = .51, whereas the correlation between camera 1's image and the registered camera 2 image is r = .89 (where 0 means no correlation and 1 means perfect correlation).
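The colocalization metric used here is the Pearson correlation coefficient over corresponding pixels. A minimal implementation on flattened pixel lists (the study computed it with ImageJ):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length pixel
    lists: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Perfectly colocalized channels give r = 1; misregistration decorrelates corresponding pixels and lowers r, which is why registration raises the coefficient from .51 to .89 in the data above.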
Histogram Generation
Figure 3A illustrates the need for histogram calculation and the utility of the 95% area-under-the-curve pixel value when calculating the subsequent image's exposure time. The WL signal is generated by direct reflection of the excitation light from the surface. Often, owing to the curved, wet, and thus highly reflective surface being imaged, a few pixels will demonstrate much higher signal than the remaining pixels; this effect may be exacerbated when the endoscopic tip is in close proximity to the surface. As can be seen, the maximum intensity value (point A in Figure 3A) is determined by the single highest-count pixel. We have typically found that if the exposure time were calculated so that the maximum pixel value was adjusted to a setpoint, the presence of these rare, nonrepresentative, high-reflection pixels would lead to excessively low exposure times and very dim images. Moreover, if the setpoint were placed at 95% (or another fraction) of the intensity of point A, it would still be determined by a single pixel's intensity and often would not adequately represent the pixel intensity distribution, because the changing angle and distance markedly modulate the relationship between the highest pixel intensity and the overall signal intensity distribution. By using the 95% area-under-the-curve pixel value (point B) instead, we ensure that although some small parts of the image may be saturated (depending on the setpoint), the majority of the image is visible at an appropriate intensity. The imaging time is thus based on the entire pixel histogram distribution and not on a single pixel value.
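Finding the 95% area-under-the-curve pixel value amounts to locating where the cumulative histogram reaches 95% of the total pixel count. A sketch, assuming the histogram is a list indexed by pixel value:

```python
def auc_pixel_value(hist, fraction=0.95):
    """Return the pixel value at which the cumulative histogram reaches
    `fraction` of the total counts (point B in Figure 3A). Unlike the
    single brightest pixel (point A), this statistic reflects the whole
    intensity distribution."""
    total = sum(hist)
    running = 0
    for value, count in enumerate(hist):
        running += count
        if running >= fraction * total:
            return value
    return len(hist) - 1
```

A handful of specular-reflection pixels at the top of the histogram barely move this value, whereas they completely determine the maximum, which is why the 95% statistic yields stable exposure times.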

A, Representative histogram of a white light (WL) image, illustrating the effects of direct reflection around illumination fibers, resulting in a few bright pixels that determine the highest pixel intensity (point A). By using the 95% area-under-the-curve pixel value (point B) to calculate exposure times, we avoid excessively small exposure times that lead to dim images. B, The linear relationship between pixel intensity and exposure time for both WL and near-infrared (NIR) cameras justifies the use of counts/second as an exposure time–independent measure of photon collection.
Counts/Second
Because of the variability in exposure times, we cannot directly compare the raw pixel values of different images; instead, we must normalize each image's pixels by that image's exposure time. We justify this use of counts/second by calculating the mean pixel values within the catheter ROI over different exposure times while maintaining a constant height from the phantom surface. As seen for both NIR and WL cameras in Figure 3B, there is a linear relationship between pixel counts and exposure times: pixel counts increase monotonically with exposure time even though the phantoms imaged remain constant. Because this relationship is linear, we can use the ratio of the pixel value to the exposure time, the flux term counts/second, as our constant, exposure time–independent data value.
Exposure Time Adjustment
Figure 4 demonstrates the advantages of an automatically adjusted exposure time. Although both WL and NIR images are saturated at short distances from the surface at exposure times optimized for greater distances, autoexposure allows us to collect meaningful data all the way to the phantom's surface owing to the increased dynamic range that is achieved by this added variable.

As distance from the endoscope tip to the surface of the letterhead paper decreases (up each column) for the white light (WL) camera (A) or near-infrared (NIR) camera (B), the absence of the autoexposure calculation (left column) results in a more saturated image devoid of information when compared with executing autoexposure (right column).
In Vivo Imaging
Figure 5 shows a single frame of the peritoneal imaging video feed (as seen by the operator in real time) created while imaging a mouse peritoneum under anesthesia. The left image relays the conventional color anatomy, and the right image is the NIR channel, mapped to a color lookup table for signal intensity and spatially registered to the first image. Because the administered probe Angiosense-680 is a fluorescent blood-pool agent, we are able to visualize the tissue vasculature in the NIR channel. The exposure time feedback loop correctly adjusts the exposure time in real time so that the signal remains within the detection range while avoiding pixel saturation.

In vivo images from the white light (A) and near-infrared (NIR) (B) channels of the peritoneal vasculature. The NIR signal shows a fluorescent blood-pool agent highlighting the vessels. The alignment allows direct, exact comparison of the two channels on a pixel-by-pixel basis in real time.
Discussion
As an imaging modality, optical imaging benefits from high resolution (compared with ultrasonography and fluoroscopy), real-time imaging (compared with CT and MRI), and the ability to image multiple molecular targets simultaneously. Moreover, advances in the design of optical molecular probes have paved the way for the application of this technology toward the detection and monitoring of pathophysiologic processes such as cancer,16,17 atherosclerosis,18 and inflammation.19,20 Unfortunately, no intravital optical imaging system exists today that fully capitalizes on this modality's unique advantages. The low bit depth of current systems leads to lower dynamic ranges than the above-mentioned clinical imaging modalities, increasing the risk of quantization error. Additionally, simultaneously acquired signals from multiple reporters at different wavelengths or from anatomic localization images are difficult to rigorously correlate in real time because the two data streams are not aligned with respect to one another.
The advances introduced in the current study fall into two general areas: hardware implementation and software advances. In particular, our endoscope/catheter-based imaging system has two 12-bit CCD cameras that capture data at a bit depth that has become the standard of care for many medical imaging modalities. The higher bit depth allows for better quantization of the image data, decreases digitization effects, and provides a wider dynamic range. The camera hardware also permits asynchronous acquisition, allowing the implementation of multithreading and independent variable frame rates for the two cameras. The developed software package takes full advantage of the hardware and is, in fact, immediately expandable to higher bit depth systems (such as 14-bit cameras) and to additional cameras recording additional NIR channels or other image data streams, such as fluoroscopy. We have demonstrated that the developed routines allow real-time registration, segmentation, temporal correlation, and window-leveling in parallel with data acquisition, which can be dynamically adjusted at each frame. To our knowledge, these capabilities have not been demonstrated with other multichannel or multimodality preclinical or clinical systems.
The intravital access achieved with catheter-based, endoscope-based, or handheld systems comes at a dynamic range cost compared with fixed systems in which the camera-to-tissue distance is constant. Catheter-based systems face excessive brightness when the catheter is close to a surface and excessive dimness when it is farther away; signal intensity can vary over orders of magnitude even for fixed concentrations of fluorochrome, given the continually changing distance from the catheter tip to the target. Real-time, frame-by-frame adjustment of the individual cameras' exposure times based on the current frame's image intensity resolves this issue. The autoexposure routine we have demonstrated here prevents both saturation of the CCD and digitization effects from “underfilling.” We implemented a dampening feature that limits frame-to-frame changes to 30%, but we never exceeded this limit during our in vivo evaluation of the system, even under extreme conditions. Although the exposure time routine maintains adequate filling of the CCDs at each frame, for quantification, we must account for the fact that pixel values depend on exposure time: we cannot directly compare pixel values between images taken at different integration times. However, as shown in Figure 3B, the relationship between pixel counts and exposure time is linear. Therefore, by dividing the pixel counts (minus readout noise) by the exposure time, we are able to use the exposure time–independent counts/second value to compare pixels from different images and achieve a quantitative parameter. Of equal importance, using counts/second markedly increases the dynamic range of our data. Although the raw pixel values still range from 0 to 2¹², our counts/second data have additional flexibility in exposure time: counts/second values can take any value from 0 to 2¹² divided by the range of possible exposure times (100 μs to 550 ms, or ≈2¹⁹).
This results in ≈2³¹ possible values, corresponding to an effective 31-bit dynamic range. This precision is much greater than that of most clinical imaging systems and can be applied to each of multiple channels independently. Additionally, the ability to generate real-time histograms allows us to vary exposure times in a manner that is not dominated by direct reflection near the illumination fibers. Thus, the exposure time is not determined by a few bright pixels resulting from reflection off moist mucosa.
We have demonstrated these capabilities in vivo using the intraperitoneal vasculature. The data sets generated by the operator have value both as video images providing real-time guidance during signal visualization procedures and as a stack of 12-bit frames for future quantitative analysis. Traditional ROI analysis with pixel averaging is now also applicable to each image frame from an endoscope- or catheter-based procedure. For example, postacquisition processing of images similar to those shown in Figure 5 would allow quantification of vascular leak of fluorescent imaging agents by comparing intravascular with adjacent extravascular signal.
Improvements in the sophistication of optical imaging technology will allow independent or combined systems to take full advantage of optical molecular imaging opportunities that have expanded in recent years as a result of the development of activatable and targeted optical probes. Outside the realm of optical imaging, a software implementation similar to the one shown here can easily be applied to other real-time acquisition modalities. Taking advantage of multithreaded code, the data pipeline can be generalized to multiple simultaneous data streams and to many types of cameras, including fluoroscopy, thermal maps, visible light, or other real-time images, such as those from ultrasonography.
