Abstract
New pipelines are required to automate the quantitation of emerging high-throughput electrophoretic (EP) assessment of DNA damage, or proteoform expression in single cells. EP cytometry consists of thousands of Western blots performed on a microscope slide-sized gel microwell array for single cells. Thus, EP cytometry images pose an analysis challenge that blends requirements for accurate and reproducible analysis encountered for both standard Western blots and protein microarrays. Here, we introduce the Summit algorithm to automate array segmentation, peak background subtraction, and Gaussian fitting for EP cytometry. The data structure storage of parameters allows users to perform quality control on identically processed data, yielding a ~6.5% difference in coefficient of quartile variation (CQV) of protein peak area under the curve (AUC) distributions measured by four users. Further, inspired by investigations of background subtraction methods to reduce technical variation in protein microarray measurements, we aimed to understand the trade-offs between EP cytometry analysis throughput and variation. We found an 11%–50% increase in protein peaks that passed quality control with a subtraction method similar to microarray “average on-boundary” versus an axial subtraction method. The background subtraction method only mildly influences AUC CQV, which varies between 1% and 4.5%. Finally, we determined that the narrow confidence interval for peak location and peak width parameters from Gaussian fitting yield minimal uncertainty in protein sizing. The AUC CQV differed by only ~1%–2% when summed over the peak width bounds versus the 95% peak width confidence interval. We expect Summit to be broadly applicable to other arrayed EP separations, or traditional Western blot analysis.
Keywords
Introduction
Electrophoretic (EP) separations provide critical information regarding the physicochemical properties and abundance of a given protein or proteoform. 1 Examples of physicochemical properties that separations can quantify include the protein molecular mass (as in sodium dodecyl sulfate–polyacrylamide gel electrophoresis [SDS-PAGE] 2 ), isoelectric point (determined by isoelectric focusing 3 ), or both (via 2D electrophoresis 4 ). Such separations are performed in a variety of formats, including slab gels, microchannels,5,6 and capillaries,7–9 each with specially designed analyses to quantify properties of the electropherogram. However, some analyses of EP separations still require substantial manual processing, such as the nonstandardized and poorly documented practices employed for densitometry of Western blotting. 10 Western blotting utilizes a size-based EP separation followed by a transfer of the separated analytes to a membrane (blotting) on which proteins are detected with fluorescent or chemiluminescent antibodies. Currently, it is still typical for the user to manually outline regions of the image to quantify (e.g., selecting the protein bands) using commercially available software sold with Western blot imagers. In a random survey of 100 publications from PubMed that utilized densitometry for Western blotting quantitation, only one paper provided sufficient details for the analysis to be reproduced 10 (e.g., how bands were selected, whether peak integrals or heights were used for quantification, and type of intensity profile background subtraction).
Manual processing becomes impractical for high-throughput EP separations. Our group and others have introduced high-throughput EP separations with up to hundreds to thousands of separations performed in parallel across an array,11–16 or with 1300 separations in series in a capillary. 17 Applications of arrayed separations include assessing cell-to-cell heterogeneity in DNA damage via single-cell comet assays,18,19 and quantifying proteoforms responsible for cancer drug resistance in breast cancer with EP cytometry. 20
EP cytometry performs thousands of single-cell protein separations in a device patterned on a standard microscope slide, and protein peaks are detected with fluorescent antibodies. We previously developed quantitative algorithms for EP cytometry to determine protein expression levels by area under the curve (AUC) analysis.13,21 However, the previous algorithms were cumbersome (with 10–20+ functions) and did not include features that would enable reproducible analysis.
Image processing of microscopy data faces similar challenges in accessibility, scale, and reproducibility. To aid in the accessibility of the analysis, image processing tools such as CellProfiler 22 and Ilastik 23 have introduced graphical user interfaces (GUIs) that guide the user through the analysis. Once the image processing pipeline is established, the user can then apply the same analysis across large data sets via a batch processing mode. Further, toward improving reproducibility, CellProfiler and Ilastik allow the user to save the analysis pipeline to a file that contains all of the relevant parameters. Starfish, 24 a toolkit for processing in situ transcriptomics data, approaches reproducible analysis by storing the results in data structures that automatically record the information required to reproduce the analysis (e.g., parameters used, versions of the software dependencies). However, processing of microscopy images of cells differs from the analysis needed for EP separation images, in which protein peak locations along a defined separation axis may be used to size and identify proteoforms. Existing image processing tools can quantify the integrated fluorescence (amount of protein) from segmented images, but are not readily adaptable to identify a separation lane to index protein band locations.
Inspired by advances in image processing toolkits for microscopy data, we introduce Summit for EP cytometry quantitation and assess the impact of specific algorithm design choices on measured single-cell protein and intrauser variation. Functional decomposition was employed to establish the four functions of Summit. The algorithm requires minimal user interaction through the use of GUIs, making it accessible to researchers of all computational backgrounds. Design choices, including data storage in a structure, background subtraction method, and quality control thresholding, are described. Analysis reproducibility is confirmed as differences in protein AUC distributions are not statistically significant when quantified by four different Summit users. When comparing background subtraction methods, we find that mean subtraction (similar to average on-boundary subtraction in 2D electrophoresis) yields 11%–50% more quantifiable single-cell protein peaks compared with the axial subtraction method used in our previous work. Such throughput gains come without substantial increases in variation in measured protein. We find that the appropriate background subtraction method for a given protein can be evaluated by comparing variation in the peak offset after subtraction and the AUC confidence interval with different peak bounds for summation. The guidelines presented here will aid in accurate and reproducible EP cytometry image analysis for various basic research and clinical applications.
Materials and Methods
Reagents
All reagents used are summarized in the Supplemental Information.
Cell Culture
Cell culture conditions are summarized in the Supplemental Information. All cell lines tested negative for mycoplasma and were authenticated by short tandem repeat analysis (Promega, Madison, WI) (see
EP Cytometry Separations
EP cytometry was performed as described in detail elsewhere21,25 and is summarized in the Supplemental Information and
Summit was written for MATLAB version 9.1 (2016b; MathWorks, Natick, MA) or higher. MATLAB program file dependencies of Summit include the Image Processing Toolbox (version 9.5 or higher) and the Curve Fitting Toolbox (version 3.5.4 or higher). The complete source code is available on GitHub (https://github.com/herrlabucb/summit), and instructions for installing Summit are provided in the repository readme.
Statistical Analysis
All box plots display the median, with the box edges at the 25th and 75th percentiles and whiskers that extend to the minimum and maximum values. The Mann–Whitney U test, Kruskal–Wallis test, Dunn–S Sidak post hoc comparison test, and linear regression were performed using built-in functions in MATLAB (2019b).
Results
The aim of EP cytometry image processing is to both quantify the protein expression and estimate the molecular mass (when using appropriate sizing standards). We developed the Summit image processing pipeline to be modular by using functional decomposition to discretize the pipeline into four functions. The raw image in Figure 1A contains 1615 individual separation lanes that are processed in the four functions shown in Figure 1B : (1) regions of interest (ROIs) are segmented; (2) a 1D intensity profile is generated and the background subtracted; (3) the intensity profile is fit to a Gaussian curve; and (4) the user inspects the fitted peaks as a final quality control step. From analysis of the protein peak AUC, cell-to-cell heterogeneity in protein expression or correlation between proteins in known cancer signaling networks can be quantified ( Fig. 1C ). Leveraging protein sizing in SDS-PAGE, Gaussian fit-determined peak locations of proteins can identify the mass of proteoforms, such as the truncated HER2 protein species t-erbB2 (~95–115 kDa) that confers trastuzumab resistance in HER2-positive breast cancer20,26 ( Fig. 1D , E ). In this instance, the pixel intensities for t-erbB2 are summed across the width of the ROI to produce the 1D intensity profile for background subtraction and Gaussian fitting to identify the peak location.

Summit is a semiautomated algorithm that extracts quantitative parameters such as protein peak AUC and peak location from EP cytometry images containing thousands of protein separations. (
User interaction is limited to the selection of array boundaries in the ROI segmentation function, the selection of peak boundaries in the curve-fitting function, and the quality control function. There are options to input thresholding parameters to assist with quality control, or the user may choose to manually inspect each separation lane, as described in more detail in the quality control section. Here, the reproducibility and accuracy of quantitation using Summit are explored in detail.
Assessing Analysis Reproducibility
To ensure that the results of an analysis performed with Summit are reproducible, we specified the output of the Summit analysis pipeline to include all parameters necessary to perform the analysis. By selecting a data structure for algorithm output variable storage, variables of different data types can be saved within a single file (
We leverage the arrayed format of most EP cytometry separation devices in the image segmentation process to generate the ROIs for each separation lane ( Fig. 2 ) using the roiGeneration function. In order to establish array boundaries and align the separation axes with the vertical axis of the image, a GUI first prompts the user to select microwells within a row approximately 10 microwells apart ( Fig. 2A, B ). The necessary angle of rotation to straighten or align the image is calculated from the two points. The rotation is performed with a built-in MATLAB function, imrotate(), which takes inputs of an image and an angle. The function first translates the image centroid to the origin (the upper left corner of the image), and then the image matrix is rotated around the origin by the angle calculated from the dot product of two direction vectors between the two selected microwell coordinates. Following rotation, the matrix is translated back to the original centroid location. After rotation, the user is instructed to select four microwells along the row and column boundaries of the array ( Fig. 2C ). The function informs the user if they have selected incorrect boundaries (e.g., if the coordinate of the “top-most” well of the array is lower than the coordinate of the “bottom-most”). As the gel is attached to a glass slide, image warping does not hinder the use of array boundaries and well spacings to segment the image. The aligned image is next segmented into ROIs based on the user input well-to-well spacing ( Fig. 2D ). The length (L) and width (W) of each lane set the ROI dimensions. The ROIs are then stored in the data structure as an L × W × N array, where N is the number of separation lanes. As an optional input variable of roiGeneration(), the user may provide a data structure with parameters such as the user-selected array boundaries, and angle of rotation, to apply the same ROIs to another EP cytometry image.

The EP cytometry image is rotated to align the array before segmentation into thousands of individual ROIs. (
When four users, users 1–4 (including two developers, users 1–2), independently performed EP cytometry analysis with Summit (using identical ROIs), they did not find substantial differences between the measured protein AUC distributions. We utilize Gaussian curve fitting to determine protein expression via AUC analysis. 10 The Gaussian fit function, f(x), employed is
where a is the amplitude, μ is the peak center, and b is the peak width. The protein peaks assume a Gaussian distribution owing to protein diffusion during the EP separation.
27
The AUC may be determined by summing over μ ± 2σ, where

Analysis automation yields insignificant interuser and intrauser variation in quantified protein signal for EP cytometry images collected experimentally and generated by simulation. (
To assess differences between the four users in variation in AUC (which will not be normally distributed 29 ), we quantified the CQV, which is recommended over the coefficient of variation for skewed distributions: 30
where Q3 and Q1 are the third and first quartile of the distribution. The CQVs only differ by ~6.5% between the distributions measured by the four users, who each analyzed 102, 107, or 108 peaks. A GUI displays each intensity profile and the corresponding Gaussian fit in a grid and requests that the user select the peak profiles to be thrown out. In total, there were 11 intensity profiles out of 121 manually inspected that all four users did not agree upon ( Fig. 3B ). To determine intrauser variation, user 2 repeated analysis of the GFP EP cytometry image days after the initial analysis ( Fig. 3C ). User 2 identified 108 peaks passing quality control initially and 112 on repeat analysis. To assess the similarity between the initial and repeat analysis sets of peaks, we calculated the Jaccard index to be 0.96 (the size of the intersection divided by the size of the union of the two sets), where 1 is the maximum. For the 108 peaks in common between initial and repeat analyses, the difference between the AUC distributions is not drawn from a distribution around zero (Wilcoxon signed rank test p < 0.0001). While p < 0.0001, the median AUCs are <1% different (user 2: median AUC = 8273; analysis 2, median AUC = 8223) and the distributions overlap substantially. The agreement in the AUC variation and the manual inspection of the separation lanes between users demonstrates the user-to-user reproducibility of Summit, while intrauser variation is also minimal.
As protein peaks may assume non-Gaussian shapes, particularly in chromatography data, users may opt to apply a built-in skewed peak function for fitting non-Gaussian protein peaks. We provide an example of simulated skewed protein peaks in Figure 3D , and the function required to generate skewed peak data is available on the Summit Github repository. The fit function utilizes a skewed Gaussian function given by
where β is the amplitude, ξ is a location parameter, ω is a scale parameter, erfc is the complementary error function, and α is a shape parameter, such that α > 0 produces a right-skewed peak, α < 0 gives a left-skewed peak, and α = 0 returns a nonskewed Gaussian peak. We calculate the AUC by summing between
To assess user-to-user variation in analysis of the simulated skewed Gaussian image, the four users performed Summit analysis with skewed Gaussian peak fitting ( Fig. 3E ). Here, all lanes of the simulated image pass quality control, so variation between users represents the impact of steps such as ROI generation, peak boundary selection (which aids in parameter estimation for the fits), and how robustly the peak fitting identifies bounds for AUC integration. The CQVs varied by <0.2% and the Kruskal–Wallis p value was 0.98, indicating that skew peak analysis is highly reproducible across users.
Finally, to benchmark Summit’s quantitative accuracy in assessing peak AUC variation, we carried out Summit and CellProfiler analysis on the simulated skewed Gaussian peak image. Among various batch image processing tools, we selected CellProfiler because it includes an example pipeline for skewed electrophoresis peak analysis from Comet assay gels (in which skewed DNA peaks randomly distributed across the gel report the degree of DNA damage from each single cell). Importantly, CellProfiler does not report metrics that readily allow protein sizing (peak locations relative to a common separation axis origin, e.g., microwell), but the image segmentation utilized in CellProfiler does report integrated fluorescence intensities for each background-subtracted peak in the gel. The calculated AUCs from Summit and CellProfiler are not directly comparable because of the 1D averaging of the ROI performed in Summit. Thus, we aimed to assess normalized AUC distributions (AUC for each peak divided by the mean AUC for all 400 simulated peaks) calculated by Summit and CellProfiler ( Fig. 3F ). The normalized AUC distributions are not statistically significantly different (Mann–Whitney p = 0.82). However, CellProfiler reports slightly lower CQVs in the normalized AUC (2.34% vs 3.01%). Thus, Summit quantifies AUC variation from Gaussian fitting of 1D intensity profiles that is in agreement with established batch image processing tools, while Summit is equipped to report separation metrics (e.g., estimated protein molecular mass or separation resolution between two peaks).
Comparing Efficacy of Mean versus Axial Background Subtraction
We investigated two local approaches for background subtraction for each individual ROI: (1) mean background subtraction (which closely resembles average on-boundary background subtraction in 2D electrophoresis
31
) and (2) axial background subtraction (employed in our previous work
21
). The background subtraction methods are compared with EP cytometry of four example proteins: GFP and GAPDH (from MCF7 breast cancer cell lines) and HER2 and p-rs6 (from patient-derived breast cancer tumor tissue; data made publicly available previously
20
) in
Figure 4
. Once each ROI is segmented, we generate 1D intensity profiles from each 2D EP cytometry image. This is achieved by averaging each pixel along the width of the ROI, which collapses the 2D image to a 1D raw profile (
Fig. 4A
). Both subtraction methods take “gutter” regions directly adjacent to the separation lane as the background region (

Comparison of mean and axial background subtraction for Western blot intensity profile background subtraction in Summit. (
In order to assess the efficacy of background subtraction, we introduce a method to quantify how well each method brings the protein peak baseline to zero. We quantified the median of an offset region of the intensity profile outside of the peak area (i.e., 2 to 3 σ from the peak center; outlined in gold in
Fig. 4B
). The offset intensity is lower when using mean background subtraction for GFP (median offset, 218 vs 258; Mann–Whitney p = 0.001; Cohen’s d < 0.01) and p-rs6 (median offset, 69 vs 159; Mann–Whitney p = 0.015; Cohen’s d < 0.01,
Automated Gaussian Curve Fitting for Thousands of Peak Intensity Profiles
In order to improve the quality of the Gaussian fit, the region of the intensity profile to be fit must be selected, and initial guesses for each of the fit parameters must be provided. The manual peak bound selection with an overlaid plot of all electropherograms was previously described for high-throughput microfluidic capillary electrophoresis
17
and is employed here for thousands of separations (
Increasing the goodness of the Gaussian fit improves the measurement of molecular mass and quantity of a detected protein. The R2 values of the Gaussian fit were lower with axial subtraction than with mean subtraction, except for the HER2 peaks ( Fig. 5 A, B ). The largest difference in R2 values between the two background subtraction methods was with the skewed GAPDH peaks (median R2 = 0.71 and 0.81 for axial and mean subtraction, respectively). GAPDH and HER2 had statistically significantly lower R2 values than GFP or p-rs6 with mean subtraction (Kruskal–Wallis p < 0.0001; Dunn–Sidak multiple comparison test p < 0.0001 for each pairwise comparison except for GAPDH and HER2, p = 0.14; all sample sizes shown in Fig. 5B ). The median R2 values for GAPDH and HER2 were 0.71 and 0.88, respectively. For a more specific assessment of fit performance, we quantified the uncertainty in the fit parameters. We calculated the percent variation between each fit parameter and the lower bound of the parameter’s 95% confidence interval and found that the uncertainty decreases with increasing R2 value, as anticipated ( Fig. 5B ). For example, the percent variation in peak width is
The parameter with the smallest percent variation is the peak center (

Quantitative assessment of goodness of Gaussian fitting and AUC dependence on mean versus axial background subtraction. (
Dependence of AUC on Peak Width Parameter Confidence Interval
We quantified AUC variation with summation of the background-subtracted intensity profile over μ ± 2σ or μ ± 2σci (which represents a 95% confidence interval for the AUC based on the uncertainty of the peak width fit parameter) as shown in Figure 5A . As shown in Figure 5C and Table 1 , the measured distributions vary minimally based on the background subtraction employed. For all proteins studied here except GFP, we fail to reject the null hypothesis that protein levels measured with mean or axial subtraction are drawn from a single distribution (Mann–Whitney test p > 0.05). The CQV of AUC differed by only a few percent between mean and axial background subtraction for each protein target investigated here. Mean subtraction slightly increases AUC CQV by ~2% and ~4.5%, respectively, for HER2 and p-rs6, while decreasing CQV by ~3.5% and 1% for GFP and GAPDH, respectively. Here, protein variation for GFP, GAPDH, HER2, and p-rs6 is relatively accurately reported when quantified from AUCs with the typical μ ± 2σ summation because CQVs differ by only ~1%–2% when calculated with μ ± 2σci summation bounds instead.
Comparison of the CQV (× 100%) for Mean versus Axial Background Subtraction and Summation Over ± 2σ or 2σci (Representing the 95% Confidence Interval for the AUC).
Discussion
We designed Summit as an automated platform for carrying out analysis of EP cytometry images, storing critical analysis parameters needed for reproducibility and often neglected in Western blot densitometry. Further, we sought to understand how algorithm design decisions, such as a method of background subtraction, affect uncertainty and variation in protein expression. Consequently, we introduce offset intensity as a metric for assessing the efficacy of background subtraction. Additionally, we evaluate the confidence intervals of Gaussian fit parameters and AUCs of protein peaks. While parameters and design choices may need to be tuned for each protein analyzed, we offer guidelines based on increasing the number of quantifiable separations per image and minimizing variation induced by the analysis itself.
User-to-User Variation in Quantitation Is Minimal Owing to Accessibility of Parameters to Reproduce Analysis and Automation
Emerging guidelines for computational research to improve scientific reproducibility 32 suggest that individual steps of code that affect the final analysis should be logged. 33 Thus, the choice of how to store output of both the analysis of interest and the variables used to execute the functions is crucial for making a given analysis reproducible.
Even using the same ROIs to segment an EP cytometry image, user-to-user-variation in analysis output can occur owing to manual quality control with optional quantitative thresholds. The optional thresholds include a minimum Gaussian fit R2 value (previously recommended at ~0.7) and an SNR ratio estimate >3. 21 After thresholding, a user may eliminate a given intensity profile for various reasons: (1) punctate fluorescence (e.g., dust or fluorescent antibody aggregates) in the peak region, (2) gel damage in the peak or background region, (3) large fluorescent debris (e.g., fluorescent cell debris) in the peak or background region, (4) unexplained injection dispersion causing peak skew, (5) poor Gaussian fitting resulting in inaccurate peak boundary identification, or (6) peak SNR ratio that is too low. Similar manual inspection and elimination of protein spots for analysis has been applied to regions of reverse-phase protein arrays, prompting concern that user-to-user variation in quality control could yield inconsistent results. 34 Here, the slight disparity in CQV (~6.5% for the experimentally collected EP cytometry image) may be attributable to differences in which intensity profiles the users allowed through quality control. When all four users did not agree, the most common feature of the peaks was a lower than average SNR ratio. Inclusion of a more rigorous SNR calculation as part of quality control (which would quantify peak signal as the peak amplitude and noise as the standard deviation of a background region at the edge of the ROI) may reduce user-to-user variation in the assessment of low SNR peaks.
Choice of Background Subtraction Method Impacts Assay Throughput
Background subtraction of protein signal remains a challenge in the quantitative analysis of protein electrophoresis and microarray images.35,36 The choice of background subtraction method strongly influences the measured quantity of protein 10 and variance. Common background subtraction approaches for electropherograms include rolling ball, baseline, and “on-boundary” (e.g., lowest or average) subtraction.10,35 The rolling ball method has been shown to yield densitometry protein measurements that are less correlated with a radioimmunoassay than if no background subtraction were performed. 10 Average on-boundary subtraction (the average of a trace just beyond the protein band boundary) in 2D electrophoresis was shown to achieve the lowest overall protein coefficient of variation in acidic 2D electrophoresis. A similar “neighborhood correction” strategy applied to reverse-phase protein microarrays that suffer from nonuniformities across the array also yielded a lower protein spot coefficient of variation. 37 Thus, on-boundary or neighborhood-based subtractions are appealing for both electrophoresis and protein arrays.
Similar to protein microarrays, EP cytometry background subtraction must occur for hundreds to thousands of individual regions across the array. Furthermore, nonuniform background, as observed in gradient gel EP cytometry 38 and in 2D gel electrophoresis, 39 requires a subtraction other than the straightforward and computationally inexpensive baseline subtraction. Thus, we aimed to identify appropriate local background subtraction methods that could address nonuniform background (across the device or within an ROI), maximize the number of quantifiable peaks, and reduce quantification variance.
Quantifying variation in offset intensity, CQV dependence on subtraction method, and summation bounds, and assessing if higher n reveals otherwise unmeasured cell subpopulations, can aid in the choice of background subtraction best suited for a given protein target and research question. When mean subtraction reduces or only slightly increases the CQV relative to axial subtraction, mean subtraction may be the preferred background subtraction method, as it likely increases n.
Maximizing the fraction of analyzed samples that yield useful data (i.e., pass quality control) is critical in single-cell analysis in order to identify rare cell subpopulations. We hypothesize that mean subtraction yields higher n because the effect of any “hot” pixels in the background gutter region is averaged out with mean subtraction. In the case of cells derived from precious patient tumor samples, maximizing assay quality yield is critical to assess biomarker heterogeneity (like the HER2 and p-rs6 analyzed here) that may dictate whether a chemotherapy drug is effective or not. 20 Thus, further refining the background subtraction process to maximize n is an important future direction.
As new data types may benefit from alternative background subtraction methods tailored to the features of the background, Summit includes an optional input to the intProf function in which the user may supply their own background subtraction function. Such custom functions will be especially useful for low-abundance proteins, for which low peak SNR ratios may result from averaging to generate the 1D intensity profile. Additionally, low SNR ratio peaks may benefit from denoising algorithms employed in 2D electrophoresis, 40 such as nonlinear filtering. 41
Narrow Confidence Intervals of Fit Parameters Indicate Analysis Contributes Minimally to Technical Variation of Quantified Peak AUC or Location
While offset intensity is different between mean and axial subtraction, we only expect differences in variation of offset intensity, not magnitude, to result in altered AUC variation. Thus, to truly understand whether either axial or mean background subtraction is associated with higher protein variation, we also directly compared the AUCs with different subtraction methods while evaluating the accuracy of AUC determination. We hypothesize that mean and axial subtraction yield different GFP AUC distributions because of the bleed-through of GFP signal into the background region seen in Figure 4A .
Accurate quantitation of peak width and location parameters is critical for the determination of protein size, 42 separation performance (e.g., separation resolution), and summation bounds for the AUC. Here, AUC is determined from the background-subtracted intensity profile, 43 as AUC quantification directly from the Gaussian fit is preferable for peaks with substantial overlap. We hypothesized that higher uncertainty of the peak width parameter may reduce accuracy of the AUC for lower R2 fits. We found that GAPDH and HER2 had the lowest median R2 values, which results from the lower SNR ratio of the GAPDH peaks and notable injection dispersion for HER2 shown in Figure 4A . However, the uncertainty of the peak width parameter for these lower R2 peaks did not lead to statistically significant differences in the AUCs when summing over μ ± 2σ versus μ ± 2σci (Mann–Whitney p > 0.05). Thus, Summit quantifies these examples of peaks with lower SNR ratio or dispersion without higher uncertainty in the AUC.
Quantitative analysis of EP separation data can reveal the presence and abundance of proteoforms responsible for biological processes such as cancer progression.44–47 Summit provides reproducible quantitation by storing all critical parameters and output variables in the results data structure. Further, automating most aspects of the analysis and quality control reduces user-to-user variation in the analysis results. We found that the differences between quantified protein peak AUC are not statistically significant between Summit users. Minimization of assay technical variation introduced by the algorithm may be achieved by examining variation in background subtraction offset, and CQV variation within the 95% confidence interval of peak bounds. In some instances, there may be a trade-off between the number of analyzed peaks and variation, depending on the background subtraction employed.
For both background subtraction methods explored here, punctate noise in the vicinity of the peak or background “gutter region” is a common feature that prevents certain peaks from passing quality control. We are investigating 2D segmentation methods that may improve yield by isolating the antibody signal from the punctate nonspecific signals on the periphery of the separation lane. Improving assay quality yield would enhance the ability of EP separation methods to detect rare cell populations.
Owing to the modular design of Summit, we anticipate components such as the ROI generation and intensity profile background subtraction could be applied to other arrayed separations and detection of different biomolecules. To date, the algorithm has been applied to both size-based separations in custom lab-on-a-disk devices 48 and single-cell isoelectric focusing assays. 16 For the latter, geometric differences between the 1-mm-long EP cytometry separation lanes and 9-mm-long focusing zones were readily modified, as the parameters defining the geometry of the separation array are Summit input variables. Given the ease of adapting the analysis to other geometries, Summit could likely also be used for standard Western blotting densitometry. We anticipate that the skewed Gaussian function included in Summit could be applied to commonly skewed data types such as the arrayed comet assay for DNA damage. 15 Further, addition of an alternate skewed fit function, such as the Lorentzian-modified Gaussian,49,50 could allow application of Summit to chromatography data.
In the future, Summit-extracted peak shape parameters may shed insight on proteins that are poorly solubilized in EP cytometry (e.g., proteins originating from circulating tumor cells 51 ). Metrics of EP injection dispersion may guide optimization of sample preparation. For example, we envision identifying effective denaturation conditions for EP cytometry tuned to different classes of protein domain structures 52 based on peak skew, variance, and quantity of signal entrapped at the microwell interface. Such quantitative analysis will broaden the applicability of EP cytometry to new sample types and protein targets.
Supplemental Material
sj-pdf-1-jla-10.1177_24726303211036869 – Supplemental material for Summit: Automated Analysis of Arrayed Single-Cell Gel Electrophoresis
Supplemental material, sj-pdf-1-jla-10.1177_24726303211036869 for Summit: Automated Analysis of Arrayed Single-Cell Gel Electrophoresis by Julea Vlassakis, Kevin A. Yamauchi and Amy E. Herr in SLAS Technology
Footnotes
Acknowledgements
We are grateful to current and former Herr lab members, and students from the Cold Spring Harbor Labs Single-Cell Analysis Course for user feedback on Summit. We acknowledge Ms. Anjali Gopal and Mr. Andoni Mourdoukoutas for their contributions to reproducibility analyses.
Supplemental material is available online with this article.
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors acknowledge competing financial interest(s) as follows: J.V., K.A.Y., and A.E.H. are inventors of EP cytometry intellectual property and may benefit from licensing royalties.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Society of Lab Automation and Screening Graduate Education Fellowship (J.V.), NSF Graduate Fellowship DGE1106400 (J.V., K.A.Y.), NSF CAREER CBET1056035 (A.E.H.), and NIH R01CA203018 (A.E.H.).
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
