Exploiting Molecular Barcodes in High-Throughput Cellular Assays

Abstract

Multiplexing strategies, which greatly increase the number of parameters measured simultaneously in a single experiment, are now widely implemented by both the pharmaceutical industry and academic researchers. Color has long been used to identify biological signals and, when combined with molecular barcodes, has substantially enhanced the depth of multiplexed sample characterization. Moreover, the recent advent of DNA barcodes has led to an explosion of innovative cell sequencing approaches. Novel barcoding strategies also show great promise for encoding spatial information in transcriptomic studies and for precise assessment of molecular abundance. Depending on their nature, color- and DNA-based barcodes can be conveniently read with a microscope, a cytometer, or a DNA sequencer. Here we review the basic principles of several technologies used to create barcodes and detail the types of samples that can be identified with such tags.

Keywords

barcodes, sequencing, single cell, molecular tags, screening, genomics, chemical process development

Introduction

The widespread implementation of drug screens by both the pharmaceutical industry and academia has triggered the development of barcoding strategies to significantly increase the number of molecules and samples that can be simultaneously characterized. The advent of sophisticated technological hardware for laboratory automation permits highly multiplexed approaches that greatly reduce time and cost. In this context, molecular tags can be used to specifically label—and thereby act as unique identifiers for—a variety of possible entities, including individual cells,¹ pooled samples,^2,3 macromolecules,⁴ spatial regions,⁵ and cell lineages.⁶ These molecular tags are designed to label specific cells and molecules and possess biochemical properties that facilitate their identification.

The most widespread labeling approaches use either short oligonucleotides¹ or fluorescent labels,⁷ as these can yield a large number of distinct combinations. Furthermore, such tags are usually identified with standard equipment in which sequencing or spectral detection is integrated with high-throughput assays. For example, short DNA molecules, in which each base can take one of four values, yield enormous numbers of unique permutations: a 10-base-pair (bp) DNA oligo spans 4¹⁰ (more than a million) different combinations. On the other hand, simple color barcodes based on only five different fluorescent molecules (e.g., DAPI, FITC, cyanine3, cyanine5, cyanine7, or any dye with similar excitation/emission spectra) in on/off states can generate 2⁵ (32) labels. These commonly used channels can be detected with standard filters available on most fluorescence microscopes, and their number can be further increased with more specialized hardware, as discussed later in this review. For both short oligonucleotides and fluorescent labels, the number of distinct barcodes grows exponentially with the number of channels (or bases). It equals the number of possible states per channel raised to the power of the number of channels, which yields large numbers of unique barcodes for multiplexing.
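The counting rule above (states per position raised to the number of positions) can be checked in a few lines. The function names `n_barcodes` and `min_positions` below are illustrative, not taken from any library. `min_positions` answers the inverse question: how many positions are needed to cover a given number of labels.

```python
def n_barcodes(states: int, positions: int) -> int:
    """Number of distinct codes when each of `positions` slots takes one of `states` values."""
    return states ** positions


def min_positions(states: int, n_labels: int) -> int:
    """Smallest number of positions whose code space covers `n_labels` distinct labels."""
    if states < 2:
        raise ValueError("need at least two states per position")
    positions = 0
    while states ** positions < n_labels:
        positions += 1
    return positions


print(n_barcodes(4, 10))       # 10-bp DNA oligo: 1048576
print(n_barcodes(2, 5))        # five dyes, on/off: 32
print(n_barcodes(3, 5))        # five dyes, three intensity levels: 243
print(min_positions(4, 5000))  # DNA bases needed to give 5000 items distinct codes: 7
```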

For color labels, two additional encoding dimensions can be incorporated to create barcodes. The first relies on different levels of signal intensity⁷ to yield higher numbers of combinations. Indeed, while five colors used in on/off states generate 32 labels, a code consisting of three intensities (no signal, low intensity, high intensity) could in principle generate up to 3⁵ (243) labels. The second dimension involves positioning colored molecules on a carrier structure⁸ so that their order can be measured. For example, the sequence of colors along a carrier RNA molecule can be used just like DNA bases to generate a code.⁹ Super-resolution microscopy can determine the position of each fluorescent molecule precisely enough for such sequences to be inferred. Instead of RNA carriers, hydrogels have also been used to spatially organize colored molecules, for instance, within a bead, to create color barcodes.^10,11
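An intensity-level code is simply a number written in base 3, with one digit per color channel. The sketch below maps each of the 243 label indices to a tuple of per-channel levels and back. The channel order and the `encode`/`decode` names are illustrative assumptions.

```python
CHANNELS = ("DAPI", "FITC", "Cy3", "Cy5", "Cy7")
LEVELS = 3  # 0 = no signal, 1 = low intensity, 2 = high intensity


def encode(index: int) -> tuple:
    """Map a label index (0 .. LEVELS**5 - 1) to one intensity level per channel."""
    if not 0 <= index < LEVELS ** len(CHANNELS):
        raise ValueError("index out of range")
    levels = []
    for _ in CHANNELS:
        index, level = divmod(index, LEVELS)  # least significant digit -> first channel
        levels.append(level)
    return tuple(levels)


def decode(levels: tuple) -> int:
    """Inverse of encode: read the per-channel levels back as base-3 digits."""
    index = 0
    for level in reversed(levels):
        index = index * LEVELS + level
    return index


print(dict(zip(CHANNELS, encode(100))))  # {'DAPI': 1, 'FITC': 0, 'Cy3': 2, 'Cy5': 0, 'Cy7': 1}
```

Index 0 corresponds to no signal in every channel. In practice, that code cannot be distinguished from an unlabeled object.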

For DNA labels, a large number of different strategies have demonstrated the great versatility of this technology. For example, various pipelines developed for single-cell transcriptomics have incorporated different barcoding methods. Currently, the most widespread single-cell sequencing technology isolates cells in droplets, whose contents must be tagged before they are pooled into a single sequencing reaction.¹ Individual cells are barcoded by incorporating distinct short DNA oligonucleotides into all cDNA sequences during library preparation. These DNA labels are then used to assign each read to its cell of origin during analysis.¹

Based on a similar approach, cellular samples from different origins can also be barcoded, pooled, and sequenced in a single run. Sequenced DNA molecules include both the genetic information and the barcodes used to match each read to its sample.^2,3 Given the high cost of sequencing reagents, pooling material is crucial for reducing both cost and time.

Short DNA molecules are also used to barcode antibodies and proteins, thereby combining proteomics and genomics.⁴ This powerful approach permits detection of proteins and epitopes alongside transcriptomic data at the single-cell level. Barcodes are also used to tag the position of cells within a sample prior to tissue digestion,^12,13 so that transcriptomic data can be matched with spatial tissue organization and cell distribution. Finally, cells can be barcoded for lineage characterization, in which a unique identifier is passed on to each cell’s progeny, allowing differentiation and migration to be tracked in developmental studies.⁶

In this review, we explore how barcodes have recently been exploited in a wide range of applications. We first focus on the use of cellular tags to recognize cells in next-generation sequencing (NGS) pipelines, and then detail how the same techniques are allowing the identification of proteins in a sequencing protocol. We also consider how spatial position can be encoded to be paired with a sequencing read of a sample. Finally, we examine how color is being used to barcode various types of probes, such as antibodies, proteins, or small ligands used to label cells or DNA fragments.

Barcoding for Single-Cell Transcriptomics

The use of oligonucleotides as barcodes has been key to the success of NGS techniques.^1,14,15 Although details vary among sequencing platforms, short DNA identification sequences are generally incorporated into the primers used for library preparation. Most of these primers, including Nextera primers, can be purchased in versions that include short barcodes. During library preparation, before sequencing, each cDNA molecule is fragmented and extended at both ends with Illumina’s adaptor sequences. When desired, each adaptor can include an identifier, and combining the identifiers on the two adaptors yields up to 384 combinations, enough to identify each well of a 384-well plate. After library preparation, the 384 encoded libraries are pooled for sequencing, and the resulting reads can still be assigned to their library of origin.¹⁶
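The dual-index scheme can be pictured as a lookup table from (i7, i5) index pairs to plate wells. In this minimal sketch, the index sequences are hypothetical placeholders generated from 6-mers, not the real Nextera/Illumina index sets. The 24 × 16 split between the two adaptors is likewise an assumption chosen to give 384 combinations.

```python
from itertools import product
from string import ascii_uppercase

# Hypothetical 6-bp index sequences for illustration only (NOT real Illumina indexes).
ALL_6MERS = ["".join(p) for p in product("ACGT", repeat=6)]
I7_INDEXES = ALL_6MERS[:24]    # one index per plate column (1-24)
I5_INDEXES = ALL_6MERS[-16:]   # one index per plate row (A-P)

# Every (i7, i5) combination names one well of a 384-well plate.
WELL_OF = {
    (i7, i5): f"{ascii_uppercase[row]}{col + 1}"
    for col, i7 in enumerate(I7_INDEXES)
    for row, i5 in enumerate(I5_INDEXES)
}


def demultiplex(reads):
    """Group (i7, i5, insert) reads by well; unknown index pairs go to 'undetermined'."""
    by_well = {}
    for i7, i5, insert in reads:
        well = WELL_OF.get((i7, i5), "undetermined")
        by_well.setdefault(well, []).append(insert)
    return by_well


print(len(WELL_OF))  # 384
```

Production demultiplexing software typically tolerates a small number of index mismatches. This sketch accepts exact matches only.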

More recent single-cell RNA sequencing (sc-RNA-seq) techniques further increase throughput by using microfluidics to encapsulate cells in droplets.¹⁷ These droplets are generated as a water-in-oil emulsion, in which each droplet replaces a well in a plate ( Fig. 1A ). This approach dramatically increases the number of cells that can be processed simultaneously, up to several thousand. Each captured cell is assigned an identity through a randomly generated DNA sequence immobilized in a gel bead (or on a solid bead) inside the droplet ( Fig. 1B ). The size and generation rate of the droplets are tuned to maximize the number of droplets that contain exactly one cell and one bead. Barcodes are synthesized on the beads one base at a time in a controlled fashion.¹ Beads are randomly split into four equal groups, each of which receives one of the four DNA bases. Beads are then pooled and randomly split again into four groups for the addition of the next base. This split-and-pool cycle is repeated once for each base position, so that each bead carries multiple copies of the same sequence while different beads generally carry different sequences. The huge number of possible combinations obtained with very few bases (4¹² = 16,777,216 for a 12 bp barcode) makes it very unlikely that any given bead shares its sequence with another bead in a sample of a few thousand cells. Since cDNA synthesis from each captured cell takes place inside its droplet, and all primers on a bead carry the same barcode, all cDNA molecules from a single cell share a unique tag.
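The split-and-pool procedure and the uniqueness argument can both be sketched in a few lines of Python. This is a simplified simulation, not the Drop-seq synthesis protocol itself. Beads are shuffled and dealt into four equal groups at every round, and the collision estimates assume that barcodes are uniformly distributed over the 4¹² possibilities (the birthday-problem approximation).

```python
import random

BASES = "ACGT"


def split_pool_synthesis(n_beads: int, length: int, seed: int = 0) -> list:
    """Simulate split-pool synthesis of one barcode per bead."""
    rng = random.Random(seed)
    barcodes = [""] * n_beads
    for _ in range(length):                    # one split-and-pool round per base
        order = list(range(n_beads))
        rng.shuffle(order)                     # pool and mix all beads
        for rank, bead in enumerate(order):
            barcodes[bead] += BASES[rank % 4]  # split into four (near-)equal groups
    return barcodes


def expected_colliding_pairs(n_beads: int, space: int) -> float:
    """Birthday-problem estimate of bead pairs sharing a barcode (uniform barcodes assumed)."""
    return n_beads * (n_beads - 1) / (2 * space)


def p_bead_shares_barcode(n_beads: int, space: int) -> float:
    """Probability that one given bead shares its barcode with at least one other bead."""
    return 1 - (1 - 1 / space) ** (n_beads - 1)


space = 4 ** 12                                # 16,777,216 possible 12-bp barcodes
beads = split_pool_synthesis(3000, 12)
print(len(set(beads)), "distinct barcodes among", len(beads), "beads")
print(round(expected_colliding_pairs(3000, space), 2))  # about 0.27 pairs
print(p_bead_shares_barcode(3000, space))               # about 1.8e-4
```

Because the expected number of colliding pairs grows quadratically with the number of beads, a single shared barcode somewhere in a sample of a few thousand is not impossible. Even so, the chance that any particular cell is affected stays very small.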

Figure 1.

(A) Single cells are encapsulated with beads and lysed inside droplets in a microfluidic device.¹ (B) Barcoded beads are covered with short DNA oligos containing a PCR handle for primer hybridization during library preparation, a randomly polymerized 10–12 bp sequence that barcodes each bead, another random 8 bp sequence that differs among the oligos of the same bead, and finally a 30-base poly-T sequence that hybridizes to the poly-A tails of mRNAs. (C) Antibodies can be barcoded in the same way; here the oligo carries a poly-A tail so that it hybridizes just as mRNAs originating from the cells do. (D) In proximity ligation followed by rolling circle amplification (RCA), two different antibodies are tagged with different DNA probes, which hybridize with two other short ssDNA molecules. Only if the antibodies are colocalized can these ssDNA strands be ligated in situ to yield circular DNA. In a final step, this circle serves as the template for amplifying a long DNA product that contains several repeats of the antibody-specific sequence.²⁸ (E) In one version of Brainbow, Cre recombinase stochastically excises segments of a construct that is inserted into the cell’s genome and encodes three fluorescent proteins. By design, only the first (immediately downstream of the promoter) is expressed. Cre recognition sequences flank these regions in a way that makes the excisions mutually exclusive, resulting in excision of one segment, the other, or neither. After Cre recombinase activation, cells therefore either remain red or randomly become blue or yellow.⁶² (F) In CODEX, antibodies used to tag cells are barcoded with DNA sequences and their respective primers. A first extension step is performed with three nucleotides: nonfluorescent G and fluorescent U and C. Templates presenting A or G are filled with fluorescent U or C and can be detected, whereas strands whose next template base is T are not extended, since A is missing from the mix. After image acquisition and fluorophore removal, a second extension is performed with a mix comprising nonfluorescent A and fluorescent U and C. This time, templates presenting A or G can again be detected in fluorescence, while those presenting C are blocked, since G is missing. Repetition of this cycle allows the detection of two antibodies per image.⁷⁸

As an alternative, Ramani et al.¹⁸ proposed a similar combinatorial barcoding method applied to fixed nuclei, which does not require their individual capture. Cells are digested, and nuclei are distributed into a 96-well plate with no more than 25 nuclei per well. In each well, DNA within the nuclei is tagged with a first barcode by proximity ligation. Nuclei are then pooled and split again into 96-well plates, and a second tag is placed at the extremities of the DNA molecules. The two successively added tags form combinations (96 × 96 = 9216 with 96-well plates) that can be used to identify individual nuclei.
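The importance of the number of nuclei per well can be shown with a simplified model. This is our assumption, not the authors' analysis. After the second split, each well holds m nuclei whose first-round barcodes are independent and drawn uniformly from 96. A nucleus is uniquely labeled only if none of the other m − 1 nuclei in its well shares its first-round barcode, which happens with probability (1 − 1/96)^(m − 1).

```python
def p_unique(first_round_barcodes: int, nuclei_per_well: int) -> float:
    """Probability that a nucleus is the only one in its second-round well carrying
    its first-round barcode. Assumes first-round barcodes are independent and uniform."""
    return (1 - 1 / first_round_barcodes) ** (nuclei_per_well - 1)


print(96 * 96)  # 9216 barcode combinations with two 96-well rounds
for m in (5, 10, 25):
    print(f"{m} nuclei per well: {p_unique(96, m):.3f} uniquely labeled")
```

With 5, 10, and 25 nuclei per well, this model gives roughly 96%, 91%, and 78% uniquely labeled nuclei. This illustrates why sparse loading of wells matters for combinatorial indexing.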

Single-cell combinatorial indexing RNA sequencing (sci-RNA-seq), a similar method developed by Cao et al.,¹⁹ is also based on splitting and pooling fixed cells. Here cells are fixed, permeabilized, and distributed in multiwell plates. Each well is then incubated with a specific poly-T primer that includes a handle (i.e., a sequence common to all primers that enables PCR amplification) and a barcode, and mRNA molecules are reverse transcribed. Cells are then pooled and redistributed in multiwell plates, where barcoded cDNA molecules are PCR amplified with primers specific for the handle sequence carried by the poly-T primers from the first step. The PCR primers in each well carry their own barcode. Therefore, every cell carries a combination of two barcodes: one from the primer used in cDNA synthesis and one from the primer used for PCR amplification. Here again, as long as the number of cells is small compared with the number of possible combinations, almost all cells receive a distinct pair of barcodes, which allows individual cells to be identified reliably.
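The claim that almost all cells receive a distinct pair of barcodes can be checked by simulation. This Monte Carlo sketch assumes that every cell lands in a uniformly random well in each of the two rounds. The well counts are illustrative and not taken from the sci-RNA-seq protocol.

```python
import random
from collections import Counter


def fraction_unique(n_cells: int, rt_wells: int, pcr_wells: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of cells whose (RT, PCR) barcode pair is unique.
    Assumes each cell lands in a uniformly random well in each of the two rounds."""
    rng = random.Random(seed)
    pairs = [(rng.randrange(rt_wells), rng.randrange(pcr_wells)) for _ in range(n_cells)]
    counts = Counter(pairs)
    return sum(counts[p] == 1 for p in pairs) / n_cells


for n in (100, 1000, 5000):
    print(n, "cells:", round(fraction_unique(n, 96, 96), 3))
```

With two rounds of 96 wells, most cells stay uniquely labeled at a few hundred to a thousand cells. The fraction drops markedly at several thousand, so larger experiments would need more wells or additional indexing rounds.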

Most primers used in single-cell NGS studies carry barcodes not only to distinguish cells from each other but also to identify reads originating from single RNA transcripts. In the original droplet sequencing (Drop-seq) paper,¹ barcoded primers also contained a random eight-base sequence, termed a unique molecular identifier (UMI), synthesized independently on each primer of a bead from among 4⁸ = 65,536 possibilities. Because all PCR copies of a captured transcript share the same UMI, counting distinct UMIs rather than reads separates amplification artifacts from true cellular expression levels.
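UMI counting amounts to collapsing reads that share a cell barcode, a UMI, and a gene into one molecule. The minimal sketch below uses hypothetical cell barcodes, UMIs, and reads. It performs exact matching only, whereas production pipelines typically also merge UMIs that differ by a single sequencing error.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads into molecule counts per (cell, gene).

    Each read is (cell_barcode, umi, gene). Reads sharing all three are treated as PCR
    copies of one captured transcript, so each distinct UMI counts once.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(found) for key, found in umis.items()}


reads = [("CELL_1", "AACCGGTT", "GAPDH")] * 5 + [
    ("CELL_1", "TTGGCCAA", "GAPDH"),
    ("CELL_2", "AACCGGTT", "GAPDH"),
]
print(count_molecules(reads))  # {('CELL_1', 'GAPDH'): 2, ('CELL_2', 'GAPDH'): 1}
```

Here, seven reads reduce to three molecules, because five of them are amplification copies of a single transcript.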

The main limitation of single-cell sequencing is noise: low-expression transcripts are rarely captured, which yields highly variable measured signals. The strategies for creating barcodes in this area are relatively well established, and efforts now focus primarily on reducing sequencing noise and improving coverage and tissue preparation. Another serious limitation arises from the use of beads to associate barcoded molecules with each cell. To ensure that single (not doublet) barcoded beads are enclosed with single cells in droplets, beads must be diluted, resulting in the loss of large numbers of cells.¹ This is not a problem for cell types that are highly represented in the sample; however, losing the majority of cells from a rare population can become a major hurdle. Other techniques used to associate one cell or one nucleus with one barcode have more limited throughput, as the number of barcoded cells is then limited by the number of wells in a plate.^18,19
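The trade-off between doublets and cell loss follows from random loading. In this sketch, cells and beads are assumed to load into droplets independently, each following a Poisson distribution. This is a common modeling assumption, not a description of a specific instrument. Lowering the mean number of cells per droplet reduces multiplets, and even at the best bead concentration only about 1/e of cells end up with exactly one bead.

```python
import math


def droplet_loading(mean_cells_per_droplet: float) -> dict:
    """Poisson model of cell loading (assumption): fractions of empty and single-cell
    droplets, and the multiplet rate among droplets that contain at least one cell."""
    lam = mean_cells_per_droplet
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multi = 1 - p_empty - p_single
    return {"empty": p_empty, "single": p_single, "multiplet_rate": p_multi / (1 - p_empty)}


def cells_with_one_bead(mean_beads_per_droplet: float) -> float:
    """Fraction of encapsulated cells whose droplet holds exactly one bead, assuming beads
    are Poisson-loaded independently of cells; peaks at 1/e (~37%) when the mean is 1."""
    mu = mean_beads_per_droplet
    return mu * math.exp(-mu)


for lam in (0.05, 0.1, 0.5):
    s = droplet_loading(lam)
    print(f"cells/droplet={lam}: single={s['single']:.3f}, multiplets={s['multiplet_rate']:.3f}")
print(f"best-case bead pairing: {cells_with_one_bead(1.0):.3f}")
```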

Barcoding Antibodies for Transcriptomics and Proteomics

The simultaneous measurement of transcription and translation has represented a technological challenge for decades. Recently, new methods have introduced DNA-tagged proteins, such as antibodies, that convert protein abundance and localization into data that can be obtained with NGS technologies.^4,20 This novel use of DNA barcodes brings high throughput to proteomic analyses. The capacity to read the proteome and the transcriptome of a cell simultaneously is of paramount importance, because RNA abundance is not always correlated with protein concentration²¹ owing to variations in posttranscriptional processing.²²

The CITE-seq⁴ technique achieves simultaneous proteomic and transcriptomic sequencing by using DNA-labeled antibodies ( Fig. 1C ) to tag cell surface proteins. Immunolabeled cells are captured for sequencing, and the short DNA barcodes attached to the antibodies are detected in the same way as the cDNAs originating from individual cells. The way these short DNA barcodes are attached to the immunoglobulins varies between protocols. In CITE-seq, biotin and streptavidin are used, whereas in REAP-seq²³ the barcode is covalently linked to the antibody to reduce steric hindrance. Ab-seq^20,24 relies on a UMI attached to barcoded antibodies, allowing measurement of the abundance of individual proteins in cells. All these approaches are being rapidly adopted in various studies of cell surface proteins such as immune receptors.¹⁸ Barcoded antibodies have also been used to develop a qPCR assay that evaluates, in single cells, the correlation between the numbers of transfected plasmids, transcripts, and barcoded proteins.²⁵ In addition to protein detection, barcoded antibodies are used to quantify epitopes. Lee et al.²⁶ performed Western blots of cell lysates and used DNA-barcoded antibodies to count single molecules. As each antibody carried an antigen-associated code, precise quantification was possible, with specificity comparable to that of enzyme-linked immunosorbent assay (ELISA) plates.
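Because antibody barcodes are read out alongside cDNAs, an analysis pipeline must separate antibody-derived reads from mRNA-derived reads. The sketch below does this by exact matching against a whitelist of antibody tags. The tag sequences are placeholders, the assumption that the tag occupies the first 10 bases of the read is ours, and the marker names merely label the hypothetical tags.

```python
from collections import Counter

# Hypothetical antibody-tag whitelist (placeholder sequences, not a commercial panel).
ANTIBODY_TAGS = {"ACGTACGTAC": "CD3", "TGCATGCATG": "CD19", "GGCCTTAAGG": "CD14"}


def split_antibody_reads(reads, tag_len=10):
    """Separate antibody-derived reads from mRNA-derived reads.

    Each read is (cell_barcode, sequence). A read whose first `tag_len` bases match a
    known antibody tag is counted as protein signal for that cell; all other reads are
    returned for downstream alignment to the transcriptome.
    """
    protein = Counter()
    to_align = []
    for cell, seq in reads:
        antibody = ANTIBODY_TAGS.get(seq[:tag_len])
        if antibody is not None:
            protein[(cell, antibody)] += 1
        else:
            to_align.append((cell, seq))
    return protein, to_align


reads = [("c1", "ACGTACGTACAAAA"), ("c1", "ACGTACGTACTTTT"),
         ("c2", "GGCCTTAAGGAAA"), ("c2", "TTTTTTTTTTTTT")]
protein, to_align = split_antibody_reads(reads)
print(protein)
print(to_align)
```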

Genshaft et al.²⁷ developed a similar technique that employs proteins coupled to DNA strands sharing a short complementary sequence at their 3′ ends. When two such probes bind their targets, they colocalize closely enough for their DNA barcodes to hybridize, and each probe then serves as a primer for extension of the other. This proximity extension assay (PEA; Fig. 1D ) requires tight colocalization of both probes for extension to occur. It therefore increases target specificity, since nonspecifically bound probes do not interact in a way that allows proximity extension. Barcodes are read using the C1 platform from Fluidigm to obtain the full sequence of all tagged antibodies.
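At the sequence level, proximity extension means that each probe's 3′ end is extended by copying its partner. The product therefore carries both barcodes. The sketch below is idealized: it checks only exact 3′ complementarity and ignores hybridization thermodynamics and polymerase behavior. The probe sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend(probe: str, partner: str, overlap: int) -> Optional[str]:
    """Idealized proximity extension: if the 3' ends of `probe` and `partner`
    (both written 5'->3') are complementary over `overlap` bases, extend `probe`
    by copying the rest of `partner`; otherwise return None (no product)."""
    if probe[-overlap:] != revcomp(partner[-overlap:]):
        return None
    return probe + revcomp(partner[:-overlap])


probe_a = "AAAACCCC" + "GGTA"   # hypothetical barcode A + 3' complementary region
probe_b = "TTTTGGGG" + "TACC"   # hypothetical barcode B + its 3' complement
print(extend(probe_a, probe_b, 4))  # AAAACCCCGGTACCCCAAAA
```

The two extension products are reverse complements of each other, so a single double-stranded amplicon reports the co-occurrence of both barcodes. Probes whose 3′ ends do not pair yield no product.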

Proximity-dependent assays have also been used to improve signal quality in fluorescence in situ hybridization (FISH) experiments. The proximity ligation assay for RNA²⁸ (PLAYR) is based on two barcoded DNA probes that hybridize in situ to improve the strength and specificity of the signal. When two of these probes hybridize to contiguous regions, they capture a third barcoded probe that is then circularized. Because precise, localized hybridization of two different probes is required to capture the barcoded circle template, the technique’s specificity increases dramatically. An amplification step is then performed with the circular structure serving as the template. This rolling circle amplification (RCA) of DNA generates a product that contains many repeats of the barcode, thereby strongly amplifying the signal. The probed mRNA is thus converted into a highly repeated barcode compatible with fluorescence and mass cytometry detection. Applying this technique with DNA-barcoded antibodies, Frei et al. simultaneously detected all barcodes on 14 channels and showed a strong correlation between RNA and protein localization.²⁸ Technologies based on antibody recognition require prior knowledge of the proteins expressed in the cell samples, because only targets for which antibodies have been included can be detected, and protein levels affect how accurately protein–antibody interactions can be measured. Whole-transcriptome sequencing experiments, which are more straightforward, are less sensitive to this limitation, as they consider all available genetic information. Furthermore, the specificity and affinity of antibodies are highly variable and strongly dependent on experimental conditions, so these antibody-based techniques remain largely experimental at present.^20,29
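The signal gain from RCA comes from the product being a tandem repeat of the circle's complement. Each turn around the circle therefore adds one more copy of the barcode. In this idealized sketch, the barcode and circle sequences are hypothetical, and the rotation of the repeat unit that depends on where the primer binds is ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rca_product(circle: str, turns: int) -> str:
    """Idealized rolling circle amplification: the polymerase copies the circular
    template `turns` times, producing tandem repeats of its complement. The start
    point (a rotation of the repeat unit) depends on primer binding and is ignored."""
    return revcomp(circle) * turns


barcode = "AACCGGTA"                  # hypothetical barcode carried by the circle
circle = barcode + "GTGTGTGT"         # hypothetical 16-nt circularized probe
product = rca_product(circle, turns=5)
print(product.count(revcomp(barcode)))  # 5 copies of the barcode complement
```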

A key practical obstacle for single-cell sequencing is cost, and barcodes have been used to mitigate it by pooling several samples. Antibodies against ubiquitously expressed proteins, conjugated to a different DNA sequence for each sample, were employed to tag individual samples.³⁰ Similarly, Nag et al.³¹ profiled 20 single-nucleotide polymorphisms (SNPs) associated with drug resistance in 463 samples from malaria-infected patients in one sequencing round. This approach reduced cost by a factor of 7, at the expense of sequencing depth.
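After pooling, each cell must be assigned back to its sample from its sample-tag counts. The sketch below applies a simple thresholding rule of our own, not a published demultiplexing algorithm. A cell is assigned to the dominant tag, flagged as a doublet when two tags are comparably high, or called negative when it has too few tag reads. The thresholds and sample names are assumptions.

```python
def assign_sample(hashtag_counts, min_count=10, min_fraction=0.8):
    """Assign a cell to the sample whose tag dominates its counts.

    - 'negative' if the cell has fewer than `min_count` tag reads in total;
    - the top sample if its tag accounts for at least `min_fraction` of tag reads;
    - otherwise 'doublet' (tags from two samples in what was captured as one cell).
    """
    total = sum(hashtag_counts.values())
    if total < min_count:
        return "negative"
    sample, top = max(hashtag_counts.items(), key=lambda item: item[1])
    return sample if top / total >= min_fraction else "doublet"


cells = {
    "cell_1": {"SAMPLE_1": 95, "SAMPLE_2": 5},
    "cell_2": {"SAMPLE_1": 50, "SAMPLE_2": 48},
    "cell_3": {"SAMPLE_1": 3},
}
for cell, counts in cells.items():
    print(cell, assign_sample(counts))
```

A useful side effect of sample tagging is that droplets containing cells from two different samples reveal themselves through mixed tags.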

An alternative to barcoding antibodies is to use aptamers, which show high specificity for their target molecule. Aptamers are nucleic acids (e.g., RNA) and as such themselves constitute a barcode, which removes the need for the additional barcoding step required with antibodies. Aptamers need only be polyadenylated to ensure their capture in the next-generation RNA-seq workflow.³² They are easy to generate using SELEX^33–35 and show binding efficiency and specificity at least equal to those of antibodies.^36,37

The generation of new barcode sequences cannot be completely random and is therefore not necessarily straightforward. While available techniques for generating acceptable sequences are efficient, some constraints must be respected. These include guanine–cytosine (GC) content, homopolymer length, and the exclusion of certain sequences because of their natural presence in a sample or their recognition by a restriction enzyme.³⁸ These constraints imply that most techniques based on random synthesis of a DNA barcode greatly overestimate the number of useful barcodes when the theoretical number is calculated as an exponential function, 4ⁿ for a barcode of n bases. Taking these considerations into account, Lyons et al.³⁸ provide a framework for generating billions of acceptable DNA barcodes. Techniques for efficiently tethering a DNA strand to a protein such as an antibody are also being improved.^39,40 Table 1 summarizes key characteristics of each sequencing technique described above.
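The gap between 4ⁿ and the number of usable barcodes can be made concrete by enumerating all sequences of a given length and filtering them. The thresholds here are illustrative assumptions, not values from Lyons et al.: GC content between 40% and 60%, no homopolymer longer than 2, and exclusion of the EcoRI site GAATTC as an example forbidden motif.

```python
import re
from itertools import product

FORBIDDEN = ("GAATTC",)  # e.g., EcoRI site; extend with motifs relevant to the assay


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))


def acceptable(seq, gc_range=(0.4, 0.6), max_run=2, forbidden=FORBIDDEN) -> bool:
    return (gc_range[0] <= gc_fraction(seq) <= gc_range[1]
            and max_homopolymer(seq) <= max_run
            and not any(motif in seq for motif in forbidden))


def usable_barcodes(length: int, **constraints) -> list:
    """Enumerate all 4**length sequences and keep those passing every constraint."""
    return [s for s in ("".join(p) for p in product("ACGT", repeat=length))
            if acceptable(s, **constraints)]


n = 8
kept = usable_barcodes(n)
print(f"{len(kept)} of {4 ** n} {n}-mers pass ({len(kept) / 4 ** n:.1%})")
```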

Table 1.

Comparison of Different DNA-Based Barcoding Techniques for Single-Cell Transcriptomics and Proteomics.

Name of the Method	Theoretical No. of Barcodes	Tested No. of Barcodes	Processing Speed	Read Depth	Doublet Rate	Capture Rate	Cost
Drop-seq¹	16,777,216	45,000	Thousands per hour	737,000 reads per cell	0.36%–11.3%	12.8%	7 US¢/cell
sciHiC¹⁸	No. of wells in a plate	2000	Not provided	9274	4%	100%	Not provided
sci-RNA-seq¹⁹	Not provided	15,997	Not provided	32,951	1.7%	100%	20 US¢/cell
CITE-seq⁴	1024	13	Same as Drop-seq	Same as Drop-seq	Same as Drop-seq	Same as Drop-seq	Not provided
Ab-seq²⁰	1000	2	Same as Drop-seq	Same as Drop-seq	Same as Drop-seq	Same as Drop-seq	Not provided
REAP-seq²³	65,536	82 antibodies	Same as Drop-seq	20,000	Same as Drop-seq	Same as Drop-seq	Not provided

Barcoding Chemical Libraries for Interaction Screening

High-throughput screening requires the identification of target-interacting molecules from large candidate libraries. This is made difficult by the very limited number of channels offered by fluorescence⁴¹ and mass cytometry,⁴² even when they are used simultaneously. In theory, DNA barcodes can easily generate 10¹⁰ simultaneously usable sequences,⁴³ each of which opens a new experimental channel in which an additional molecule can be observed. This is far more than what can be achieved on fluorescence-activated cell sorting (FACS) platforms.⁴⁴ Moreover, this high number of barcodes has been exploited to screen major histocompatibility complex multimers⁴³ and DNA-barcoded chemical libraries, in which interacting partners can then be identified by a simple PCR.^45–47

In a similar manner, Pollock et al.⁴⁸ used phages to carry their barcodes. They generated a phage library in which each member displays an antibody fragment (Fab), and the DNA sequence encoding the Fab serves as the barcode. The library was exposed to 44 targets so that only phages displaying a Fab with affinity for a target were captured, while the other phages were washed away. Because each captured phage also carried the DNA encoding its Fab, each interacting Fab could be identified by sequencing the phage.
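In both DNA-barcoded libraries and phage selections, binders are found by comparing how often each barcode is sequenced before and after selection. The sketch below computes a log2 enrichment of relative frequencies. This is a common generic scoring approach, not necessarily the one used in the cited studies. The pseudocount and the barcode counts are hypothetical.

```python
import math
from collections import Counter


def log2_enrichment(before: Counter, after: Counter, pseudocount: float = 0.5) -> dict:
    """log2 ratio of each barcode's relative frequency after vs. before selection.
    The pseudocount (an assumption) keeps barcodes absent from one library finite."""
    n_before, n_after = sum(before.values()), sum(after.values())
    return {
        bc: math.log2(((after[bc] + pseudocount) / n_after)
                      / ((before[bc] + pseudocount) / n_before))
        for bc in set(before) | set(after)
    }


# Hypothetical barcode counts from sequencing a library before and after panning.
before = Counter({"Fab_1": 1000, "Fab_2": 1000, "Fab_3": 1000})
after = Counter({"Fab_1": 900, "Fab_2": 5, "Fab_3": 2})
scores = log2_enrichment(before, after)
for bc in sorted(scores, key=scores.get, reverse=True):
    print(bc, round(scores[bc], 2))
```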

Use of Barcodes for Lineage Studies

Barcodes can be used to identify cellular progeny within an organism during normal development, tumor development, or infectious disease propagation.⁴⁹ Including a known short DNA barcode sequence in the genome of a cell of interest ensures that it is transmitted to the cell’s progeny, allowing their subsequent identification. As cells divide, mutations accumulate in their genomes and create subgroups within the population, so that a genealogical tree of the final cell population can be established. Bacteria were tagged to study the propagation dynamics of tuberculosis during the infection of a macaque.⁵⁰ The abundance of the subpopulation carrying any given mutation reflects how beneficial that mutation is for the bacteria. In yeast, barcodes were used to quantify changes in the relative abundance of 500,000 mutants within a single population,⁵¹ which permitted characterization of evolutionary dynamics after the appearance of beneficial mutations. Moreover, barcode-based lineage studies in bacteria can be exploited to characterize the emergence of drug resistance.⁵²
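One way to turn barcode abundances over time into a measure of how beneficial a lineage's mutation is involves a regression. The slope of the log relative frequency against generation number estimates the lineage's growth advantage over the population. Under a simple exponential-growth model (an assumption), this slope approximates the lineage's fitness relative to the population mean. The counts below are hypothetical.

```python
import math


def fitness_slope(generations, lineage_counts, total_counts):
    """Least-squares slope of ln(relative frequency) against generation number."""
    xs = list(generations)
    ys = [math.log(n / total) for n, total in zip(lineage_counts, total_counts)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Hypothetical barcode counts for one lineage, sampled every 8 generations.
gens = [0, 8, 16, 24, 32]
lineage = [120, 180, 260, 390, 560]
totals = [1_000_000] * len(gens)
print(round(fitness_slope(gens, lineage, totals), 3))
```

A positive slope marks an expanding lineage. With many lineages tracked in parallel, the appearance of beneficial mutations shows up as lineages whose slopes become positive.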

Using a library of lentiviruses,^6,53 a number of short DNA sequences can be integrated into cells within an embryo, with different barcodes marking different cells. These short sequences can then be revealed by NGS to deduce cell lineage.⁵³ A modified CRISPR approach based on a homing guide RNA (hgRNA) has also been used to integrate randomly mutating sequences within the genome.^54,55 This method targets nuclease activity to the locus into which the guide RNA itself is integrated. Cells therefore cleave the hgRNA locus, which is then repaired in an error-prone manner by nonhomologous end joining. This repair generates a new guide RNA and, at the same time, mutates the sequence used as a barcode. Cell phylogeny can then be inferred from the number and location of mutations.^54,56 Interestingly, given the mutation rate, only six of these self-mutating barcodes would suffice to uniquely identify every neuron of a mouse. These lineage-tracking techniques have been coupled with single-cell sequencing workflows to study expression variation during zebrafish development.⁵⁷ In this experiment, CRISPR editing was not left uncontrolled but was instead placed under the control of a heat shock-inducible Cas9. CRISPR has also been used in Perturb-seq and CROP-seq to introduce changes in selected genes or promoters and characterize their effect on the whole transcriptome. A library of barcoded guide RNAs was used to transduce cells; each guide perturbed the expression of one gene, which was identified by the barcode carried by the guide RNA.^58–60 Barcodes have also been integrated within viral genomes to track viral lineages.⁶¹ Barcoding viruses with 34 bp DNA sequences allowed quantification of viral subgroups and calculation of viral reactivation frequencies after treatment.
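Inferring phylogeny from accumulated barcode mutations can be illustrated with a deliberately idealized rule, which is not the method of the cited studies. Assume each edit arises once, is never reverted, and is never lost (no dropout). Then an edit carried by a strict superset of another edit's cells must have arisen earlier. Each edit's parent is therefore the edit with the smallest strictly larger set of carrier cells. Edit and cell names below are hypothetical.

```python
def lineage_tree(cell_edits):
    """Infer parent-child relations between barcode edits under idealized assumptions
    (each edit arises once, is irreversible, and is never lost).

    cell_edits maps cell name -> set of edits observed in that cell. Returns a dict
    mapping each edit to its parent edit, or 'root' if no edit precedes it.
    """
    carriers = {}
    for cell, edits in cell_edits.items():
        for edit in edits:
            carriers.setdefault(edit, set()).add(cell)

    parent = {}
    for edit, cells in carriers.items():
        # Candidate ancestors: edits carried by a strict superset of this edit's cells.
        ancestors = [e for e, c in carriers.items() if cells < c]
        parent[edit] = min(ancestors, key=lambda e: len(carriers[e]), default="root")
    return parent


cells = {
    "cell_1": {"e1", "e2"},
    "cell_2": {"e1", "e2", "e4"},
    "cell_3": {"e1", "e3"},
    "cell_4": set(),
}
print(sorted(lineage_tree(cells).items()))
# [('e1', 'root'), ('e2', 'e1'), ('e3', 'e1'), ('e4', 'e2')]
```

Real data violate these assumptions: the same edit can arise independently, and barcodes can drop out. Practical lineage reconstruction therefore relies on probabilistic or parsimony-based tree inference rather than this exact rule.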

Finally, color barcodes have also been used for lineage tracing based on Cre recombinase activity in Brainbow.⁶² Cre can excise or invert short DNA sequences that are flanked by specific recognition sites (lox sites). Therefore, introducing into cells a single locus encoding distinct fluorescent proteins, each flanked by incompatible pairs of lox sites, allows the random induction of one of the fluorescent proteins in each cell ( Fig. 1E ).⁶² Cre-mediated stochastic recombination has been used in very similar ways by other techniques such as BOINC⁶³ and MultiBow.⁶⁴

Barcoding Spatial Information for Next-Generation Sequencing

One critical piece of information that can be barcoded, and that is otherwise lost in most NGS protocols, is the spatial origin of cells. TIVA¹² allows individual cells to be selected for sequencing within a live microscopy image. To attain such precision, Lovatt et al. designed a TIVA tag that enters cells and requires photoactivation to hybridize to polyadenylated mRNA. This tag is biotinylated, which allows downstream extraction of the mRNAs of interest with streptavidin beads. Although this technique does not reach the read depth of sc-RNA-seq, sequenced cells can be chosen one at a time, so cellular proximity and contact interactions can be studied.⁵ Another method, termed CLaP,¹³ allows the information generated by single-cell sequencing protocols to be paired with individual cells in a microscopy image. It uses photobleaching to attach biotin to the membranes of cells chosen on visible criteria such as shape, migration speed and direction, cell-to-cell contact, or a characteristic fluorescent signal within the cell. The biotin can then be targeted with a fluorescent streptavidin. Color-tagged cells can be recognized on a Fluidigm C1 sorting chip by epifluorescence imaging, and the whole transcriptome of spatially chosen cells can be evaluated with the typical read depth of NGS techniques.

The other approach for tracking the spatial origin of an mRNA is in situ sequencing, which has the unique capacity to reveal transcript location at the subcellular level. Knowing where transcripts are translated could prove very useful for understanding functional relationships between genes.⁵ Barcodes can be used to mitigate the major drawback of this approach, namely the limited number of genes that can be observed simultaneously. Barcoding of “padlock probes” increases the number of sequences that can be analyzed simultaneously.⁶⁵ Briefly, two 20 bp DNA probe sequences separated by a 50 bp linker are hybridized to a cDNA target in situ and then ligated, which circularizes the padlock probe. Ke et al.⁶⁶ exploited this approach for in situ target sequencing using a known barcode included in the linker region of the padlock probe. In addition to the signal amplification that rolling circle products provide, these products are well suited to in situ sequencing since they remain bound to the target sequence. Each product can be interrogated locally using sequencing by ligation. In their work, Ke et al.⁶⁶ encoded probes with 4 bp barcodes, generating 256 (4⁴) combinations, which they used to locate 31 known transcripts in a breast cancer tissue section. Genes were detected with 98.6% efficiency, with a maximum of 90 reads per cell. This limit arises because sequencing by ligation is based on imaging and therefore requires the sequenced strands to be spaced far enough apart to be discriminated in the image. This is a very powerful method for detecting and localizing RNAs of known sequence, and Larsson et al.⁶⁷ used a similar approach to locate DNA molecules.
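Because 31 transcripts use only a small fraction of the 256 possible 4 bp codes, the codes can be chosen far apart from each other. A decoder can then accept a read only when it matches one code unambiguously. The sketch below assumes the per-cycle base calls have already been assembled into a 4-letter read. The codebook, gene names, and mismatch tolerance are hypothetical illustrations, not the design used by Ke et al.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


# Hypothetical codebook: 4-bp barcodes that differ at every position, so any single
# miscalled base still leaves one unambiguous nearest code.
CODEBOOK = {"ACGT": "GENE_A", "TGCA": "GENE_B", "CATG": "GENE_C"}


def decode_read(read: str, codebook=CODEBOOK, max_mismatch: int = 0):
    """Return the gene whose barcode is nearest to `read`, or None if the best match
    has more than `max_mismatch` differences or is tied between two codes."""
    scored = sorted((hamming(read, code), gene) for code, gene in codebook.items())
    best_dist, best_gene = scored[0]
    if best_dist > max_mismatch:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None
    return best_gene


print(decode_read("ACGT"))                  # GENE_A (exact match)
print(decode_read("ACGA"))                  # None (one miscall, no tolerance)
print(decode_read("ACGA", max_mismatch=1))  # GENE_A (single error corrected)
```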

FISH probes can also be spectrally encoded and then detected by super-resolution microscopy.^7,68 Lubeck et al.⁷ simultaneously identified up to 32 different barcodes using three fluorophores. In this system, the code is composed of intensity levels in each of the three color channels used to encode the probes. Super-resolution microscopy provides sufficient resolution to fluorescently encode, detect, and localize all transcripts of a given gene.⁶⁹

Color Barcoding of Probes

The number of colored probes that can be used simultaneously is restricted, since only a limited number of wavelengths can be detected without spectral crosstalk. To overcome this, several techniques rely on beads that each carry a signal in several color channels; the ratio of intensities across the detection channels of a bead creates its barcode. Nguyen et al.¹⁰ used ratiometric loading of gel beads with five lanthanide nanophosphors. These have the advantages of being excited by the same wavelength, of not photobleaching, and of having narrow emission bands. Different combinations of loading ratios provided 1101 codes. These beads can be given an affinity for a biological receptor by coating them with a probe, as an alternative to fluorescent antibodies. In a similar approach, Tang et al.⁷⁰ stained nematodes with beads loaded with a BODIPY fluorophore flanked by two oxazines. The oxazines can be cleaved by simple light excitation, which shifts the fluorescence of the compound to longer wavelengths. Varying the activation time changes the signal ratios between the three emission wavelengths of the compound. Longer illumination increases the proportion of molecules whose oxazines have been cleaved, shifting their fluorescence toward longer wavelengths. Different regions of the worm were efficiently encoded simply by varying the activation time along its anteroposterior axis. Likewise, Han et al.⁷¹ developed microbeads loaded with quantum dots that allow all channels to be excited with a single wavelength. In this protocol, the code comprises 10 intensity levels in six color channels. The gel beads (approximately 1.2 µm in diameter) can be loaded with different numbers of quantum dots and conjugated with DNA capture probes.
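Ratiometric codes are read from the shape of a bead's emission profile rather than its absolute brightness. A natural decoder therefore normalizes each measurement before comparing it with reference profiles. This sketch uses nearest-neighbor matching on unit-sum profiles, an illustrative choice that is not necessarily the decoding used in the cited works. The reference profiles for five channels are hypothetical.

```python
import math

# Hypothetical reference codes: relative emission in five channels for three bead types.
REFERENCE = {
    "bead_A": (1.0, 0.0, 0.5, 0.0, 0.2),
    "bead_B": (0.2, 1.0, 0.0, 0.5, 0.0),
    "bead_C": (0.0, 0.3, 1.0, 0.0, 0.6),
}


def normalize(vec):
    """Scale an intensity vector to unit sum so that only the ratios between channels remain."""
    total = sum(vec)
    return tuple(v / total for v in vec)


def classify(measured, reference=REFERENCE):
    """Assign a bead to the reference code with the nearest normalized intensity profile."""
    m = normalize(measured)
    return min(reference, key=lambda name: math.dist(m, normalize(reference[name])))


print(classify((10, 0, 5, 0, 2)))               # a bright bead_A
print(classify((0.1, 0.31, 0.98, 0.02, 0.61)))  # a dim, slightly noisy bead_C
```

Normalization makes the code independent of overall brightness, so beads that differ in size or total loading but share the same ratios decode identically.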

Alternatively, DNA has been used as a carrier of fluorescent dyes for relative intensity barcoding.⁹ Here, the fluorescent molecules are carried by a DNA dendrimer, a code-carrying microstructure of reduced size that is therefore easier to use. Two-color encoding of DNA probes has also been used to increase the number of targets detected simultaneously by FISH.⁷²

Another key approach to color coding involves spatially organizing fluorescent molecules on a carrier. This carrier can be a gel bead in which a barcode is drawn by photobleaching.⁷³ Alternatively, a DNA strand can serve as a carrier on which a series of color-tagged RNA segments hybridize, creating a colored sequence. This technique, termed nCounter, was used to count mRNA molecules of more than 500 genes with high sensitivity and without amplification.⁷⁴ Each DNA strand consists of a capture sequence specific to the target mRNA and a backbone on which the colored RNA segments hybridize. An electric field aligns all DNA backbones in the same direction, and imaging then reveals the color sequence associated with each capture backbone, as well as the number of backbones carrying each sequence.
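
Counting with a color-sequence reporter reduces to reading each aligned backbone as a string of colors and tallying the strings against a codebook. The sketch below is illustrative only: it assumes four dyes labeled B, G, Y and R and six colored positions per reporter, and the gene assignments are hypothetical; it does not reproduce the actual nCounter reporter design.

```python
from collections import Counter
from itertools import product


def all_codes(colors, n_spots):
    """Every ordered color sequence of length n_spots (read in a known direction)."""
    return ["".join(seq) for seq in product(colors, repeat=n_spots)]


def orientation_free_codes(colors, n_spots):
    """Codes left if reporters could be read in either direction.

    A sequence and its reverse become indistinguishable, so each such pair
    collapses to one canonical representative.
    """
    return {min(code, code[::-1]) for code in all_codes(colors, n_spots)}


def count_molecules(observed, codebook):
    """Digital counting: each imaged reporter is tallied as one target molecule."""
    counts = Counter()
    for code in observed:
        counts[codebook.get(code, "unassigned")] += 1
    return counts


if __name__ == "__main__":
    colors, n_spots = "BGYR", 6  # hypothetical: four dyes, six colored positions
    print(len(all_codes(colors, n_spots)))               # 4**6 = 4096
    print(len(orientation_free_codes(colors, n_spots)))  # (4096 + 64) / 2 = 2080
    codebook = {"BGYRBG": "gene_1", "RRGBYB": "gene_2"}  # hypothetical assignments
    images = ["BGYRBG", "RRGBYB", "BGYRBG", "YYYYYY"]
    print(count_molecules(images, codebook))
```

The second count shows why aligning backbones in a single direction matters: if a backbone could lie either way round, a code and its reverse would be indistinguishable, and the usable code space would shrink by roughly half.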

Similarly, DNA origami have been employed⁷⁵ to accomplish the same type of barcoding without applying an electric field, which allows their use with live samples. In this approach, the DNA origami structure spatially organizes colored probes into as many as 216 barcodes. These probes were used to stain live yeast, and super-resolution microscopy allows spatial detection of up to 823,543 codes. Besides not requiring alignment by an electric field, these probes have the key advantage of being significantly shorter (400–800 nm) than nCounter probes (2 µm). Another approach encodes information in the pattern of light reflected by striped metallic particles; its advantage is that all fluorescence channels remain available for conventional staining.⁷⁶

In addition to these approaches, which use intensity ratios and positions to create codes, Hu et al.⁷⁷ set out to expand the palette of molecules available for spectral encoding. They developed a library of polyynes providing 20 simultaneously detectable Raman frequencies. These polyynes can be used to tag any protein, and Raman spectroscopy resolves three states for each of them: absent, low concentration, and high concentration. With this scheme, a theoretical maximum of 59,048 barcodes was attained, one of the largest code spaces reported for an optical technique.
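
Several code counts in this section follow from one combinatorial rule: m independently read channels, each resolved into n states, give n^m − 1 distinguishable codes once the all-negative combination is excluded. The sketch below applies this rule to the figures quoted above. It is an idealized upper bound that ignores spectral overlap and intensity noise, and reading 59,048 as three states in 10 channels rests only on the arithmetic (3^10 − 1 = 59,048), not on a design detail stated here.

```python
def code_space(levels, channels):
    """Distinct codes from independent channels, each resolved into `levels` states.

    The all-zero combination (no signal in any channel) is excluded because it
    cannot be told apart from an unlabeled object.
    """
    return levels ** channels - 1


if __name__ == "__main__":
    print(code_space(10, 6))  # quantum-dot beads, 10 levels x 6 colors: 999999
    print(code_space(3, 10))  # three Raman states in 10 channels: 59048
    print(code_space(3, 20))  # if all 20 polyyne frequencies were combined: 3486784400
```

In practice, the number of codes that can actually be decoded is lower, because neighboring intensity levels overlap once measurement noise is taken into account.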

Fixed tissue samples have been stained with up to 66 different DNA-barcoded antibodies and imaged by fluorescence microscopy in a technique termed CO-Detection by indEXing (CODEX).⁷⁸ Each antibody type is associated with a specific DNA oligonucleotide that contains a common primer-binding sequence, has a distinct length, and follows a dedicated sequence design. Antibodies are identified in pairs, with standard fluorescence microscopy, as the strands complementary to their DNA barcodes are extended. During the first imaging cycle, a mix of fluorescently labeled U (green) and C (red) nucleotides is added to the sample, revealing only the two antibodies whose barcodes carry A or G, respectively, as the first base after the primer. After each image, the fluorophores are cleaved, excess nucleotides are removed, and either A or G is added to the growing strand to select the next unique pair of oligonucleotides that will fluoresce in the following imaging cycle (Fig. 1G). Using this approach, Goltsev et al.⁷⁸ performed 36 imaging cycles with a good signal-to-noise ratio to detect 31 proteins.
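
The pairwise readout can be made concrete with a toy simulation of template-directed extension. This is a minimal sketch under stated assumptions, not the published CODEX oligonucleotide design: the barcode overhangs (an alternating T/C stepping prefix followed by a single A or G reporter position), the antibody names, and the strict alternation of unlabeled A and G steps are hypothetical choices that make each imaging cycle reveal exactly one new pair of antibodies.

```python
# Toy model of stepwise, pairwise barcode readout (hypothetical barcode design).
PAIRS_WITH = {"U": "A", "C": "G", "A": "T", "G": "C"}  # incoming nucleotide -> template base
COLOR = {"U": "green", "C": "red"}                      # fluorescent reporter nucleotides


def make_barcodes(n_pairs):
    """Hypothetical overhangs: an alternating T/C stepping prefix, then A or G.

    Antibodies 2k+1 and 2k+2 share a k-base prefix, so they light up in cycle k+1.
    """
    stepping = "TC" * n_pairs
    barcodes = {}
    for k in range(n_pairs):
        barcodes[f"Ab{2 * k + 1}"] = stepping[:k] + "A"  # reported by fluorescent U
        barcodes[f"Ab{2 * k + 2}"] = stepping[:k] + "G"  # reported by fluorescent C
    return barcodes


def extend(barcodes, positions, offered):
    """Extend each strand while its next template base pairs with an offered nucleotide."""
    wanted = {PAIRS_WITH[n]: n for n in offered}  # template base -> incoming nucleotide
    added = {}
    for ab, template in barcodes.items():
        pos, incorporated = positions[ab], []
        while pos < len(template) and template[pos] in wanted:
            incorporated.append(wanted[template[pos]])
            pos += 1
        positions[ab] = pos
        added[ab] = incorporated
    return added


def run_cycles(barcodes, n_cycles):
    """Return, for each imaging cycle, the antibodies that fluoresce and their colors."""
    positions = {ab: 0 for ab in barcodes}
    readout = []
    for cycle in range(n_cycles):
        # Imaging: fluorescent U and C are incorporated only opposite A or G.
        added = extend(barcodes, positions, ["U", "C"])
        readout.append({ab: COLOR[nts[0]] for ab, nts in added.items() if nts})
        # Fluorophore cleavage and washing are not modeled; one unlabeled base then
        # advances every remaining strand by one position (A opposite T, G opposite C).
        extend(barcodes, positions, ["A"] if cycle % 2 == 0 else ["G"])
    return readout


if __name__ == "__main__":
    for i, seen in enumerate(run_cycles(make_barcodes(3), 3), start=1):
        print(f"cycle {i}: {seen}")
```

In this idealized scheme each cycle lights up one green and one red antibody, so the number of antibodies read grows by two per cycle; incomplete extension or misincorporation, which are not modeled, would blur assignments between cycles.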

Finally, antibodies tagged with distinct elemental isotopes (mostly metals) offer a comparable number of tags, as available panels comprise close to 40 markers.⁷⁹ In this approach, cells are vaporized and ionized in a plasma, producing a cloud of elemental heavy ions that originate from the tagged antibodies. Time-of-flight measurements then identify each element present in the vaporized material, together with its proportion. Mass cytometry can be used either in a configuration where single cells are directed to the plasma one by one, as in flow cytometry, or with paraffin-embedded tissue sections, thereby also preserving spatial information.^80,81 Each element bound to an antibody thus behaves as a barcode, and the total number of possible codes is limited by the availability of pure isotopes that can be attached to these proteins. A technology that combines several isotopes on one antibody to create multiple codes has not yet been developed.
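
The time-of-flight step can be illustrated with the idealized relation for a linear analyzer: an ion of mass m and charge ze, accelerated through a potential U, drifts a length L in time t = L·sqrt(m/(2zeU)), so heavier tags arrive later. The following is a minimal sketch, assuming a hypothetical accelerating voltage, drift length, four-isotope panel (`PANEL`) and matching tolerance; real mass cytometers use more elaborate ion optics and empirical mass calibration, which are not modeled.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time(mass_da, voltage, length, charge=1):
    """Idealized linear TOF: drift time (s) over `length` (m) after acceleration through `voltage` (V).

    The energy balance z*e*U = m*v**2 / 2 gives t = L * sqrt(m / (2 z e U)).
    """
    mass_kg = mass_da * DALTON
    return length * math.sqrt(mass_kg / (2 * charge * E_CHARGE * voltage))


def mass_from_time(t, voltage, length, charge=1):
    """Invert the same relation, m = 2 z e U t**2 / L**2, returning daltons."""
    return 2 * charge * E_CHARGE * voltage * t ** 2 / length ** 2 / DALTON


def assign_isotope(mass_da, panel, tolerance=0.3):
    """Map a measured mass to the closest panel isotope, or None if none is within tolerance."""
    label, nominal = min(panel.items(), key=lambda item: abs(item[1] - mass_da))
    return label if abs(nominal - mass_da) <= tolerance else None


# Hypothetical antibody panel keyed by isotope tag; nominal mass numbers stand in
# for exact isotopic masses.
PANEL = {"141Pr": 141, "153Eu": 153, "165Ho": 165, "175Lu": 175}

if __name__ == "__main__":
    U, L = 2000.0, 1.0  # hypothetical accelerating voltage (V) and drift length (m)
    for isotope, mass in PANEL.items():
        t = flight_time(mass, U, L)
        print(f"{isotope}: t = {t * 1e6:.2f} us -> {assign_isotope(mass_from_time(t, U, L), PANEL)}")
```

Because each antibody carries a single isotope, assignment is a one-to-one lookup; this is the sense in which each element behaves as a barcode, and why the code count is capped by the number of usable isotopes.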

Conclusion

In this review, we have described many uses of barcodes to identify a variety of objects, from molecules to cells and even whole samples. Barcoding offers solutions to many practical problems, including reducing research costs. Moreover, multiplexing allows correlations between biological phenomena to be established in a single run, obviating the need for separate experiments. From a more academic point of view, although extensive further research is required, barcoding holds great promise for encoding spatial information and for providing revolutionary methods of precise molecular quantification.

Two main tools are being investigated to barcode information: synthetic DNA sequences and fluorescence. Although the former requires sequencing, and therefore sample destruction, DNA tags provide larger numbers of possible combinations, and hence more channels that can be studied simultaneously.

Several avenues remain to be explored in more depth. First, even though fluorescence has historically been linked with barcode generation, it is limited in the number of channels that can be discriminated without spectral crosstalk. To address this, dyes with sharper excitation and emission spectra need to be synthesized. Second, using colored microstructures to create color sequences that mimic DNA sequences greatly increases the number of barcodes that can be generated with the colors that current technology can already discriminate. A complementary option is to use the relative intensities of different dyes carried by these same microstructures. To make these two approaches more powerful, barcoded microstructures need to be miniaturized for use in biological samples. Additionally, many groups are developing new techniques to place barcodes on the target cell or structure, such as split-and-pool encoding, DNA ligation, and antibody or microstructure conjugation, each with its own advantages. Further work on these approaches should generate new opportunities.

Finally, contrary to what may be popularly believed, single-cell sequencing techniques only provide the means to explore the transcriptomes of thousands of single cells; determining the sequence of a specific single cell chosen within its environment remains a challenge. A small number of approaches are currently tackling this limitation and, once perfected, hold great promise for addressing long-standing biological questions in which a single cell drives major changes, such as in organism development, tumor progression, or immunity.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada and Genome Canada/Genome Quebec to SC. SC holds a salary award from the Fonds de Recherche du Québec–Santé.

ORCID iD

Santiago Costantino

References

1. Macosko, E. Z.; Basu; Satija; et al. Highly Parallel Genome-Wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 2015, 161, 1202–1214.

2. Smith, A. M.; Heisler, L. E.; St. Onge, R. P.; et al. Highly-Multiplexed Barcode Sequencing: An Efficient Method for Parallel Analysis of Pooled Samples. Nucleic Acids Res. 2010, 38, e142.

3. Wang; et al. Pair-Barcode High-Throughput Sequencing for Large-Scale Multiplexed Sample Analysis. BMC Genomics 2012, 13, 43.

4. Stoeckius; Hafemeister; Stephenson; et al. Simultaneous Epitope and Transcriptome Measurement in Single Cells. Nat. Methods 2017, 14, 865–868.

5. Avital; Hashimshony; Yanai. Seeing Is Believing: New Methods for In Situ Single-Cell Transcriptomics. Genome Biol. 2014, 15, 110.

6. Porter, S. N.; Baker, L. C.; Mittelman; et al. Lentiviral and Targeted Cellular Barcoding Reveals Ongoing Clonal Dynamics of Cell Lines In Vitro and In Vivo. Genome Biol. 2014, 15, R75.

7. Lubeck; Cai. Single-Cell Systems Biology by Super-Resolution Imaging and Combinatorial Labeling. Nat. Methods 2012, 9, 743–748.

8. Lee; Kim; et al. Colour-Barcoded Magnetic Microparticles for Multiplexed Bioassays. Nat. Mater. 2010, 9, 745–749.

9. Li, Y.; Cu, Y. T.; Luo. Multiplexed Detection of Pathogen DNA with DNA-Based Fluorescence Nanobarcodes. Nat. Biotechnol. 2005, 23, 885–889.

10. Nguyen, H. Q.; Baxter, B. C.; Brower; et al. Programmable Microfluidic Synthesis of Over One Thousand Uniquely Identifiable Spectral Codes. Adv. Opt. Mater. 2017, 5, 1600548.

11. Braeckmans; De Smedt, S. C.; Leblans; et al. Encoding Microcarriers: Present and Future Technologies. Nat. Rev. Drug Discov. 2002, 1, 447–456.

12. Lovatt; Ruble, B. K.; Lee; et al. Transcriptome In Vivo Analysis (TIVA) of Spatially Defined Single Cells in Live Tissue. Nat. Methods 2014, 11, 190–196.

13. Binan; Mazzaferri; Choquet; et al. Live Single-Cell Laser Tag. Nat. Commun. 2016, 7, 11636.

14. Zheng, G. X.; Terry, J. M.; Belgrader; et al. Massively Parallel Digital Transcriptional Profiling of Single Cells. Nat. Commun. 2017, 8, 14049.

15. Stoeckius; Hafemeister; Stephenson; et al. Simultaneous Epitope and Transcriptome Measurement in Single Cells. Nat. Methods 2017, 14, 865–868.

16. Trombetta, J. J.; Gennert; et al. Preparation of Single-Cell RNA-Seq Libraries for Next Generation Sequencing. Curr. Protoc. Mol. Biol. 2014, 107, 4.22.1–4.22.17.

17. Joensson, H. N.; Andersson Svahn. Droplet Microfluidics—A Tool for Single-Cell Analysis. Angew. Chem. Int. Ed. Engl. 2012, 51, 12176–12192.

18. Ramani; Deng; Qiu; et al. Massively Multiplex Single-Cell Hi-C. Nat. Methods 2017, 14, 263–266.

19. Cao; Packer, J. S.; Ramani; et al. Comprehensive Single-Cell Transcriptional Profiling of a Multicellular Organism. Science 2017, 357, 661–667.

20. Shahi; Kim, S. C.; Haliburton, J. R.; et al. Abseq: Ultrahigh-Throughput Single Cell Protein Profiling with Droplet Microfluidic Barcoding. Sci. Rep. 2017, 7, 44447.

21. Pascal, L. E.; True, L. D.; Campbell, D. S.; et al. Correlation of mRNA and Protein Levels: Cell Type-Specific Gene Expression of Cluster Designation Antigens in the Prostate. BMC Genomics 2008, 9, 246.

22. Liu; Beyer; Aebersold. On the Dependency of Cellular Protein Levels on mRNA Abundance. Cell 2016, 165, 535–550.

23. Peterson, V. M.; Zhang, K. X.; Kumar; et al. Multiplexed Quantification of Proteins and Transcripts in Single Cells. Nat. Biotechnol. 2017, 35, 936–939.

24. Baron; Yanai. New Skin for the Old RNA-Seq Ceremony: The Age of Single-Cell Multi-Omics. Genome Biol. 2017, 18, 159.

25. Stahlberg; Thomsen; Ruff; et al. Quantitative PCR Analysis of DNA, RNAs, and Proteins in the Same Single Cell. Clin. Chem. 2012, 58, 1682–1691.

26. Lee; Geiss, G. K.; Demirkan; et al. Implementation of a Multiplex and Quantitative Proteomics Platform for Assessing Protein Lysates Using DNA-Barcoded Antibodies. Mol. Cell. Proteomics 2018.

27. Genshaft, A. S.; Gallant, C. J.; et al. Multiplexed, Targeted Profiling of Single-Cell Proteomes and Transcriptomes in a Single Reaction. Genome Biol. 2016, 17, 188.

28. Frei, A. P.; Bava, F. A.; Zunder, E. R.; et al. Highly Multiplexed Simultaneous Detection of RNAs and Proteins in Single Cells. Nat. Methods 2016, 13, 269–275.

29. Bendall, S. C.; Nolan, G. P.; Roederer; et al. A Deep Profiler’s Guide to Cytometry. Trends Immunol. 2012, 33, 323–332.

30. Stoeckius; Zheng; Houck-Loomis; et al. Cell Hashing with Barcoded Antibodies Enables Multiplexing and Doublet Detection for Single Cell Genomics. Genome Biol. 2018, 19, 224.

31. Nag; Dalgaard, M. D.; Kofoed, P. E.; et al. High Throughput Resistance Profiling of Plasmodium falciparum Infections Based on Custom Dual Indexing and Illumina Next Generation Sequencing-Technology. Sci. Rep. 2017, 7, 2398.

32. Delley, C. L.; Liu; Sarhan, M. F.; et al. Combined Aptamer and Transcriptome Sequencing of Single Cells. Sci. Rep. 2018, 8, 2919.

33. Ellington, A. D.; Szostak, J. W. In Vitro Selection of RNA Molecules That Bind Specific Ligands. Nature 1990, 346, 818–822.

34. Delac; Motaln; Ulrich; et al. Aptamer for Imaging and Therapeutic Targeting of Brain Tumor Glioblastoma. Cytometry A 2015, 87, 806–816.

35. Dunn, M. R.; Jimenez, R. M.; Chaput, J. C. Analysis of Aptamer Discovery and Technology. Nat. Rev. Chem. 2017, 1, 0076.

36. Wang; Yang; et al. Multiparameter Particle Display (MPPD): A Quantitative Screening Method for the Discovery of Highly Specific Aptamers. Angew. Chem. Int. Ed. Engl. 2017, 56, 744–747.

37. Chen; Yang. Replacing Antibodies with Aptamers in Lateral Flow Immunoassay. Biosens. Bioelectron. 2015, 71, 230–242.

38. Lyons; Sheridan; Tremmel; et al. Large-Scale DNA Barcode Library Generation for Biomolecule Identification in High-Throughput Screens. Sci. Rep. 2017, 7, 13899.

39. Carvalho, A. M.; Manicardi; Montes, C. V.; et al. Decoration of Trastuzumab with Short Oligonucleotides: Synthesis and Detailed Characterization. Org. Biomol. Chem. 2017, 15, 8923–8928.

40. Lovendahl, K. N.; Hayward, A. N.; Gordon, W. R. Sequence-Directed Covalent Protein-DNA Linkages in a Single Step Using HUH-Tags. J. Am. Chem. Soc. 2017, 139, 7030–7035.

41. Grecco, H. E.; Imtiaz; Zamir. Multiplexed Imaging of Intracellular Protein Networks. Cytometry A 2016, 89, 761–775.

42. Cheng; Newell, E. W. Deep Profiling Human T Cell Heterogeneity by Mass Cytometry. Adv. Immunol. 2016, 131, 101–134.

43. Bentzen, A. K.; Marquard, A. M.; Lyngaa; et al. Large-Scale Detection of Antigen-Specific T Cells Using Peptide-MHC-I Multimers Labeled with DNA Barcodes. Nat. Biotechnol. 2016, 34, 1037–1045.

44. Chan, B. M.; Schow, P. W.; et al. High-Throughput Screening of Hybridoma Supernatants Using Multiplexed Fluorescent Cell Barcoding on Live Cells. J. Immunol. Methods 2017, 451, 20–27.

45. Zimmermann; Neri. DNA-Encoded Chemical Libraries: Foundations and Applications in Lead Discovery. Drug Discov. Today 2016, 21, 1828–1834.

46. Franzini, R. M.; Neri; Scheuermann. DNA-Encoded Chemical Libraries: Advancing Beyond Conventional Small-Molecule Libraries. Acc. Chem. Res. 2014, 47, 1247–1255.

47. Yachie; Petsalaki; Mellor, J. C.; et al. Pooled-Matrix Protein Interaction Screens Using Barcode Fusion Genetics. Mol. Syst. Biol. 2016, 12, 863.

48. Pollock, S. B.; Mou; et al. Highly Multiplexed and Quantitative Cell-Surface Protein Profiling Using Genetically Barcoded Antibodies. Proc. Natl. Acad. Sci. U.S.A. 2018, 115, 2836–2841.

49. Blundell, J. R.; Levy, S. F. Beyond Genome Sequencing: Lineage Tracking with Barcodes to Study the Dynamics of Evolution, Infection, and Cancer. Genomics 2014, 104, 417–430.

50. Martin, C. J.; Cadena, A. M.; Leung, V. W.; et al. Digitally Barcoding Mycobacterium tuberculosis Reveals In Vivo Infection Dynamics in the Macaque Model of Tuberculosis. MBio 2017, 8, e00312–e00317.

51. Levy, S. F.; Blundell, J. R.; Venkataram; et al. Quantitative Evolutionary Dynamics Using High-Resolution Lineage Tracking. Nature 2015, 519, 181–186.

52. Gresham. Evolution: Fitness Tracking for Adapting Populations. Nature 2015, 519, 164–165.

53. Woodworth, M. B.; Girskis, K. M.; Walsh, C. A. Building a Lineage from Single Cells: Genetic Techniques for Cell Lineage Tracking. Nat. Rev. Genet. 2017, 18, 230–244.

54. Kalhor; Mali; Church, G. M. Rapidly Evolving Homing CRISPR Barcodes. Nat. Methods 2017, 14, 195–200.

55. Schmierer; Botla, S. K.; Zhang; et al. CRISPR/Cas9 Screening Using Unique Molecular Identifiers. Mol. Syst. Biol. 2017, 13, 945.

56. McKenna; Findlay, G. M.; Gagnon, J. A.; et al. Whole-Organism Lineage Tracing by Combinatorial and Cumulative Genome Editing. Science 2016, 353, aaf7907.

57. Raj; Wagner, D. E.; McKenna; et al. Simultaneous Single-Cell Profiling of Lineages and Cell Types in the Vertebrate Brain. Nat. Biotechnol. 2018, 36, 442–450.

58. Adamson; Norman, T. M.; Jost; et al. A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response. Cell 2016, 167, 1867–1882.e21.

59. Dixit; Parnas; et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 2016, 167, 1853–1866.e17.

60. Datlinger; Rendeiro, A. F.; Schmidl; et al. Pooled CRISPR Screening with Single-Cell Transcriptome Readout. Nat. Methods 2017, 14, 297–301.

61. Fennessey, C. M.; Pinkevych; Immonen, T. T.; et al. Genetically-Barcoded SIV Facilitates Enumeration of Rebound Variants and Estimation of Reactivation Rates in Nonhuman Primates Following Interruption of Suppressive Antiretroviral Therapy. PLoS Pathog. 2017, 13, e1006359.

62. Livet; Weissman, T. A.; Kang; et al. Transgenic Strategies for Combinatorial Expression of Fluorescent Proteins in the Nervous System. Nature 2007, 450, 56–62.

63. Zador, A. M.; Dubnau; Oyibo, H. K.; et al. Sequencing the Connectome. PLoS Biol. 2012, 10, e1001411.

64. Xiong; Obholzer, N. D.; Noche, R. R.; et al. Multibow: Digital Spectral Barcodes for Cell Tracing. PLoS One 2015, 10, e0127822.

65. Nilsson; Malmgren; Samiotaki; et al. Padlock Probes: Circularizing Oligonucleotides for Localized DNA Detection. Science 1994, 265, 2085–2088.

66. Mignardi; Pacureanu; et al. In Situ Sequencing for RNA Analysis in Preserved Tissue and Cells. Nat. Methods 2013, 10, 857–860.

67. Larsson; Koch; Nygren; et al. In Situ Genotyping Individual DNA Molecules by Target-Primed Rolling-Circle Amplification of Padlock Probes. Nat. Methods 2004, 1, 227–232.

68. Lubeck; Coskun, A. F.; Zhiyentayev; et al. Single Cell In Situ RNA Profiling by Sequential Hybridization. Nat. Methods 2014, 11, 360–361.

69. Cai. Turning Single Cells into Microarrays by Super-Resolution Barcoding. Brief. Funct. Genomics 2013, 12, 75–80.

70. Tang; Zhang; Dhakal; et al. Photochemical Barcodes. J. Am. Chem. Soc. 2018, 140, 4485–4488.

71. Han; Gao; Su, J. Z.; et al. Quantum-Dot-Tagged Microbeads for Multiplexed Optical Coding of Biomolecules. Nat. Biotechnol. 2001, 19, 631–635.

72. Levsky, J. M.; Shenoy, S. M.; Pezo, R. C.; et al. Single-Cell Gene Expression Profiling. Science 2002, 297, 836.

73. Braeckmans; De Smedt, S. C.; Roelant; et al. Encoding Microcarriers by Spatial Selective Photobleaching. Nat. Mater. 2003, 2, 169–173.

74. Geiss, G. K.; Bumgarner, R. E.; Birditt; et al. Direct Multiplexed Measurement of Gene Expression with Color-Coded Probe Pairs. Nat. Biotechnol. 2008, 26, 317–325.

75. Lin; Jungmann; Leifer, A. M.; et al. Submicrometre Geometrically Encoded Fluorescent Barcodes Self-Assembled from DNA. Nat. Chem. 2012, 4, 832–839.

76. Nicewarner-Pena, S. R.; Freeman, R. G.; Reiss, B. D.; et al. Submicrometer Metallic Barcodes. Science 2001, 294, 137–141.

77. Hu, F.; Zeng; Long; et al. Supermultiplexed Optical Imaging and Barcoding with Engineered Polyynes. Nat. Methods 2018, 15, 194–200.

78. Goltsev; Samusik; Kennedy-Darling; et al. Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell 2018, 174, 968–981.e15.

79. Bendall, S. C.; Simonds, E. F.; Qiu; et al. Single-Cell Mass Cytometry of Differential Immune and Drug Responses across a Human Hematopoietic Continuum. Science 2011, 332, 687–696.

80. Angelo; Bendall, S. C.; Finck; et al. Multiplexed Ion Beam Imaging of Human Breast Tumors. Nat. Med. 2014, 20, 436–442.

81. Keren; Bosse; Marquez; et al. A Structured Tumor-Immune Microenvironment in Triple Negative Breast Cancer Revealed by Multiplexed Ion Beam Imaging. Cell 2018, 174, 1373–1387.e19.