Abstract
The traditional route to investigating biology by perturbing living systems or by individually purifying and characterizing component parts is giving way to more complex endeavors where chemists and physicists attempt to build cells from scratch. Parallel efforts are underway that exploit either extant biological parts or prebiotically plausible molecules. Both approaches help to reveal the underlying physical–chemical forces that give rise to cellular function and highlight the important role played by polymers in regulating biological chemical systems. Although the success in RNA and lipid chemistry has led to the reconstitution of specific facets of cellular life, our understanding of dynamic, dissipative networks is currently too incomplete to allow for the construction of a self-sustained, integrated protocell. However, the presence of shared chemistry points to a promising path forward.
Impact statement
Advances in the understanding of the biophysics of membranes, the nonenzymatic and enzymatic polymerization of RNA, and in the design of complex chemical reaction networks have led to a new, integrated way of viewing the shared chemistry needed to sustain life. Although a protocell capable of Darwinian evolution has yet to be built, the seemingly disparate pieces are beginning to fit together. At the very least, better cellular mimics are on the horizon that will likely teach us much about the physicochemical underpinnings of cellular life.
Introduction
Recent years have brought a renewed enthusiasm to attempts at building a cell from component parts. There are laboratories that are attempting to decipher the prebiotic chemistry that led to life, to gain insight into the divide between complex chemical reactions and biological networks, and to engineer life-like technologies. The diversity in terminology used to describe the assembled chemical systems reflects the variety of approaches taken. The terms protocell, artificial cell, minimal cell, and synthetic cell are at times used to describe the same thing, and at other times are intended to emphasize different aspects of the assembled chemical system. For example, protocell is frequently used to describe simple cell-like units assembled from prebiotically plausible components, whereas artificial cell often indicates a mimic of modern cells that consists, in part, of extant biological machinery. However, examples to the contrary are commonly encountered.
The most straightforward approach to synthesizing a model protocell (Figure 1) is to focus on developing a self-assembled, minimal unit capable of Darwinian evolution. 1 As extant biology is the product of billions of years of evolution, the mechanisms exploited by living cells are quite complex in comparison to the physicochemical laws that drive the system. Therefore, in addition to being of greater prebiotic relevance, mechanisms that do not rely on biological machinery are at times easier to implement. However, the difficulty is that abiotic mechanisms proceed unregulated, in stark contrast to the enzymatically controlled manner in which biological networks function. 2 Moreover, the dynamics of the abiotic mechanisms that current models of protocells depend on for survival tend to decrease the robustness of the overall system. Therefore, one is often left trying to balance chemical spontaneity with robustness so that the protocell can persist through fluctuating conditions. Presumably at some point, the complexity of the system would reach a level where abiotically implausible polymers would be required to coordinate the activities inside the protocell. We have yet to determine where that boundary lies.

Figure 1. Schematic of one conception of a hypothetical protocell. A chemical reaction network within a lipid vesicle acquires nutrients that lead to the copying of nucleic acids and the synthesis of peptides and lipids. Here, the internal chemistry of the protocell supports membrane growth by synthesizing lipid. Membrane growth then leads to the division of unstable intermediate structures through environmental shear forces. Reactive monomeric building blocks and reaction by-products, i.e. waste, passively diffuse across the membrane. Ideally, an internal protometabolic network would sustain the protocell across many generations.
Traditionally, protocells are envisioned as short-chain fatty acid vesicles containing catalytic RNA molecules. This implies that the RNA molecules are tied in some way to the maintenance of the vesicle. Although the presence of nucleic acids does affect the dynamics of some encapsulated chemical systems, as discussed below, strands of nucleic acids alone do not constitute a functioning genome any more than polymers of amino acids or sugars do. Some type of supporting architecture is necessary to implement the instructions encoded within the nucleic acid polymers in a way that allows the (proto)cell to defy thermodynamic equilibrium long enough to propagate. To do so, early cell-like systems presumably made use of a rudimentary form of metabolism. It seems likely that these early protocells contained a heterogeneous mixture of molecules, including RNA, peptides, metal ions, lipids, plus additional components. Although life could have begun with molecules that are no longer exploited by living cells today, 3 emphasis is usually placed on molecules that have a connection to extant living systems. For example, coacervate-based and proteinosome-based cell-like systems4–7 have been constructed, but lipid-based systems have been more extensively investigated partly because lipid vesicles better resemble life as we know it.
Compartmentalization
Like all of the components of a cell, membranes are dynamic. Membranes must change shape during growth and division, mediate the uptake of nutrients and release of waste, and are integral to the maintenance of concentration gradients that are extensively used to drive thermodynamically unfavorable reactions. Many of these characteristics have been reconstituted from extant sources for the synthesis of artificial cells. For example, much of the protein division machinery has been assembled in vitro,8–12 although the robust division of lipid vesicles has yet to be demonstrated. Similarly, protein pores, 13 ion pumps, 14 and ATP synthase15–17 have been reconstituted in lipid vesicles. 18 Importantly, some protein-mediated activities have been coupled together so that the light-dependent pumping of protons can be used to drive the synthesis of ATP.17,19 However, significant challenges remain in building in the laboratory a biological-like membrane that is responsive to the changing needs of a cell. 8 This difficulty is mostly due to the fact that membranes made from biological lipids are dynamically restricted, kinetically trapped structures. Biology compensates for this by incorporating protein, which can make up as much as half of the membrane composition. Proteins are needed to modulate the properties of the membrane, and the production and degradation of the proteins themselves are regulated at a genetic level. The construction of such a complex genetic and protein-dependent system is beyond what has been reconstituted to date with cell-free synthetic biology methodologies.
It is, therefore, perhaps unsurprising that more progress has been made with vesicles composed of model prebiotic lipids than with the phospholipids found in biological membranes (Figure 2(a)). Prebiotic lipids are typically taken to be monoacyl amphiphiles with either a carboxylate, i.e. fatty acid, or alcohol head group. Such lipids can be synthesized by spark discharge 20 or Fischer–Tropsch reactions at high temperature with metal catalysts,21,22 are found in carbonaceous meteorites, 23 and can form vesicles with a lipid bilayer morphologically similar to membranes made of biological phospholipids.24,25 Recent work suggests that more biologically relevant phospholipids could have formed on the prebiotic Earth;26–28 however, corroborating evidence from carbonaceous meteorites is not available. The supramolecular structures formed from fatty acids are less stable than the aggregates formed by their diacyl counterparts, 29 exhibiting a much greater range of dynamics. The increased dynamics allow for activity that otherwise would require the intervention of proteins. Lipids can twist, turn, laterally diffuse, flip between leaflets, and escape into solution (Figure 2(b)). All of these phenomena are several orders of magnitude faster with fatty acids than with diacyl phospholipids.29–32 Many of the differences in lipid dynamics can be explained by considering the packing of the hydrophobic regions of the membrane. Increased packing leads to greater van der Waals interactions, and thus decreased dynamics. The dynamic nature of fatty acids allows growth, 30 division, 33 and the acquisition of nutrients,34–36 all in the absence of proteins. Importantly, heterogeneous mixtures of monoacyl lipids, such as those that would likely arise from non-enzymatic, prebiotic synthesis, give vesicles more amenable to a protocellular life cycle. 37 Heterogeneity in the length of the hydrocarbon chain promotes the self-assembly of vesicles.38,39 Moreover, heterogeneity of polar head-groups can lead to increased stability40–44 and permeability.35,36

Figure 2. The lipids and lipid dynamics of protocellular membranes. (a) From top to bottom, the structures of model prebiotic lipids are decanoic acid, decanol, the glycerol monoester of decanoate, and cyclic-lyso-phosphodecanoic acid. The last structure is of di-decanoyl phosphatidic acid, shown for reference. (b) Above the critical aggregate concentration, fatty acids exist in equilibrium between free monomers, micelles (green), and various lipid aggregates (purple), including vesicles with bilayer membranes. Lipid monomers can be exchanged between different aggregate structures, flip between leaflets (blue), and laterally diffuse (yellow). The dynamic nature of the bilayer may lead to the formation of transient pores. Micelles can incorporate into pre-existing vesicles, leading to a net growth in volume and surface area (green). Likewise, the presence of diacyl phospholipids (brown) leads to vesicle growth through the net accumulation of fatty acid (orange) by decreasing the desorption rate of fatty acids from the membrane.

Figure 3. Transition towards a functional protocell. A subset of complex prebiotic chemistry gave rise to protocells. Here, the process is envisaged to have proceeded in a fashion where nucleic acids were deeply integrated with peptides and metabolic-like chemistry to generate a cell-like network from the beginning. That is, systems did not generally evolve separately and later merge. Iterative cycles of growth, fusion, and division ultimately led to competition between protocells sustained by some type of dissipative chemical (proto)metabolism.
Lipid dynamics also enable competition between fatty acid-based vesicles in a way that would be difficult to achieve with diacyl phospholipids alone. Osmotic pressure generated by plausible levels of encapsulated RNA, for example, drives vesicle growth at the expense of empty vesicles. 45 Lipid exchange between the two populations of vesicles, i.e. RNA containing and empty vesicles, occurs because the energetic barrier for fatty acids to enter aqueous solution from a membrane is low. 29 Therefore, the unfavorable surface area to volume ratio of osmotically swollen vesicles can be reduced through the incorporation of fatty acids that are in equilibrium between different membranes. The implication then is that the activity of an encapsulated RNA polymerase would induce growth by increasing the concentration of RNA and thus the osmotic pressure on the vesicle. Such a mechanism could also lead to the division of multilamellar vesicles under environmentally reasonable conditions, since the shape changes that arise from concentration gradients across the membranes lead to the formation of fragile thread-like structures that break apart into daughter vesicles upon agitation.33,46,47
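The magnitude of the osmotic driving force described above can be estimated with a minimal sketch. It assumes an ideal, dilute solution (the van 't Hoff relation) and a spherical vesicle (the Young–Laplace relation). The 1 mM excess of internal osmolyte and the 50 nm radius are hypothetical illustration values, not measurements from the cited work, and the function names are introduced here only for illustration.

```python
R_GAS = 8.314  # gas constant, J mol^-1 K^-1


def osmotic_pressure(delta_c_molar, temperature_k=298.15):
    """van 't Hoff estimate of osmotic pressure (Pa).

    delta_c_molar is the excess concentration of osmotically active
    solute inside the vesicle in mol/L (assumed ideal and dilute).
    """
    delta_c_si = delta_c_molar * 1000.0  # mol/L -> mol/m^3
    return delta_c_si * R_GAS * temperature_k


def membrane_tension(pressure_pa, radius_m):
    """Young-Laplace tension (N/m) in the wall of a spherical vesicle."""
    return pressure_pa * radius_m / 2.0


# Hypothetical example: 1 mM excess osmolyte inside a 50 nm radius vesicle.
pressure = osmotic_pressure(0.001)
tension = membrane_tension(pressure, 50e-9)
print(f"osmotic pressure: {pressure:.1f} Pa, membrane tension: {tension:.2e} N/m")
```

In this simplified picture, the tension is what fatty acid uptake relieves: adding lipid increases the membrane area available to the swollen vesicle, whereas an empty vesicle with no osmotic gradient experiences no such pull.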
Diacyl phospholipids are kinetically trapped in vesicle membranes 29 and thus cannot contribute to the alleviation of osmotic stress in the way that fatty acids can. However, the presence of even small amounts of diacyl phospholipid significantly impacts the dynamics of fatty acids.48–50 The slower desorption rates of fatty acids from membranes containing diacyl phospholipids skew the exchange of fatty acids between different membranes.32,51–53 These altered kinetics, plus entropic effects, result in the net growth of diacyl phospholipid containing vesicles. Therefore, multiple mechanisms, including the replication of nucleic acids and the synthesis of diacyl lipids, exist that could give rise to competition between protocells. 50 Since cholesterol increases the packing of membranes and decreases the permeability of membranes to solutes,34,54 it would be insightful to determine if polyaromatic hydrocarbons (PAH), which have been hypothesized to play a role in the origins of life,38,55,56 similarly impact the lipid exchange kinetics between vesicles.
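How a slower desorption rate alone can skew lipid exchange is illustrated by the toy kinetic sketch below. It assumes two vesicle populations that share one pool of free fatty acid monomer, uptake proportional to the amount of membrane, and first-order desorption. All rate constants and starting amounts are arbitrary, hypothetical values, and the entropic effects mentioned above are ignored.

```python
def simulate_exchange(k_on=1.0, k_off_mixed=0.05, k_off_pure=0.10,
                      mixed0=1.0, pure0=1.0, monomer0=0.0,
                      dt=0.01, steps=20000):
    """Euler integration of fatty acid exchange between two vesicle pools.

    mixed:   fatty acid held in phospholipid-containing membranes (slow desorption)
    pure:    fatty acid held in pure fatty acid membranes (fast desorption)
    monomer: free fatty acid in solution, shared by both populations
    """
    mixed, pure, monomer = mixed0, pure0, monomer0
    for _ in range(steps):
        # Net flux into each membrane: uptake minus desorption,
        # both proportional to the amount of membrane present.
        d_mixed = (k_on * monomer - k_off_mixed) * mixed * dt
        d_pure = (k_on * monomer - k_off_pure) * pure * dt
        mixed += d_mixed
        pure += d_pure
        monomer -= d_mixed + d_pure  # total fatty acid is conserved
    return mixed, pure, monomer


mixed, pure, monomer = simulate_exchange()
print(f"mixed: {mixed:.3f}  pure: {pure:.3f}  free monomer: {monomer:.3f}")
```

With these assumed parameters, the population that desorbs more slowly accumulates fatty acid at the expense of the other, which is the qualitative behavior described in the text. The model is not meant to reproduce measured growth rates.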
Increased lipid dynamics are not without disadvantages. Fatty acid vesicles only form near the pKa of the carboxylate head group 57 and are similarly sensitive to the ionic strength of the solution, 58 in particular to the presence of divalent cations. 59 Intuitively then, the addition of diacyl phospholipid would be expected to increase the stability of fatty acid vesicles. In fact, only 10 mol% of diacyl phospholipid decreases the critical aggregate concentration (CAC) of oleate vesicles48,50 and increases the stability of vesicles to Mg2+. 48 Stability to Mg2+ is important as many chemical mechanisms, such as the polymerization of nucleic acids, 49 are facilitated by the neutralizing and catalytic effects of Mg2+. Taken together, the data suggest that if a prebiotic catalyst emerged that enabled the synthesis of diacyl lipids, then vesicles containing such a catalyst would have a selective advantage due to at least two factors. Such vesicles would display increased stability to the effects of divalent cations and would be capable of preferential growth over vesicles lacking diacyl lipids, as described above. Although monoacyl glycerol esters of fatty acids increase the stability of fatty acid vesicles to Mg2+ by 2–3 fold,57,59 these monoglycerides would not be expected to mediate selective growth because of decreased acyl chain packing within the membrane. 35
The presence of diacyl phospholipids could have accelerated the emergence of more complex polypeptides. The decreased permeability of diacyl phospholipid membranes would be expected to decrease the ability of protocells to passively acquire nutrients. 50 As some chemical exchange across membranes is necessary, decreased permeability would have put a selective pressure on the system to develop transport machinery, unless alternative mechanisms, such as the formation of defects and pores arising from freeze-thawing 60 and macromolecular crowding, 61 were available. Nevertheless, highly impermeable membranes are advantageous to a (proto)cell, since impermeability allows for the exploitation of concentration gradients to drive thermodynamically unfavorable but necessary (proto)cellular processes. 62 It is also possible that diacyl phospholipids aided the emergence of soluble peptide enzymes in addition to peptides localized to the membrane. For example, fatty acids are competitive inhibitors of polymerases.63,64 In the presence of small amounts of diacyl phospholipids, however, the availability of fatty acids in solution would be greatly decreased,39,49 due to the decreased desorption rates of fatty acids from diacyl phospholipid-containing membranes. Less free fatty acid could have then facilitated the emergence of enzymes with polymerase activity and thus perhaps the emergence of more complex genetic systems. In fact, model intermediate membrane compositions containing phospholipids and fatty acids can carry out encapsulated transcription and translation while retaining the ability to acquire nucleotides and amino acids. 49 Of course, there is currently insufficient data to support this speculation. It is also possible that protein enzymes emerged later, after the reliance on membranes primarily composed of fatty acids passed.
Nucleic acid and peptide polymers
The prebiotic synthesis of the building blocks of nucleic acids and peptides has been successful enough to indicate that nucleosides and amino acids were likely present on the prebiotic Earth.27,65–68 Since the early work of Miller, Urey, and Oró, the central role of cyanide-based chemistry in prebiotically producing the molecules of life has been clear. Importantly, the products of these prebiotic syntheses mirror what has been found in carbonaceous meteorites. 69 Since the molecules of life tend to be photostable,70–72 and UV-light appears to be important for the prebiotic synthesis of pyrimidines73,74 and inorganic cofactors of proteins, 75 UV-light seems to have played a critical role in the availability of molecules on the prebiotic Earth. Of course, factors other than photostability must have been important as well. For example, a large number of presumably photostable amino acids not found in biology have been identified in meteorites, 76 suggesting that only a subset of the potential building blocks of life were selected.77,78 Nevertheless, caveats to these theories exist. Some argue that there existed an original restricted set of amino acids, for example, that were later supplemented by cellular activity. 79 It is also possible that the first polymers from which modern-day polymers evolved were chemically distinct from extant biopolymers.3,80 In other words, the original building blocks may not have resembled extant amino acids or nucleotides.
What has been more difficult to assess is how the individual building blocks could have polymerized into nucleic acids and polypeptides. Some success has been achieved in pushing these dehydration reactions forward through cycles of heat-driven drying followed by wetting.81,82 The presence of lipids facilitates the synthesis of RNA from mononucleotides, presumably by arranging and condensing the nucleotides between the lamellae of lipid membranes during drying.83,84 Recently, the formation of peptides was found to be facilitated by the presence of α-hydroxy acids during such dry/wet cycles.85,86 However, a significant problem emerges at temperatures above 65°C, where hydrolysis effectively competes with polymerization, 87 thereby leading to the production of short, scrambled sequences. 82 Although such instability may have selected for folded sequences more resistant to degradation, 81 other paths towards biological-like polymers that do not rely on heating are possible.
In biology, polymers are synthesized from activated monomers. Life exploits activation chemistry with high energy barriers so that the chemistry of the cell can be regulated by enzymes. In the absence of such catalysts, activation chemistry is needed so that the rates of polymerization in aqueous solution surpass the rates of hydrolysis. Examples of such phenomena include the carbonyl sulfide chemistry described by Ghadiri and colleagues 88 for the polymerization of amino acids, the reactions of imidazole-activated nucleotides to support the template-directed synthesis of nucleic acids,89–91 and diamidophosphate-mediated chemistry, which leads to the formation of polypeptides, oligonucleotides, and phospholipids. 26 Carbonyl sulfide is found in the atmosphere and is emitted from volcanoes. Prebiotically plausible routes to imidazole-activated nucleotides have been proposed via synthesis assisted by cyanogen chloride 92 or methyl isocyanide,93,94 and diamidophosphate is formed from the ammonolysis of metatriphosphate.95,96 Nonenzymatic polymerization reactions mediated by such activation chemistry can be enhanced by the concentrating effects of eutectic phases, 97 which are also compatible with the assembly and selection of active ribozymes.98,99 It should be emphasized, however, that while it is desirable to identify conditions that favor polymerization over degradation, degradative processes are important for cellular and prebiotic chemistry. 100
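Why the balance between polymerization and hydrolysis matters for chain length can be seen in a deliberately simple sketch. It assumes that each linkage between neighboring monomers forms and breaks independently with pseudo-first-order rate constants, so the steady-state fraction of intact linkages is p = k_form / (k_form + k_hydrolysis), and the number-average chain length follows the Carothers relation 1 / (1 − p). The rate constants are hypothetical, and real condensation chemistry is not first order in this way.

```python
def bond_fraction(k_form, k_hydrolysis):
    """Steady-state fraction of linkages that are intact (toy two-state model)."""
    return k_form / (k_form + k_hydrolysis)


def mean_chain_length(k_form, k_hydrolysis):
    """Number-average chain length from the Carothers relation 1 / (1 - p).

    Requires k_hydrolysis > 0, otherwise p = 1 and the length diverges.
    """
    p = bond_fraction(k_form, k_hydrolysis)
    return 1.0 / (1.0 - p)


# Hypothetical ratios of bond formation to hydrolysis.
for ratio in (1.0, 9.0, 99.0):
    print(f"k_form/k_hydrolysis = {ratio:5.1f} -> mean length {mean_chain_length(ratio, 1.0):.1f}")
```

Under these assumptions the mean length equals 1 + k_form / k_hydrolysis, so chains stay short unless bond formation outpaces hydrolysis by a wide margin, consistent with the need for activation chemistry noted above.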
A central tenet of the RNA World hypothesis is that at some point, catalytic RNA molecules capable of copying RNA arose through non-enzymatic means, i.e. there was once an RNA-dependent RNA polymerase ribozyme or replicase. However, the search for such a polymer has been quite difficult. After the initial report of an RNA ligase ribozyme by Bartel and Szostak in 1993, 101 it took almost two decades to improve the class I RNA polymerase ribozyme 102 to the point that ∼95 nucleotides could be copied, enough to generate an active hammerhead ribozyme. 103 However, the processivity was not enough to synthesize another active RNA polymerase ribozyme (∼200 nucleotides in length). The successful selection of cross-chiral RNA polymerase ribozymes demonstrates that RNA molecules that exploit more tertiary interactions, as opposed to direct base-pairing, show decreased nucleotide bias 104 and further suggests that at least some of the difficulties encountered in identifying more processive RNA polymerase ribozymes stem from the limitations of current selection technologies. 105
Or perhaps the difficulty in developing RNA polymerase ribozymes reflects a too stringent interpretation of the RNA World. The ribosome may be a ribozyme,66,106–108 but this enzyme still depends on the presence of peptides for robust activity. In other words, if prebiotic chemistry gave rise to both RNA and peptides, 27 then it is logical to assume that both worked together cooperatively. In fact, all that is needed to aminoacylate RNA, a key step in the RNA-guided biosynthesis of peptides, are short oligomers ranging between 5 and 45 nucleotides.109–112 Additionally, codon length trimer oligonucleotides facilitate both nonenzymatic 113 and ribozyme-mediated copying of RNA. 114 This type of reasoning led Holliger and colleagues 115 to explore the influence of basic peptides on the activity of RNA polymerase ribozymes. The peptides decrease the dependence on Mg2+ down to concentrations that are compatible with fatty acid vesicles.
Compartments are thought to be important to systems of replicating RNA to protect against takeover by parasitic sequences, i.e. easily copied RNA sequences that do not provide a selective advantage to the system.1,116,117 This protection does not, however, remove the ability of RNA molecules to compete against each other since efficient RNA replicators should outcompete less efficient replicators through the selective growth and division 45 of the vesicle, as described above. A catalyst engaged in the synthesis of diacyl phospholipids would similarly help the catalyst-vesicle system outcompete other vesicles that lacked such a catalyst. 50 However, these important mutually beneficial links between RNA and vesicles have not been adequately demonstrated experimentally, because the necessary ribozymes either have not been identified or are too inefficient. Instead, the non-enzymatic copying of oligonucleotides35,36 and hammerhead ribozyme activity 59 were reconstituted in fatty acid vesicles, and an RNA polymerase ribozyme copied an RNA template within phospholipid vesicles. 115 While impressive, none of these systems disrupt equilibria enough to lead to vesicle growth or division. Without an intimate connection between RNA and vesicle chemistry, assembled models of protocells will continue to function as a composite of separate systems as opposed to an integrated, life-like system where the genotype directly affects a selectable phenotype.
Metabolism
Some sort of metabolic activity is needed to maintain the low entropy, out of equilibrium state of a living cell. 118 To do so, biology couples thermodynamically favorable reactions to the unfavorable chemistry needed to maintain the cell. Although there is no a priori reason why life must begin with a network of chemical reactions that resembles what is found in biology today, the universality of central metabolism has often been interpreted to indicate that extant-like metabolic chemistry was present on the early Earth before there was life.119,120 However, while the plausibility of individual reactions occurring on the prebiotic Earth is high, it is much less clear if entire metabolic-like cycles could have existed. 121 Nevertheless, model prebiotic versions of glycolysis,122–124 the Wood–Ljungdahl pathway, 125 and the citric acid cycle126–128 have been constructed in the laboratory. In each case, activity was dependent upon the presence of metals, either in elemental, mineral, or hydrated ionic form. This dependence on metals is not surprising since extant metabolism is completely reliant on the activity of metalloproteins in which the metallocofactor itself is largely responsible for the catalysis. 129
The point of metabolism is generally not to produce specific molecules. For example, ATP alone does not drive a reaction better than any other molecule in the absence of coupling chemistry. Instead, dissipative reaction networks are harnessed to maintain the far-from-equilibrium state of the cell. Early attempts at building chemical models of dissipative systems have focused on reproducing one of the hallmarks of out-of-equilibrium biological networks, such as oscillatory behavior. Proof-of-principle reactions have been demonstrated with small molecules, 130 nucleic acids,131–135 and protein enzymes.136–138 More recently, a network of organic reactions dependent upon thiol and amide chemistry gave rise to oscillatory behavior in the absence of enzymatic activity. 139 Nevertheless, these conceptually important papers do not describe chemistry that is directly tied to the fitness of a protocell or to a chemical reaction network. To this end, dynamic reaction networks140,141 were developed that lead to the replication or persistence of peptides or peptide-derived structures142–144 and vesicles.145–148 However, thus far, all of the developed systems are far from functioning as a metabolic network capable of supporting a cell, in part, because metabolism addresses multiple cellular needs at once. To use an analogy, it is as if the different wheels of a car are not only spinning independently but not even touching the ground. Metabolism requires integration and must directly support the maintenance of the cell. At the very least, effort should be expended to integrate nucleic acid and peptide chemistry. 149 One path forward could be to focus on the shared chemistry that is used to mediate the different dissipative systems developed thus far. Many of these dynamic networks rely on thiol chemistry. Such chemistry has long been hypothesized to have been important for the origins of life,150,151 and the photochemistry of thiols can be tied to the replication of protocells. 46
Another obvious problem with nearly all of these dissipative systems is that each system is highly contrived. Prebiotically implausible conditions are used, and precise mixing within a microfluidic device is often required. This is because no molecular component is present to govern the myriad of chemical reactions. Natural living systems have enzymes that not only catalyze reactions but do so in a manner that is responsive to the needs of the cell, as recognized by Oparin nearly a century ago. 2 It remains challenging to understand how a dissipative, combinatorial system could give rise to a network of polypeptides and nucleic acids that exhibit interdependent enzyme-like activity that sustains a protocell. However, the barrier to the development of the regulators of protometabolism, i.e. enzymes, may not be as high as imagined. For example, dipeptides can display catalytic activity152–156 and can selectively coordinate the types of metal ions that are important for enzymatic catalysis.75,157 Therefore, it seems likely that short, catalytically active (metallo)peptides existed on the prebiotic Earth. Since short peptides can self-assemble158–160 and bioinformatic analyses suggest that ancient peptide motifs exist within modern day proteins, 161 it may be that ancient hetero-complexes of peptides displayed activity more akin to an enzyme than just a simple catalyst. Polymerization of the peptides within the aggregate into longer protein-like structures may have then improved activity. 162 But again, systems need to be integrated. Without a supporting genetic system, the utility of this new enzyme-like polymer would be limited.
Conclusion
By some measures, the prospect of building a functioning model protocell seems quite close. Remarkable progress has been made in replicating strands of RNA and in developing vesicle compartments amenable to the range of tasks needed to sustain a protocell life cycle. 163 Nevertheless, after much effort, such a model system is still not available. A reasonable conclusion could be that the component parts or systems constructed separately thus far do not fit together in the same way as would be the case for a mechanical machine. 164 In other words, the disparate subsystems needed to maintain a cell must be deeply integrated, as would be the case for integral (proto)cellular components that have emerged and/or evolved together (Figure 3). Since biology largely relies on enzymes to manage the coordinated activity of an integrated system, it would appear that building a cell (or artificial cell) with extant biological parts would be easier than endeavors to build (proto)cells without modern enzymes. However, the artificial cell field seems no closer to achieving success. Individual protein-mediated pathways, such as those that divide vesicles 8 and replicate DNA, 165 have been largely reconstituted, but the disparate subsystems needed to sustain a cell are even less integrated than what has been achieved with model protocells. Instead, artificial cell research has shown more promise thus far in developing systems with biotechnological applications166–170 than in building a convincing cellular mimic. Regardless of whether the goal is to build a cell that mimics extant cellular life or to understand how life could have emerged on the prebiotic Earth, what seems to be needed is greater insight into how deeply integrated, dissipative systems can be harnessed to build truly life-like chemical systems.
Footnotes
Authors’ contributions
O.D.T. and S.S.M. wrote the manuscript together.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
Funding from the Simons Foundation (290358) is gratefully acknowledged.
