Using Physicochemical Measurements to Influence Better Compound Design

Abstract

During the past decade, the physicochemical quality of molecules under investigation at all stages of the drug discovery process has come under particular scrutiny. The issues associated with excessive lipophilicity and poor solubility in particular are many and varied, ranging from poor outcomes in screening campaigns to promiscuity, limited and/or poorly predictable pharmacokinetic exposure, and, ultimately, greater chances of clinical failure. In this review, contemporary methods to secure key measurements are described along with their relevance to understanding the behavior of molecules in environments pertinent to pharmacological activity. Together, the various measurements contribute to predictive models of both the physicochemical properties themselves and the outcomes they influence.

Keywords

physicochemical properties lipophilicity property-based design chromatographic measurements

Introduction

The physicochemical properties of potential drug molecules are at the forefront of contemporary thinking and practice in medicinal chemistry,^1–3 and are key quality indicators⁴ that demonstrably affect attrition.^5,6 In drug discovery, physicochemical properties can be defined as the tangible physical attributes of molecules that are related to interactions with different media and environments. In this context, the most important parameters in drug discovery are the interrelated properties of lipophilicity^7,8 (partition and distribution coefficients), solubility, and pKa,⁹ which underpin these behaviors and are summarized in Box 1. It is logical to expect that any prospective drug molecule must possess some level of water solubility to enable systemic exposure via aqueous media to reach its target and that its physical makeup must have complementary features to specifically engage with that target.¹⁰ The aqueous nature of the digestive tract, body fluids, and the intracellular milieu contrasts with the more lipophilic environment of the sites of drug action and transport, where recognition will require a particular combination of polar and, particularly, hydrophobic interactions.¹¹ Indeed, the essence of medicinal chemistry is finding a compromise between these conflicting requirements to identify molecules with optimal properties to deliver appropriate activities, pharmacokinetic exposures, and pharmacodynamic responses,¹² while minimizing off-target activity or toxicity.¹ The term “molecular obesity” has been coined to describe an addiction to lipophilicity-driven practices,¹³ and overuse of aromatic rings¹⁴ in experimental structures has been shown to have shortcomings,^15,16 alternatively described as a need to “escape from flatland.”^17,18

Box 1.

Definitions of key physicochemical properties

Partition coefficient (P)

Commonly expressed as log₁₀P or logP, this is the intrinsic lipophilicity of a compound, commonly measured in the octanol/water system. It is a constant for any given compound: the only value for compounds with no ionizable centers, or the asymptotic value for the un-ionized form if there are potentially charged centers.

P = \frac{[\text{solute}]_{\text{octanol}}}{[\text{solute}]_{\text{water}}}

(1)

Distribution coefficient (D)

Commonly expressed as log₁₀D_pH or logD_pH at a stated pH (pH 7.4 is commonly quoted in drug discovery), this is the effective hydrophobicity of a compound and measures distribution of all species. This varies with pH, given profound differences (typically 500- to 10,000-fold) in the distribution of ionized and un-ionized forms; the difference between the pKa and pH will influence the degree of ionization.

D_{pH} = \frac{([HA] + [A^-])_{\text{octanol}}}{([HA] + [A^-])_{\text{water}}} (acids)

D_{pH} = \frac{([B] + [BH^+])_{\text{octanol}}}{([B] + [BH^+])_{\text{water}}} (bases)

(2)

pKa

The pKa refers to the pH at which an ionizable center, be it an acid or base, is present with equal proportions of the charged and uncharged forms (i.e., 50% ionized). It is derived from the Henderson–Hasselbalch equation:

pKa = pH - \log([A^-]/[HA]) for an acid and pKa = pH + \log([BH^+]/[B]) for a base

(3)

such that when the species is 50% ionized ([A−] = [HA], or [BH+] = [B]), the logarithmic term is zero and pH = pKa.
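The definitions of pKa and D above can be tied together in a short Python sketch. It is illustrative only: the function names are hypothetical, it handles a single ionizable center, and the logD estimate assumes that only the neutral form partitions into octanol. That assumption is a simplification, since the box notes that ionized forms also distribute, albeit typically 500- to 10,000-fold less, so the estimate will be too low when the compound is almost fully ionized.

```python
import math


def fraction_ionized(pka: float, ph: float, kind: str) -> float:
    """Fraction of a single ionizable center present in its charged form."""
    if kind == "acid":  # HA <-> A- + H+ ; the charged form is A-
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":  # BH+ <-> B + H+ ; the charged form is BH+
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


def approx_log_d(log_p: float, pka: float, ph: float, kind: str) -> float:
    """Approximate logD at a given pH.

    Assumption (not from the text): only the neutral species enters
    the octanol phase, so D = P * (fraction neutral).
    """
    neutral = 1.0 - fraction_ionized(pka, ph, kind)
    return log_p + math.log10(neutral)


# An acid with pKa 4.4 and logP 3.0 at pH 7.4 is ~99.9% ionized,
# so its estimated logD falls by about three log units.
print(round(approx_log_d(3.0, 4.4, 7.4, "acid"), 2))
```

When pH equals pKa the function returns exactly 0.5, matching the 50% ionized definition above.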

Solubility

Solubility, to adapt the IUPAC (International Union of Pure and Applied Chemistry) definition, is the analytical composition of a saturated solution expressed as a proportion of a drug (solute) in a designated solvent. This represents the propensity of a molecule to act as a solute in a liquid medium to form a solution wherein the solute molecules are dispersed and individually surrounded by solvent molecules. Water solubility is pertinent to drug discovery, but the pH will vary with location, as will the overall composition of the solution, with additives designed to mimic various biorelevant fluids ( Table 1 ).

Table 1.

Composition of Fasted or Fed State Simulated (Gastric or Intestinal) Fluids Commonly Used to Measure Solubilities Pertinent to Drug Discovery.

Fluid	Solvent and pH	Key Components	Aqueous Acid or Base Added
FaSSGF	Water 1.6	Sodium taurocholate; lecithin, NaCl	HCl, qs
FeSSGF	1:1 Milk/water 5	NaCl, acetic acid, sodium acetate	HCl, NaOH, qs
FaSSIF	Water 6.5	Sodium taurocholate; lecithin, maleic acid, NaCl (dilute)	NaOH, qs
FeSSIF	Water (oil?) 6.5 (or lower)	Sodium taurocholate; lecithin, maleic acid, NaCl (concentrated)	NaOH, qs

FaSSGF, Fasted state simulated gastric fluids; FeSSGF, fed state simulated gastric fluids; FaSSIF, fasted state simulated intestinal fluids; FeSSIF, fed state simulated intestinal fluids; qs = quantum satis (“the amount needed”).

The measurement of physicochemical properties of experimental compounds is routine in contemporary practice,¹⁹ and data are often generated in high-throughput assays²⁰ from 10 mM DMSO stock solutions. At GlaxoSmithKline (GSK), about 25,000 compounds per annum are run through a bundled package of physicochemical assays,²¹ the experimental procedures for which are summarized in Box 2 .²² The inclusion of high-throughput measurements on high-performance liquid chromatography (HPLC) columns packed with immobilized human serum albumin²³ (HSA) and immobilized artificial membrane (IAM)²⁴ in such bundles gives additional insight into plasma protein binding, volume of distribution,²⁵ and unbound concentrations.²¹ Together, these biomimetic measures have been used to estimate likely free concentrations (termed “drug efficiency”²⁶) by an empirical HPLC estimation²⁷ that can be used to provide information on optimization of the likely dose.²⁸

Box 2.

Typical experimental protocols of GlaxoSmithKline (GSK) physicochemical and biomimetic assays

Kinetic Solubility Assay

The kinetic solubility of a compound is measured using a stock solution of the compound dissolved in DMSO, which is diluted (1:20) with phosphate-buffered saline at pH 7.4, equilibrated for 1 h at room temperature, and filtered through Millipore Multiscreen filter plates (Merck Millipore, Burlington, MA). The filtrate is quantified with a charged aerosol detector.³³

Thermodynamic Solubility of Solid Compounds in Biorelevant Media

Thermodynamic solubility is determined by dispensing the relevant buffer (e.g., 1 ml of fasted state simulated intestinal fluid [FaSSIF], fed state simulated intestinal fluid [FeSSIF], simulated gastric fluid [SGF], or simulated lung fluid [SLF]—see Table 1) into a 4 ml glass vial containing circa 1 mg of solid compound.³⁰ The resulting suspension is shaken at 900 rpm for 4 h at room temperature before residual solid is removed by filtration using a MultiScreen HTS 96-well solubility filter plate (Millipore). The supernatant is quantified by high-performance liquid chromatography–ultraviolet (HPLC-UV), with a dynamic range of typically 1–1000 µg/ml.

Lipophilicity: Chrom logD Assay

Lipophilicity is measured by reversed-phase HPLC on a C18 column (50×2 mm, 3 µm Gemini NX C18, Phenomenex, Macclesfield, UK) at pH 2, 7.4, and 10.5, using fast gradient acetonitrile-aqueous buffer mobile phases. The Chromatographic Hydrophobicity Index (CHI) values are derived directly from the gradient retention times by using a calibration line obtained for standard compounds.²⁰ Translation of CHI values into Chrom logD values³⁹ at the given pH is achieved using the empirically derived Eq. 4.

\text{Chrom logD}_{pH} = 0.0857 \times \text{CHI}_{pH} - 2.00

(4)
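As a rough illustration of the two steps described above (a calibration line from standards, then Eq. 4), the following Python sketch may help. The retention times and CHI values of the standards are invented placeholders, not GSK calibration data, and the helper names are hypothetical; only the coefficients 0.0857 and 2.00 come from Eq. 4.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def chrom_log_d(chi: float) -> float:
    """Eq. 4: convert a CHI value at a given pH to Chrom logD at that pH."""
    return 0.0857 * chi - 2.00


# Hypothetical calibration standards: gradient retention time (min) and assigned CHI.
standard_rt = [1.0, 2.0, 3.0, 4.0]
standard_chi = [20.0, 45.0, 70.0, 95.0]
slope, intercept = fit_line(standard_rt, standard_chi)

# A test compound eluting at 2.5 min (also a made-up value).
chi = slope * 2.5 + intercept
print(f"CHI = {chi:.1f}, Chrom logD = {chrom_log_d(chi):.2f}")
```

A straight-line calibration is assumed here because the text refers to a "calibration line"; any real method would use its own validated standards.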

pKa Determination

The pKa determination is based on acid–base titration quantified either by UV spectroscopy or potentiometrically using a Sirius T3 (Sirius Analytical Ltd, Forest Row, UK) instrument, typically requiring 5 µl of a 10 mM solution of the samples, and the UV absorbance is monitored throughout 54 stepped pH values for about 5 min.⁹ When the ionization center is remote from any UV chromophore, a potentiometric acid–base titration is used. Usually, 0.5–1 mg of solid material is required for these measurements. The pH of each point in the titration curve is calculated using mass balance equations, and the calculated points are fitted to the measured curve by refining the pKa(s). For poorly soluble compounds, a method using various concentrations of co-solvent (usually methanol) is applied. The pKa in water is calculated from the Yasuda–Shedlovsky extrapolation.

Protein Binding Assays (Human Serum Albumin [HSA] and Alpha-1-Acid Glycoprotein [AGP])

Chemically bonded HSA and AGP HPLC stationary phases (Chiral Technologies, Illkirch, France) are used for measuring compounds’ binding to plasma proteins, applying linear gradient elution up to 30% isopropanol with 50 mM pH 7.4 ammonium acetate buffer.²³ The gradient retention times are standardized using a calibration set of compounds. The %HSA bound gives a reliable indication of the free fraction of compound in plasma when compared to more complex pharmacokinetic methods.

Phospholipid Binding Assay (IAM)

The binding of compounds to immobilized artificial membrane (IAM) is measured using a commercially available IAM PC DD2 100×4.6 mm, 10 µm (Regis Analytical, West Lafayette, IN) HPLC column. Gradient retention times obtained by applying an acetonitrile gradient up to 85% are converted to Chromatographic Hydrophobicity Indices (CHI IAM) using a calibration set of compounds.²⁴ The CHI IAM values are converted to the logarithmic retention factors using the following formula: log K_IAM = 0.046 × CHI_IAM + 0.42.

Solubility

The protocol and type of solubility measurement²⁹ are important factors to consider.³⁰ Typically, rapid, high-throughput experiments³¹ (often using a precipitative method in aqueous buffer from a small volume of 10 mM stock in a carrier solvent such as DMSO) give a kinetic solubility, representing the maximum solubility of the fastest-precipitating species of the compound, often quantified by chemiluminescent nitrogen detection (CLND)³² or charged aerosol detection (CAD).³³ More intensive experiments, run by dissolution of better-characterized solid samples with longer equilibration times, furnish thermodynamic solubility data,³⁰ whereby the dissolved compound is in equilibrium with the undissolved material (e.g., a stable polymorph) in excess.²⁹ Table 1 gives examples of commonly used simulated fluids³⁴ for estimating solubilities pertinent to drug discovery.

Implications of Poor Solubility

In the early stages of drug discovery, compounds that are poorly soluble and/or lipophilic can cause numerous problems that hinder various processes or give misleading outcomes.³⁵ Poorly soluble compounds can hinder automation and the outcomes of high-throughput techniques through equipment failure due to blockages or cross-contamination; otherwise, the compound may not be soluble enough to maintain a stock solution at an appropriate concentration.³⁶ Highly lipophilic compounds may stick to the plastic screening plates (thus reducing effective concentration in the well), or they may form aggregates that are common sources of false positives. It is imperative that trained eyes inspect dose–response curves for signs of low solubility (Fig. 1a) and plate patterns^37,38 for signs of carryover of an insoluble compound; false negatives may occur when insufficient solubility at the top concentration of a dilution series means that no compound is present in the assay wells. In addition, lipophilic compounds may give nonspecific inhibition of the target protein or bind to other reagents or coupling partners in a given assay. Such shortcomings are manifest in the propensity for more lipophilic compounds to be promiscuous binders, exhibiting activity versus multiple proteins.^5,39 Shortcomings due to colloidal aggregation,⁴⁰ a risk accentuated with lipophilic compounds, are well documented,⁴¹ and predictive methods to highlight such behaviors are emerging.⁴² The quality of chemical probes can also be described in terms of their lipophilicity,^43,44 and overinterpretation of data from a phenotypic experiment could be a consequence of a nonspecific lipophilic tool compound. Methods are emerging, however, to enhance the output of experiments with poorly soluble molecules.⁴⁵

Figure 1.

Activity (percentage of inhibition vs. concentration) curve with a characteristic tail at high concentrations due to low solubility.

The Lipinski Rule of 5

A watershed in drug discovery practices in 1997 was the publication by Lipinski et al. of the Rule of 5 (Ro5),⁴⁶ based on their observation of a disconnect between the properties of oral drugs and those of molecules typically pursued as hits from high-throughput screening (HTS). It was noted that when two or more thresholds of molecular weight (MW) >500, calculated logP >5, a number of hydrogen bond donors >5, or hydrogen bond acceptors >10 were exceeded, oral exposure would be limited by poor solubility and/or poor permeation.⁴⁷ This rule continues to engender debate,⁴⁸ criticism,⁴⁹ and misinterpretation,⁵⁰ but it certainly reined in some of the potency-driven excesses of HTS and brought some discipline to design. When the Ro5 has caused issues, it is likely through overinterpretation; regarding the values as hard thresholds is certainly unwise.² MW is, per se, probably an unimportant parameter (but can be calculated with precision!), and the logP threshold of 5 is rather higher than the logP of the majority of oral drugs,⁵¹ a pattern that has not substantively changed in spite of recent inflation in the MWs of drugs.⁵² The availability of high-throughput measured lipophilicity, solubility, and permeability data allows a more iterative appraisal of the likely issues the Ro5 should highlight and identifies cases in which the rules might be broken. As with many aspects of medicinal chemistry, there is a trade-off in outcomes; less lipophilic compounds are likely to be more soluble and, on balance, relatively lower-affinity binders with poor permeability. As lipophilicity increases, the solubility will generally decrease, and affinity and permeability increase.⁵³ Permeation has a biphasic response to lipophilicity,^39,54 although with bigger molecules there may be a requirement for increased lipophilicity,⁵⁵ measurements that may be enhanced by modulating protein composition of the experimental medium.⁵⁶ A fierce debate remains over the mechanisms of non-carrier-mediated membrane passage, be it passive through the bilayer⁵⁷ or entirely facilitated by transmembrane proteins.⁵⁸
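The Ro5 criteria quoted above lend themselves to a simple counting check. The Python sketch below is one possible reading, with hypothetical function names; it takes precomputed descriptor values as inputs rather than calculating them from a structure, and, in keeping with the caution above, its output is better treated as a prompt for closer inspection than as a pass/fail verdict.

```python
def ro5_violations(mw: float, clogp: float, hbd: int, hba: int) -> int:
    """Count how many of the four Rule of 5 criteria are exceeded."""
    return sum([
        mw > 500,    # molecular weight
        clogp > 5,   # calculated logP
        hbd > 5,     # hydrogen bond donors
        hba > 10,    # hydrogen bond acceptors
    ])


def ro5_alert(mw: float, clogp: float, hbd: int, hba: int) -> bool:
    """True when two or more criteria are exceeded, as stated in the text."""
    return ro5_violations(mw, clogp, hbd, hba) >= 2


# Illustrative (made-up) descriptor values for two compounds.
print(ro5_alert(mw=350.0, clogp=3.1, hbd=2, hba=5))   # no criteria exceeded
print(ro5_alert(mw=620.0, clogp=5.8, hbd=3, hba=9))   # MW and clogP exceeded
```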

Contemporary Hit-Finding Methods

The discovery of new chemical matter^59,60 in contemporary practice in drug discovery⁶¹ is dominated by screening of one type or another.⁶² Such campaigns typically range in size and scale from fragments⁶³ (MW ~150 to 280 Da, 10² to 10³ compounds), through focused or diverse sets⁶⁴ (10³ to 10⁶ compounds, MW ~200 to 500 Da, screened in vitro versus isolated proteins or in cell-based assays) in HTS,⁶⁴ or higher numbers in affinity-based selection using DNA-encoded libraries (MW ~200 to 600 Da, 10⁵ to 10¹⁰ compounds).^65–67 There is discussion regarding how best to sample the extent of likely combinations of feasible compounds that exist within the bounds of “druglike space” (usually defined within the bounds of the Ro5).^68,69 Understanding how molecular topology and features, defined as molecular complexity,^70,71 can influence the chances of making a productive binding event leads to the conclusion that relatively few fragment-sized molecules give the best odds.⁷² Beyond fragments, attempts to improve quality in larger or leadlike⁷³ molecules spawned initiatives such as lead factories,⁷⁴ and purchasable space can be explored computationally.⁷⁵ It is pertinent to consider that the properties of the candidate(s) selected are defined not by the technique used to identify the starting point of a drug discovery program⁷⁶ but by the varying practices⁷⁷ used on the journey.⁷⁸

Hits to Leads

In the hit-to-lead (H2L) phase,⁵⁹ a process of hit confirmation and expansion from all or any of the above techniques, it is first of all vital to ensure that the activity and binding are genuine – that is, not influenced by impurities or an interference mechanism due to contaminants or reactive substructures⁷⁹ (sometimes described as pan-assay interference compounds, or PAINS).⁸⁰ This is achieved using newly prepared and purified samples in the primary assay, run, ideally, in tandem with an assay format using an orthogonal readout or biophysical measurement.⁸¹ When selecting samples for qualification, any understanding of the physicochemical profile of the compound (e.g., solubility, charge, or lipophilicity)⁸² will help put the binding and demonstrable pharmacological activity in context and influence prioritization and avoidance of likely promiscuous binders.^83,84

A successful screen identifies a number of qualified hits that can be prioritized based on their chemical tractability,⁸⁵ activity, ligand efficiency⁸⁶ (LE, the activity engendered by each heavy atom in the molecule⁸⁷), lipophilicity, and solubility. The generation of rational structure–activity relationships (SARs) during this phase of optimization is often termed H2L work and builds confidence that a program of work is capable of delivering a compound with a combination of activity and physical properties appropriate for an efficacious candidate molecule. Progress toward this goal can be mapped using physical properties and efficiency metrics;⁷⁸ ligand lipophilicity efficiency (LLE), the difference between activity and a lipophilicity estimate (e.g., pIC₅₀ − logP), is a universally recognized metric,^86,88 embodying a principle proposed by Hansch in 1987 that “molecules should be made as hydrophilic as possible without loss of efficacy.”⁸⁹ Recent reviews^78,88,90 demonstrated that drugs almost invariably possess some of the best combinations of efficiency and properties achieved for a given target,⁹¹ visualized by plotting LE versus LLE.
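The two efficiency metrics named above can be computed in a few lines; the Python sketch below is a hedged illustration rather than a prescribed method. LLE follows the definition given in the text (pIC₅₀ − logP). For LE, the text only says "activity engendered by each heavy atom"; the factor of 1.37 kcal/mol per log unit used here is a widely used convention (approximately RT ln 10 near room temperature) and is an assumption not taken from this article. The compound values are invented.

```python
def lle(pic50: float, log_p: float) -> float:
    """Ligand lipophilicity efficiency: activity minus a lipophilicity estimate."""
    return pic50 - log_p


def ligand_efficiency(pic50: float, heavy_atoms: int,
                      kcal_per_log_unit: float = 1.37) -> float:
    """Ligand efficiency in kcal/mol per heavy atom.

    The 1.37 scaling factor is a common convention, assumed here;
    the article does not specify one.
    """
    return kcal_per_log_unit * pic50 / heavy_atoms


# Made-up hit and lead: potency rises while lipophilicity is held down.
hit = {"pic50": 6.0, "log_p": 3.5, "heavy_atoms": 22}
lead = {"pic50": 8.0, "log_p": 3.0, "heavy_atoms": 30}
for name, c in (("hit", hit), ("lead", lead)):
    print(name,
          f"LE = {ligand_efficiency(c['pic50'], c['heavy_atoms']):.2f}",
          f"LLE = {lle(c['pic50'], c['log_p']):.1f}")
```

Tracking both values across a series gives the LE versus LLE view mentioned at the end of the paragraph above.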

With progression through the discovery phases, the behaviors of compounds in cell-based assays and in pharmacokinetic and pharmacodynamic/efficacy studies are all influenced by the physicochemical makeup of the molecule.⁹² The molecule must be sufficiently soluble to dissolve (favored by low lipophilicity), but activity and passage into and through cells generally require higher lipophilicity, which also brings the risks of increased binding to other proteins and/or increased metabolism. These are some of the balances and trade-offs sought in a typical lead optimization.⁵³

Predictive Models of Physicochemical Measurements

To complement the impact of measuring physicochemical parameters and exploring their influence, contemporary drug discovery is often driven by a “predict-first” design culture. Commercial software packages to predict physicochemical parameters are available, and these are often supplemented by modeling of in-house data to generate bespoke models.⁹³ Indeed, given the quality of these predictive models, it is a missed opportunity not to use them prospectively, although analyses of practices would suggest that they are being used and exploited.^2,78

In contrast, solubility is notoriously hard to predict with any great precision,⁹⁴ due to the underlying complexity of the mechanism involved, most notably in estimating the lattice energy of the crystalline form.⁹⁵ In addition, the dynamic range of measured high-throughput data (typically with a limit of quantitation of around 1 µM in high-throughput assays, wherein the upper limit is about 500 µM based on dilution of 10 mM DMSO stocks) gives a relatively narrow spread of data to model. The General Solubility Equation⁹⁶ (GSE; Box 3, Eq. 5), however, provides both a useful predictive method and an illustrative principle to understanding solubility changes, based on the contributions of lattice energy and lipophilicity.

Box 3.

Yalkowsky’s general solubility equation

The General Solubility Equation:

\log S = 0.5 - 0.01(\mathrm{MP} - 25) - \log P

(5)

Additionally, for compounds with an ionizable center, a useful guide and approximation is:

\log S_{pHx} = 0.5 - 0.01(\mathrm{MP} - 25) - \log D_{pHx}

(6)

where LogS is molar aqueous solubility, MP is the melting point in Celsius, and LogP and LogD are the partition and distribution coefficients, respectively.

Table 2 is a representation of the GSE colored by predicted solubilities, in which the coloration reflects levels of solubility commensurate with oral exposure and efficacy; it is thus probably no coincidence that the median logP of drugs is around 3.⁵¹ Solubilizing an aspirational⁹⁷ oral drug dose of 100 mg of a compound with a molecular weight of 400 in 1000 ml equates to a 250 µM solution; for comparison, the typical stomach volume is roughly 300 ml.
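The arithmetic behind the 250 µM figure can be written out explicitly. The second concentration is an additional illustration that simply reuses the approximate 300 ml stomach volume quoted above; it is not a calculation made in the text.

```latex
\begin{align*}
n &= \frac{100\ \text{mg}}{400\ \text{g mol}^{-1}} = 0.25\ \text{mmol} \\
c_{1000\,\text{ml}} &= \frac{0.25\ \text{mmol}}{1.0\ \text{l}} = 250\ \mu\text{M},
  \qquad \log S = \log_{10}\!\left(2.5 \times 10^{-4}\right) \approx -3.6 \\
c_{300\,\text{ml}} &= \frac{0.25\ \text{mmol}}{0.3\ \text{l}} \approx 833\ \mu\text{M},
  \qquad \log S \approx -3.1
\end{align*}
```

On the log S scale used in Table 2, the dose therefore needs a solubility of roughly −3.6 in a litre of fluid, and closer to −3.1 if only the smaller volume is available.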

Table 2.

Matrix of solubilities predicted by the GSE based on indicated logP and melting point values, colored by GSK classifications of reasonable >200 µM, intermediate 30–200 µM, and poor <30 µM solubility.

Computed log S values (from GSE)		Log₁₀ (partition coefficient)
		1	2	3	4	5
Melting point (°C)	50	−0.75	−1.75	−2.75	−3.75	−4.75
	100	−1.25	−2.25	−3.25	−4.25	−5.25
	150	−1.75	−2.75	−3.75	−4.75	−5.75
	200	−2.25	−3.25	−4.25	−5.25	−6.25
	250	−2.75	−3.75	−4.75	−5.75	−6.75
	300	−3.25	−4.25	−5.25	−6.25	−7.25

GSE, General Solubility Equation; GSK, GlaxoSmithKline.
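The matrix in Table 2 can be regenerated directly from Eq. 5, as in the Python sketch below. The function names are hypothetical. The three solubility bands come from the Table 2 caption; how values falling exactly on 30 or 200 µM are assigned is an assumption made here for completeness.

```python
def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """General Solubility Equation (Eq. 5): log10 of molar aqueous solubility."""
    return 0.5 - 0.01 * (melting_point_c - 25) - log_p


def classify(log_s: float) -> str:
    """Band a log S value using the thresholds in the Table 2 caption."""
    micromolar = 10 ** log_s * 1e6
    if micromolar > 200:
        return "reasonable"
    if micromolar >= 30:  # boundary handling is an assumption
        return "intermediate"
    return "poor"


MELTING_POINTS_C = (50, 100, 150, 200, 250, 300)
LOG_P_VALUES = (1, 2, 3, 4, 5)

# Print the same grid as Table 2: rows are melting points, columns are logP.
for mp in MELTING_POINTS_C:
    cells = [f"{gse_log_s(mp, lp):6.2f}" for lp in LOG_P_VALUES]
    print(f"{mp:>4} " + " ".join(cells))
```

Each 100 °C rise in melting point costs one log unit of predicted solubility, the same penalty as one unit of logP, which is why the bands run diagonally across the table.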

The patterns observed in Table 2 were broadly reproduced in analyses of GSK data, which were enhanced to show an orthogonal impact of the simple aromatic ring count (#Ar), such that a higher ring count reduced solubility regardless of the impact of lipophilicity, whereby 1× Ar ring ~ 1 log(lipophilicity) ( Fig. 2 ). This led to the formulation of the Solubility Forecast Index (SFI = logD_7.4 + #Ar), a probabilistic score of likely outcomes in developability assays (including solubility), embodying the principle of minimizing lipophilicity and aromatic ring count.^32,39 As further sets of data were analyzed, it was evident that this probabilistic score showed differentiation of outcomes in other developability assays, especially with the higher quality of chromatographic lipophilicity measurements, termed the Property Forecast Index (PFI = Chrom logD_7.4 + #Ar).³⁹
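Both indices are simple sums, so a minimal Python sketch is enough to show how they might be applied. The function names and compound values are hypothetical. The cut-off of 7 is taken from the PFI = 7 line described in the Figure 2 caption; because the score is probabilistic, treating it as a hard threshold, as this sketch does, is a simplification.

```python
def sfi(log_d_74: float, aromatic_rings: int) -> float:
    """Solubility Forecast Index: logD at pH 7.4 plus aromatic ring count."""
    return log_d_74 + aromatic_rings


def pfi(chrom_log_d_74: float, aromatic_rings: int) -> float:
    """Property Forecast Index: Chrom logD at pH 7.4 plus aromatic ring count."""
    return chrom_log_d_74 + aromatic_rings


PFI_GUIDE_LINE = 7.0  # the diagonal described in the Figure 2 caption

# Made-up examples: each extra aromatic ring counts like one log unit of lipophilicity.
for chrom_log_d, n_ar in ((3.5, 3), (5.0, 4)):
    score = pfi(chrom_log_d, n_ar)
    side = "above" if score > PFI_GUIDE_LINE else "at or below"
    print(f"Chrom logD {chrom_log_d}, #Ar {n_ar}: PFI = {score} ({side} the guide line)")
```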

Figure 2.

The orthogonal impact of lipophilicity and aromatic ring count on solubility distribution (colored as in Table 2 ). The hashed blue diagonal line represents the line of Chrom logD_7.4 + #Ar (PFI) = 7, which exhibits a marked differentiation between good and poor solubility distribution.

Generating Predictive Models

This section reviews opportunities offered in contemporary methods and practice that enable the building of more complex and accurate models.

Quantitative structure–activity relationships (QSARs) are established approaches for providing a mechanism to predict the properties of new molecules on the basis of information extracted by examining preexisting data. Statistical and/or machine learning–based algorithms are used to define a relationship between chemical structure and the variance in the response of interest, whether that is a physicochemical property, pharmacokinetic parameter, or potency measure. The key decisions regarding how to build the QSAR relate to the choice of algorithm, the choice of how to describe the chemical structures, and, perhaps most importantly, the choice of which data to use in model building. On top of that are questions relating to how to establish confidence in the predictive power of the final model and the domain of applicability. The Kubinyi paradox⁹⁸ recognizes how variability in prediction errors, which depend on test set size, might cause poor-quality predictions in small datasets without robust cross-validation.

A wide armory of algorithms is now available to the model builder, ranging from simple linear, statistically based techniques such as partial least squares regression, to decision tree ensemble-based techniques such as Random Forest and XGBoost, through to advanced machine learning approaches such as support vector machines and, more recently, deep neural networks. A similar set of choices is available for describing the chemical structure, ranging from fundamental physicochemical properties through to a variety of fingerprint-based approaches. The ultimate “best” choice will depend on the complexity of the underlying mechanism being modeled and to what extent the information contained within the chemical substructure plays a more important role than that which can be captured with more simple (and understandable) macro properties.

The key to successful QSAR model building lies within the external validation of the final model;⁹⁹ most practitioners would take the available data and split it into a training set and a test set, although some will rely on cross-validation: The ultimate test is based on continued examination of temporal datasets.⁹⁹ The statistical performance therein will define the utility of the model.¹⁰⁰
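A temporal validation of the kind described above can be sketched in plain Python: records are split by measurement date, so the test set contains only compounds measured after everything the model was trained on, and performance is summarized with R². The record layout, field names, and values below are hypothetical, and no particular modeling algorithm is implied.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Train on records measured before the cutoff, test on those measured on or after it."""
    train = [r for r in records if r["measured_on"] < cutoff]
    test = [r for r in records if r["measured_on"] >= cutoff]
    return train, test


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical measurements with predictions from some already-built model.
records = [
    {"measured_on": date(2017, 3, 1), "observed": 2.1, "predicted": 2.0},
    {"measured_on": date(2017, 9, 1), "observed": 3.4, "predicted": 3.6},
    {"measured_on": date(2018, 2, 1), "observed": 1.0, "predicted": 1.4},
    {"measured_on": date(2018, 6, 1), "observed": 4.0, "predicted": 3.5},
    {"measured_on": date(2018, 9, 1), "observed": 2.5, "predicted": 2.6},
]
train, test = temporal_split(records, cutoff=date(2018, 1, 1))
score = r_squared([r["observed"] for r in test], [r["predicted"] for r in test])
print(f"{len(train)} training records, {len(test)} test records, R2 = {score:.2f}")
```

Splitting by date rather than at random mimics prospective use, where a model is always asked about compounds made after it was built.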

To fully evaluate the multitude of choices described above, automated modeling platforms^101,102 now provide a more objective assessment of the most appropriate modeling strategy for a particular dataset. These simultaneously build many models of differing types and then use the ensuing statistics to rank the models and inform on the best choice. Here, the concept of Occam’s razor is a useful guide—the model should be sufficiently complex in all aspects to achieve predictive power but no more complex than needed. A comprehensive review of the current state of the art was published recently by Cherkasov et al.¹⁰³

At GSK, continual QSAR analysis of the large proprietary datasets of physicochemical and biomimetic measurements²² furnishes up-to-date and impactful models of these fundamental parameters. Predictive models of lipophilicity, as defined by chromatographic LogP and LogD, are of particular value and of high quality; for example, the recent temporal validation of the Chrom LogD_7.4 model indicated an R² of 0.87 ( Fig. 3 ). The wide dynamic range of values, high precision of measurements, and relatively simple underlying mechanism contribute to the high quality.

Figure 3.

A recent review of the latest GlaxoSmithKline Chrom LogD_7.4 data, illustrating the excellent performance of the latest model versus calculated values.

The development of QSAR models of these properties furnishes input descriptors to build QSARs around more complex ADME-related endpoints, such as permeability or clearance. In an ideal world,¹⁰⁴ models would guide the design of molecules with the correct balance of desired properties to achieve the target profile. Progress has been made toward this utopia, but predictors cannot yet enable the medicinal chemist to predict bioavailability, half-life, and potency with enough confidence to give a reliable indication of the likely final dose. Accurate predictions of the fundamental physicochemical properties provide the building blocks toward achieving these goals.

Conclusions

In conclusion, this perspective describes the commonly used physicochemical properties and how they are assessed, measured, and predicted, while highlighting the implications of these properties being suboptimal through all phases of drug discovery. Lipophilicity in particular, based on improved measurements, is a well-predicted descriptor, and such in silico models are the cornerstones of predictive methods that are increasingly used to drive drug discovery programs in predict-first cultures. Solubility is also very important, but it is sometimes not well predicted, although aide-memoires such as avoiding (high-melting) brick dust and (lipophilic) greaseballs¹⁰⁵ are principles embodied in the GSE.⁹⁶ Analyses suggest that no one-size-fits-all prescription for defining “drug likeness” exists, but making a compound as hydrophilic as possible without loss of efficacy⁸⁹ is a defining principle of good practice.⁷⁸ Following these principles, driven by judicious predictions in the design phase, will expedite the small-molecule discovery process and lead to improvements at all stages of the process,¹⁰⁶ from fewer false positives (or negatives) in vitro to better ADME outcomes,¹⁰⁷ fewer developability risks,⁹⁷ and better harmonization of in vitro versus in silico data.¹⁰⁸ This will ultimately lead to the progression of compounds into the clinic with better physical properties, which will lead to better predictability of outcomes¹⁰⁹ and, ultimately, a greater chance of success in the costly clinical development phases. The use of predictive physicochemical design is not yet universal in drug discovery, but it is one computational method with tangible and demonstrable impact. Making fewer, better-designed compounds is surely a rational way forward to improve productivity.

Footnotes

Declaration of Conflicting Interests

The authors disclosed the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: All authors were employed by GlaxoSmithKline at the time of the work on the article and their research and authorship of this article was completed within the scope of this employment.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Robert J. Young

References

Meanwell

N. A.

Improving Drug Candidates by Design: A Focus on Physicochemical Properties as a Means of Improving Compound Disposition and Safety. Chem. Res. Toxicol. 2011, 24, 1420–1456.

Leeson

P. D.

Young

R. J.

Molecular Property Design: Does Everyone Get It?

ACS Med. Chem. Lett. 2015, 6, 722–725.

Smith

G. F.

1—Medicinal Chemistry by the Numbers: The Physicochemistry, Thermodynamics and Kinetics of Modern Drug Design. In Progress in Medicinal Chemistry; Lawton

G.;

Witty

D. R.

, Eds. Elsevier: Amsterdam, 2009, pp 1–29.

Gleeson

M. P.

Leeson

P. D.

Waterbeemd

H. v. D.

Physicochemical Properties and Compound Quality. In The Handbook of Medicinal Chemistry. The Royal Society of Chemistry: London, 2015, pp 1–31.

Leeson

P. D.

Empfield

J. R.

Reducing the Risk of Drug Attrition Associated with Physicochemical Properties. In Annual Reports in Medicinal Chemistry; John

E. M.

, Ed. Academic Press: London, 2010, pp 393–407.

Morgan

Van Der Graaf

P. H.

Arrowsmith

, et al. Can the Flow of Medicines Be Improved? Fundamental Pharmacokinetic and Pharmacological Principles toward Improving Phase II Survival. Drug. Disc. Today. 2012, 17, 419–424.

Waring

M. J.

Lipophilicity in Drug Discovery. Expert Opin. Drug Disc. 2010, 5, 235–248.

Arnott

J. A.

Planey

S. L.

The Influence of Lipophilicity in Drug Discovery and Design. Expert Opin. Drug Disc. 2012, 7, 863–875.

Comer

Box

High-Throughput Measurement of Drug pKa Values for ADME Screening. JALA. 2003, 8, 55–59.

10.

Young

R. J.

Physical Properties in Drug Design. In Tactics in Contemporary Drug Design; Meanwell

N. A.

, Ed. Springer: Berlin, 2014; pp 1–68.

11.

Bissantz

Kuhn

Stahl

A Medicinal Chemist’s Guide to Molecular Interactions. J. Med. Chem. 2010, 53, 5061–5084.

12. Gleeson, M. P.; Hersey; Montanari; et al. Probing the Links between In Vitro Potency, ADMET and Physicochemical Parameters. Nat. Rev. Drug Disc. 2011, 10, 197–208.

13. Hann, M. M. Molecular Obesity, Potency and Other Addictions in Drug Discovery. MedChemComm 2011, 2, 349–355.

14. Meyer, E. A.; Castellano, R. K.; Diederich. Interactions with Aromatic Rings in Chemical and Biological Recognition. Angew. Chem. Int. Ed. 2003, 42, 1210–1250.

15. Ritchie, T. J.; Macdonald, S. J. The Impact of Aromatic Ring Count on Compound Developability—Are Too Many Aromatic Rings a Liability in Drug Design? Drug Disc. Today 2009, 14, 1011–1020.

16. Ritchie, T. J.; Macdonald, S. J.; Young, R. J.; et al. The Impact of Aromatic Ring Count on Compound Developability: Further Insights by Examining Carbo- and Hetero-Aromatic and -Aliphatic Ring Types. Drug Disc. Today 2011, 16, 164–171.

17. Lovering; Bikker; Humblet. Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. J. Med. Chem. 2009, 52, 6752.

18. Lovering. Escape from Flatland 2: Complexity and Promiscuity. MedChemComm 2013, 4, 515–519.

19. Kerns, E. H. High Throughput Physicochemical Profiling for Drug Discovery. J. Pharm. Sci. 2001, 90, 1838–1858.

20. Valko; Bevan; Reynolds. Chromatographic Hydrophobicity Index by Fast-Gradient RP-HPLC: A High-Throughput Alternative to log P/log D. Anal. Chem. 1997, 69, 2022–2029.

21. Valkó, K. L. Lipophilicity and Biomimetic Properties Measured by HPLC to Support Drug Discovery. J. Pharm. Biomed. Anal. 2016, 130, 35–54.

22. Bunally; Young, R. J. The Role and Impact of High Throughput Biomimetic Measurements in Drug Discovery. ADMET DMPK 2018, 6, 74–84.

23. Valko; Nunhuck; Bevan; et al. Fast Gradient HPLC Method to Determine Compounds Binding to Human Serum Albumin: Relationships with Octanol/Water and Immobilized Artificial Membrane Lipophilicity. J. Pharm. Sci. 2003, 92, 2236–2248.

24. Valko; Du, C. M.; Bevan, C. D.; et al. Rapid-Gradient HPLC Method for Measuring Drug Interactions with Immobilized Artificial Membrane: Comparison with Other Lipophilicity Measures. J. Pharm. Sci. 2000, 89, 1085–1096.

25. Valkó, K. L.; Nunhuck, S. B.; Hill, A. P. Estimating Unbound Volume of Distribution and Tissue Binding by In Vitro HPLC-Based Human Serum Albumin and Immobilised Artificial Membrane-Binding Measurements. J. Pharm. Sci. 2011, 100, 849–862.

26. Valko; Chiarparin; Nunhuck; et al. In Vitro Measurement of Drug Efficiency Index to Aid Early Lead Optimization. J. Pharm. Sci. 2012, 101, 4155–4169.

27. Braggio; Montanari; Rossi; et al. Drug Efficiency: A New Concept to Guide Lead Optimization Programs towards the Selection of Better Clinical Candidates. Expert Opin. Drug Disc. 2010, 5, 609–618.

28. Teague; Valko. How to Identify and Eliminate Compounds with a Risk of High Clinical Dose during the Early Phase of Lead Optimisation in Drug Discovery. Eur. J. Pharm. Sci. 2017, 110, 37–50.

29. Bergström, C. A. S.; Avdeef. Perspectives in Solubility Measurement and Interpretation. ADMET DMPK 2019, 7, 88.

30. Sou; Bergström, C. A. S. Automated Assays for Thermodynamic (Equilibrium) Solubility Determination. Drug Disc. Today Technol. 2018, 27, 11–19.

31. Glomme; März; Dressman, J. B. Comparison of a Miniaturized Shake-Flask Solubility Method with Automated Potentiometric Acid/Base Titrations and Calculated Solubilities. J. Pharm. Sci. 2005, 94, 1–16.

32. Hill, A. P.; Young, R. J. Getting Physical in Drug Discovery: A Contemporary Perspective on Solubility and Hydrophobicity. Drug Disc. Today 2010, 15, 648–655.

33. Robinson, M. W.; Hill, A. P.; Readshaw, S. A.; et al. Use of Calculated Physicochemical Properties to Enhance Quantitative Response When Using Charged Aerosol Detection. Anal. Chem. 2017, 89, 1772–1777.

34. Fagerberg, J. H.; Tsinman; Sun; et al. Dissolution Rate and Apparent Solubility of Poorly Soluble Drugs in Biorelevant Dissolution Media. Mol. Pharm. 2010, 7, 1419–1430.

35. Kerns, E. H. Biological Assay Challenges from Compound Solubility: Strategies for Bioassay Optimization. Drug Disc. Today 2006, 11, 446–451.

36. Di, L.; Kerns, E. H. Solubility Issues in Early Discovery and HTS. In Solvent Systems and Their Selection in Pharmaceutics and Biopharmaceutics; Augustijns, Brewster, M. E., Eds.; Springer: New York, 2007; pp 111–136.

37. Zhai; Chen; Zhong; et al. An Automatic Quality Control Pipeline for High-Throughput Screening Hit Identification. J. Biomol. Screen. 2016, 21, 832–841.

38. Bushway, P. J.; Azimi; Heynen-Genel. Optimization and Application of Median Filter Corrections to Relieve Diverse Spatial Patterns in Microtiter Plate Data. J. Biomol. Screen. 2011, 16, 1068–1080.

39. Young, R. J.; Green, D. V.; Luscombe, C. N.; et al. Getting Physical in Drug Discovery II: The Impact of Chromatographic Hydrophobicity Measurements and Aromaticity. Drug Disc. Today 2011, 16, 822–830.

40. Sassano, M. F.; Doak, A. K.; Roth, B. L.; et al. Colloidal Aggregation Causes Inhibition of G Protein-Coupled Receptors. J. Med. Chem. 2013, 56, 2406–2414.

41. Feng, B. Y.; Simeonov; Jadhav; et al. A High-Throughput Screen for Aggregation-Based Inhibition in a Large Compound Library. J. Med. Chem. 2007, 50, 2385–2390.

42. Irwin, J. J.; Duan; Torosyan; et al. An Aggregation Advisor for Ligand Discovery. J. Med. Chem. 2015, 58, 7076–7087.

43. Arrowsmith, C. H.; Audia, J. E.; Austin; et al. The Promise and Peril of Chemical Probes. Nat. Chem. Biol. 2015, 11, 536.

44. Frye, S. V. The Art of the Chemical Probe. Nat. Chem. Biol. 2010, 6, 159.

45. Mettou; Papaneophytou; Melagraki; et al. Aqueous Solubility Enhancement for Bioassays of Insoluble Inhibitors and QSPR Analysis: A TNF-α Study. SLAS Disc. 2017, 23, 84–93.

46. Lipinski, C. A.; Lombardo; Dominy, B. W.; et al. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 1997, 23, 3–25.

47. Lipinski, C. A. Drug-Like Properties and the Causes of Poor Solubility and Poor Permeability. J. Pharmacol. Toxicol. Methods 2000, 44, 235–249.

48. Leeson, P. D. Molecular Inflation, Attrition and the Rule of Five. Adv. Drug Deliv. Rev. 2016, 101, 22–33.

49. Shultz, M. D. Improving the Plausibility of Success with Inefficient Metrics. ACS Med. Chem. Lett. 2014, 5, 2–5.

50. Mignani; Rodrigues; Tomas; et al. Present Drug-Likeness Filters in Medicinal Chemistry during the Hit and Lead Optimization Process: How Far Can They Be Simplified? Drug Disc. Today 2018, 23, 605–615.

51. Leeson, P. D.; Springthorpe. The Influence of Drug-Like Concepts on Decision-Making in Medicinal Chemistry. Nat. Rev. Drug Disc. 2007, 6, 881–890.

52. Shultz, M. D. Two Decades under the Influence of the Rule of Five and the Changing Properties of Approved Oral Drugs. J. Med. Chem. 2019, 62, 1701–1714.

53. Naylor, M. R.; Ly, A. M.; Handford, M. J.; et al. Lipophilic Permeability Efficiency Reconciles the Opposing Roles of Lipophilicity in Membrane Permeability and Aqueous Solubility. J. Med. Chem. 2018, 61, 11169–11182.

54. Sugano; Kansy; Artursson; et al. Coexistence of Passive and Carrier-Mediated Processes in Drug Transport. Nat. Rev. Drug Disc. 2010, 9, 597.

55. Waring. Defining Optimum Lipophilicity and Molecular Weight Ranges for Drug Candidates—Molecular Weight Dependent Lower logD Limits Based on Permeability. Bioorg. Med. Chem. Lett. 2009, 19, 2844–2851.

56. Cai; Madari; Walker; et al. Addition of Optimized Bovine Serum Albumin Level in a High-Throughput CACO-2 Assay Enabled Accurate Permeability Assessment for Lipophilic Compounds. SLAS Disc. 2019, 2472555219848483.

57. Smith; Artursson; Avdeef; et al. Passive Lipoidal Diffusion and Carrier-Mediated Cell Uptake Are Both Important Mechanisms of Membrane Permeation in Drug Disposition. Mol. Pharm. 2014, 11, 1727–1738.

58. Kell, D. B.; Oliver, S. G. How Drugs Get into Cells: Tested and Testable Predictions to Help Discriminate between Transporter-Mediated Uptake and Lipoidal Bilayer Diffusion. Front. Pharmacol. 2014, 5.

59. Keserű, G. M.; Makara, G. M. Hit Discovery and Hit-to-Lead Approaches. Drug Disc. Today 2006, 11, 741–748.

60. Holenz. Lead Generation: Methods, Strategies and Case Studies. Wiley-VCH: Weinheim, Germany, 2016.

61. Holenz; Stoy. Advances in Lead Generation. Bioorg. Med. Chem. Lett. 2019, 29, 517–524.

62. Brown, D. G.; Bostrom. Where Do Recent Small Molecule Clinical Development Candidates Come From? J. Med. Chem. 2018, 61, 9442–9468.

63. Erlanson, D. A.; Fesik, S. W.; Hubbard, R. E.; et al. Twenty Years On: The Impact of Fragments on Drug Discovery. Nat. Rev. Drug Disc. 2016, 15, 605–619.

64. Macarron; Banks, M. N.; Bojanic; et al. Impact of High-Throughput Screening in Biomedical Research. Nat. Rev. Drug Disc. 2011, 10, 188.

65. Clark, M. A.; Acharya, R. A.; Arico-Muendel, C. C.; et al. Design, Synthesis and Selection of DNA-Encoded Small-Molecule Libraries. Nat. Chem. Biol. 2009, 5, 647–654.

66. Goodnow, R. A., Jr.; Dumelin, C. E.; Keefe, A. D. DNA-Encoded Chemistry: Enabling the Deeper Sampling of Chemical Space. Nat. Rev. Drug Disc. 2017, 16, 131–147.

67. Goodnow. DNA-Encoded Library Technology (DELT) after a Quarter Century. SLAS Disc. 2018, 23, 385–386.

68. Reymond, J. L. The Chemical Space Project. Acc. Chem. Res. 2015, 48, 722–730.

69. Dow; Fisher; James; et al. Towards the Systematic Exploration of Chemical Space. Org. Biomol. Chem. 2012, 10, 17–28.

70. Hann, M. M.; Leach, A. R.; Harper. Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Comput. Sci. 2001, 41, 856–864.

71. Leach, A. R.; Hann, M. M. Molecular Complexity and Fragment-Based Drug Discovery: Ten Years On. Curr. Opin. Chem. Biol. 2011, 15, 489–496.

72. Hall, R. J.; Mortenson, P. N.; Murray, C. W. Efficient Exploration of Chemical Space by Fragment-Based Screening. Prog. Biophys. Mol. Biol. 2014, 116, 82–91.

73. Teague, S. J.; Davis, A. M.; Leeson, P. D.; et al. The Design of Leadlike Combinatorial Libraries. Angew. Chem. Int. Ed. Engl. 1999, 38, 3743–3748.

74. Karawajczyk; Giordanetto; Benningshof; et al. Expansion of Chemical Space for Collaborative Lead Generation and Drug Discovery: The European Lead Factory Perspective. Drug Disc. Today 2015, 20, 1310–1316.

75. Irwin, J. J.; Gaskins; Sterling; et al. Predicted Biological Activity of Purchasable Chemical Space. J. Chem. Inf. Model. 2018, 58, 148–164.

76. Keseru, G. M.; Makara, G. M. The Influence of Lead Discovery Strategies on the Properties of Drug Candidates. Nat. Rev. Drug Disc. 2009, 8, 203–212.

77. Leeson, P. D.; St-Gallay, S. A. The Influence of the “Organizational Factor” on Compound Quality in Drug Discovery. Nat. Rev. Drug Disc. 2011, 10, 749–765.

78. Young, R. J.; Leeson, P. D. Mapping the Efficiency and Physicochemical Trajectories of Successful Optimizations. J. Med. Chem. 2018, 61, 6421–6467.

79. Chakravorty, S. J.; Chan; Greenwood, M. N.; et al. Nuisance Compounds, PAINS Filters, and Dark Chemical Matter in the GSK HTS Collection. SLAS Disc. 2018, 23, 532–545.

80. Baell, J. B.; Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53, 2719–2740.

81. Renaud, J. P.; Chung, C. W.; Danielson, U. H.; et al. Biophysics in Drug Discovery: Impact, Challenges and Opportunities. Nat. Rev. Drug Disc. 2016, 15, 679–698.

82. Mortenson, P. N.; Murray, C. W. Assessing the Lipophilicity of Fragments and Early Hits. J. Comput. Aided Mol. Des. 2011, 25, 663–667.

83. Tarcsay, Á.; Keserű, G. M. Contributions of Molecular Properties to Drug Promiscuity. J. Med. Chem. 2013, 56, 1789–1795.

84. Peters, J.-U.; Schnider; Mattei; et al. Pharmacological Promiscuity: Dependence on Compound Properties and Target Specificity in a Set of Recent Roche Compounds. ChemMedChem 2009, 4, 680–686.

85. Fukunishi; Kurosawa; Mikami; et al. Prediction of Synthetic Accessibility Based on Commercially Available Compound Databases. J. Chem. Inf. Model. 2014, 54, 3259–3267.

86. Hopkins, A. L.; Keseru, G. M.; Leeson, P. D.; et al. The Role of Ligand Efficiency Metrics in Drug Discovery. Nat. Rev. Drug Disc. 2014, 13, 105–121.

87. Hopkins, A. L.; Groom, C. R.; Alex. Ligand Efficiency: A Useful Metric for Lead Selection. Drug Disc. Today 2004, 9, 430–431.

88. Johnson, T. W.; Gallego, R. A.; Edwards, M. P. Lipophilic Efficiency as an Important Metric in Drug Design. J. Med. Chem. 2018, 61, 6401–6420.

89. Hansch; Bjorkroth, J. P.; Leo. Hydrophobicity and Central Nervous System Agents: On the Principle of Minimal Hydrophobicity in Drug Design. J. Pharm. Sci. 1987, 76, 663–687.

90. Scott, J. S.; Waring, M. J. Practical Application of Ligand Efficiency Metrics in Lead Optimisation. Bioorg. Med. Chem. 2018, 26, 3006–3015.

91. Tarcsay; Nyiri; Keseru, G. M. Impact of Lipophilic Efficiency on Compound Quality. J. Med. Chem. 2012, 55, 1252–1260.

92. Valko; Reynolds. High-Throughput Physicochemical and In Vitro ADMET Screening: A Role in Pharmaceutical Profiling. Am. J. Drug Disc. 2005, 3, 83–100.

93. Cumming, J. G.; Davis, A. M.; Muresan; et al. Chemical Predictive Modelling to Improve Compound Quality. Nat. Rev. Drug Disc. 2013, 12, 948.

94. Delaney, J. S. Predicting Aqueous Solubility from Structure. Drug Disc. Today 2005, 10, 289–295.

95. Tetko, I. V.; Sushko; Novotarskyi; et al. How Accurately Can We Predict the Melting Points of Drug-Like Compounds? J. Chem. Inf. Model. 2014, 54, 3320–3329.

96. Jain; Yalkowsky, S. H. Estimation of the Aqueous Solubility I: Application to Organic Nonelectrolytes. J. Pharm. Sci. 2001, 90, 234–252.

97. Bayliss, M. K.; Butler; Feldman, P. L.; et al. Quality Guidelines for Oral Drug Candidates: Dose, Solubility and Lipophilicity. Drug Disc. Today 2016, 21, 1719–1727.

98. Baumann. Cross-Validation Is Dead: Long Live Cross-Validation! Model Validation Based on Resampling. J. Cheminform. 2010, 2, O5.

99. Tropsha; Gramatica; Gombar, V. K. The Importance of Being Earnest: Validation Is the Absolute Essential for Successful Application and Interpretation of QSPR Models. QSAR Comb. Sci. 2003, 22, 69–77.

100. Box, G. E. P. Robustness in the Strategy of Scientific Model Building. In Robustness in Statistics; Launer, R. L., Wilkinson, G. N., Eds.; Academic Press: London, 1979; pp 201–236.

101. Cox; Green, D. V. S.; Luscombe, C. N.; et al. QSAR Workbench: Automating QSAR Modeling to Drive Compound Design. J. Comput. Aided Mol. Des. 2013, 27, 321–336.

102. Cartmell; Enoch; Krstajic; et al. Automated QSPR through Competitive Workflow. J. Comput. Aided Mol. Des. 2005, 19, 821–833.

103. Cherkasov; Muratov, E. N.; Fourches; et al. QSAR Modeling: Where Have You Been? Where Are You Going To? J. Med. Chem. 2014, 57, 4977–5010.

104. van de Waterbeemd; Gifford. ADMET In Silico Modelling: Towards Prediction Paradise? Nat. Rev. Drug Disc. 2003, 2, 192.

105. Bergström, C. A. S.; Wassvik, C. M.; Johansson; et al. Poorly Soluble Marketed Drugs Display Solvation Limited Solubility. J. Med. Chem. 2007, 50, 5858–5862.

106. Morgan; Brown, D. G.; Lennard; et al. Impact of a Five-Dimensional Framework on R&D Productivity at AstraZeneca. Nat. Rev. Drug Disc. 2018, 17, 167–181.

107. Lombardo; Desai, P. V.; Arimoto; et al. In Silico Absorption, Distribution, Metabolism, Excretion, and Pharmacokinetics (ADME-PK): Utility and Best Practices. An Industry Perspective from the International Consortium for Innovation through Quality in Pharmaceutical Development. J. Med. Chem. 2017, 60, 9097–9113.

108. Docci; Parrott; Krähenbühl; et al. Application of New Cellular and Microphysiological Systems to Drug Metabolism Optimization and Their Positioning Respective to In Silico Tools. SLAS Disc. 2019, 24, 523–536.

109. Morgan; Van Der Graaf, P. H.; Arrowsmith; et al. Can the Flow of Medicines Be Improved? Fundamental Pharmacokinetic and Pharmacological Principles toward Improving Phase II Survival. Drug Disc. Today 2012, 17, 419–424.