
Abstract

Computational protein design (CPD) algorithms that compute binding affinity, K_a, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are very costly: linear in the number of sequences and thus exponential in the number of simultaneously mutable residues. To address this computational challenge, we introduce BBK*, a new CPD algorithm whose key innovation is the multisequence (MS) bound: BBK* efficiently computes a single provable upper bound to approximate K_a for a combinatorial number of sequences, and avoids SS computation for all provably suboptimal sequences. Thus, to our knowledge, BBK* is the first provable, ensemble-based CPD algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that BBK* finds the tightest binding sequences while approximating K_a for up to 10⁵-fold fewer sequences than the previous state-of-the-art algorithms, which require exhaustive enumeration of sequences. Furthermore, for 51 protein–ligand design problems, BBK* provably approximates K_a up to 1982-fold faster than the previous state-of-the-art iMinDEE/A*/K* algorithm. Therefore, BBK* not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.
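The pruning idea behind a multisequence bound can be illustrated with a toy best-first search. The sketch below is not the BBK* bound or the authors' implementation. It assumes a hypothetical score that factorizes over residue positions, so that the product of the best remaining factors is a valid upper bound on every completion of a partial sequence. The table `FACTORS` and the function names are illustrative assumptions, not values or APIs from the paper.

```python
import heapq
from math import prod

# Hypothetical per-position score factors (illustrative only, not from the paper).
# FACTORS[i][aa] is the contribution of amino acid `aa` at mutable position i.
FACTORS = [
    {"A": 1.0, "L": 3.0, "F": 2.0},
    {"A": 0.5, "L": 1.5, "F": 4.0},
    {"A": 2.0, "L": 1.0, "F": 0.2},
    {"A": 1.2, "L": 0.9, "F": 2.5},
]


def upper_bound(prefix):
    """Upper bound on the score of every full sequence that extends `prefix`.

    Assigned positions contribute their actual factor; unassigned positions
    contribute the largest factor available at that position. For a complete
    sequence the bound equals the exact toy score.
    """
    assigned = prod(FACTORS[i][aa] for i, aa in enumerate(prefix))
    best_rest = prod(max(f.values()) for f in FACTORS[len(prefix):])
    return assigned * best_rest


def best_sequence():
    """Best-first search over partial sequences.

    Returns (sequence, score, number_of_bound_evaluations). Because the bound
    never underestimates, the first complete sequence popped from the queue
    is optimal, and subtrees left in the queue are never scored one by one.
    """
    evaluated = 1
    heap = [(-upper_bound(()), ())]  # max-heap via negated bounds
    while heap:
        neg_bound, prefix = heapq.heappop(heap)
        if len(prefix) == len(FACTORS):
            return "".join(prefix), -neg_bound, evaluated
        for aa in FACTORS[len(prefix)]:
            child = prefix + (aa,)
            heapq.heappush(heap, (-upper_bound(child), child))
            evaluated += 1
    raise ValueError("empty design space")


if __name__ == "__main__":
    print(best_sequence())
```

In this toy space of 3^4 = 81 sequences, the search evaluates 13 bounds before returning the top sequence, because each bound on a partial sequence stands in for all of its completions. A real K_a bound is far harder to compute than this factorized score; the sketch only shows why one bound per subtree can replace per-sequence work.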



BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces
