Abstract
The programmed cell death protein 1 (PD-1, CD279) is an important therapeutic target in many oncological diseases. This checkpoint protein inhibits T lymphocytes from attacking other cells in the body; blocking it therefore improves the clearance of tumor cells by the immune system. While there are already multiple FDA-approved anti-PD-1 antibodies, including nivolumab (Opdivo® from Bristol-Myers Squibb) and pembrolizumab (Keytruda® from Merck), there are ongoing efforts to discover new and improved checkpoint inhibitor therapeutics. In this study, we computationally derived anti-PD-1 antibody fragments using protein diffusion and evaluated them through our scalable, in silico pipeline. We present nine synthetic Fv structures whose predicted binding performance makes them suitable for further empirical testing of their anti-PD-1 activity.
Introduction
The field of cancer immunotherapy has witnessed unprecedented advancements over the past few decades, reshaping the landscape of oncological treatment strategies and extending the lives of those with disease. Among the notable breakthroughs is the identification and targeting of programmed cell death protein 1 (PD-1), a crucial immune checkpoint receptor that plays a pivotal role in modulating T-cell responses. PD-1 belongs to the CD28 superfamily and is predominantly expressed on activated T cells.1 PD-L1 and PD-L2, the ligands of PD-1, are highly expressed on multiple types of cancer cells and thus play an important role in immune evasion.2,3
PD-1 emerged as a key focus in cancer research due to its ability to suppress the immune system and facilitate immune evasion by tumor cells. This discovery has strengthened the need to develop innovative therapeutic interventions designed to support the immune system's full potential against cancer.4
Pembrolizumab (marketed as Keytruda® by Merck) and nivolumab (marketed as Opdivo® by Bristol Myers Squibb), two of the first monoclonal antibodies targeting PD-1, first gained FDA approval for the treatment of melanoma in 2014.5,6 Both, along with other antibodies, have since been approved for various other malignancies, including non-small cell lung cancer, head and neck squamous cell carcinoma, Hodgkin's lymphoma, colorectal cancer, renal cell carcinoma, and others.7–11
Pembrolizumab and other similar antibodies have demonstrated remarkable efficacy across a variety of cancers, offering durable responses and improved survival rates.12,13
Despite the success of PD-1-targeted therapies, challenges persist, including treatment resistance,14 variability in patient response,15,16 and the need for personalized approaches.17 Herein lies the motivation for exploring novel strategies in the design and optimization of PD-1-targeting antibodies.
This study explores the use of artificial intelligence (AI) in the design of antibodies through protein diffusion. Protein diffusion is a new technique that uses deep learning-based models to generate amino acid sequences. This can be performed “unconditionally”, where the AI model generates a random sequence of a desired length, or “conditionally”, where the system is given reference data and produces proteins that mimic a given input protein class. We can then fold the diffused amino acid sequences to produce protein structure files to be used in subsequent analyses.
Using a large corpus of existing anti-PD-1 antibodies to conditionally guide the AI system, we have generated 9 antibody candidates and assessed their viability as compared to other therapeutics through in silico protein-protein docking.
By leveraging the power of protein diffusion, we aim to contribute to the evolving landscape of cancer immunotherapy, offering insights that may lead to the development of next-generation PD-1-targeted antibodies. This approach holds promise for addressing current development bottlenecks and advancing the field towards more effective and personalized treatments for cancer patients. In the following sections, we delve into the methodology and potential implications of utilizing protein diffusion in the design of PD-1-targeting antibodies, aiming to contribute to the ongoing efforts in the pursuit of precision oncology.
Methods
Heavy and light chain sequences of 33 PD-1 targeting antibodies (Fv region only) were retrieved from the Therapeutic Structural Antibody Database (Thera-SAbDab) 18 on November 11, 2023. The sets of heavy and light chain sequences were each aligned using Muscle v3.8.425, 19 producing .fasta files of the alignments. These were then converted to the .a3m format.
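The FASTA-to-A3M conversion can be sketched as follows. This is a minimal illustration rather than the code used in this study: it assumes the first sequence in each alignment serves as the query, and the function name and example sequences are hypothetical. In A3M, columns where the query has a gap are dropped, and residues that other sequences carry in those columns are written in lowercase as insertions.

```python
def aligned_fasta_to_a3m(fasta_text):
    """Convert an aligned FASTA string to A3M, using the first record as the query."""
    # Parse the FASTA text into (header, sequence) records.
    records, header, chunks = [], None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        return ""

    query = records[0][1]
    if any(len(seq) != len(query) for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")

    out = []
    for name, seq in records:
        converted = []
        for q, s in zip(query, seq):
            if q != "-":
                converted.append(s.upper())  # match column: keep residue or gap
            elif s != "-":
                converted.append(s.lower())  # insertion relative to the query
            # a gap in both the query and this sequence is dropped entirely
        out.append(f">{name}\n{''.join(converted)}")
    return "\n".join(out) + "\n"


# Hypothetical toy alignment (not real antibody sequences).
example = ">query\nEVQ-LV\n>hit1\nEVQALV\n>hit2\nE-Q-LV\n"
a3m_text = aligned_fasta_to_a3m(example)
```

In the toy alignment, the fourth column is a gap in the query, so `hit1` keeps its residue there as a lowercase `a` while `hit2` simply loses the column.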
EvoDiff, a suite of protein generation models from Microsoft Research, 20 was used for diffusion of new antibody structures that target PD-1. This generative framework uses an input of aligned amino acid sequences for conditional diffusion where the diffusion process is “evolutionarily-guided” through predictions based on the input set.
Conditional diffusion was carried out using the .a3m alignment files with EvoDiff's MSA_OA_DM_MAXSUB model using the generate_query_oadm_msa_simple() function. Three heavy chain Fv sequences and three light chain Fv sequences were diffused through this process, as shown in Table 1. Then, these were combined to generate 9 antibody candidates, as listed in Table 2.
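Pairing every diffused heavy chain with every diffused light chain is a Cartesian product, sketched below. The chain labels and the order in which pairs receive identifiers are assumptions made for illustration; the actual assignments are those given in Table 2.

```python
from itertools import product

# Hypothetical labels for the three diffused heavy and light chains.
heavy_ids = ["H1", "H2", "H3"]
light_ids = ["L1", "L2", "L3"]


def pair_chains(heavy, light, prefix="TUPPD1"):
    """Return {candidate_id: (heavy_label, light_label)} for all heavy/light pairs."""
    candidates = {}
    for index, (h, l) in enumerate(product(heavy, light), start=1):
        # Identifier pattern follows the TUPPD1-00N style; the ordering is assumed.
        candidates[f"{prefix}-{index:03d}"] = (h, l)
    return candidates


candidates = pair_chains(heavy_ids, light_ids)
```

Three heavy chains and three light chains yield 3 × 3 = 9 candidates.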
Conditionally-Diffused Sequences of the PD-1-Targeting Antibody Candidate Heavy and Light Chains. CDR Loop Residues are
Heavy and Light Chain Assignments of the PD-1-Targeting Antibody Candidates.
This process generates a .pdb file for each of the Fv candidate structures. The L chain in each of these .pdb files was renumbered using PyMOL v2.4.1 27 to avoid duplicative residue numbering with the H chain, which is a requirement for the docking process.
For this study, we utilized a Docker containerized version of HADDOCK,12 which contains all of the software dependencies to allow HADDOCK to run more readily in a high-performance computing (HPC) environment. HADDOCK was run on 36 physical cores in the University of North Carolina at Charlotte HPC cluster.
To prepare for protein docking, active residues must be determined for both the antibody and antigen in each experiment. For the PD-1 antigen, the active residues were taken to be those at the interface between PD-1 and PD-L1, namely ASN66, THR76, LYS78, and surrounding residues. This is the common binding site for many of the therapeutic antibodies, including pembrolizumab.
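The renumbering idea can be illustrated without PyMOL by shifting the residue numbers of one chain directly in the PDB text. This is a sketch under stated assumptions, not the procedure used here: it relies on the fixed-column PDB layout (chain identifier in column 22, residue number in columns 23 to 26), only touches ATOM and HETATM records, and the function names are hypothetical.

```python
def max_residue_number(pdb_text, chain_id):
    """Largest residue number found on ATOM/HETATM records of the given chain."""
    numbers = [
        int(line[22:26])
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26 and line[21] == chain_id
    ]
    if not numbers:
        raise ValueError(f"chain {chain_id!r} not found")
    return max(numbers)


def renumber_chain(pdb_text, chain_id, offset):
    """Add `offset` to every residue number of `chain_id`, preserving column widths."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26 and line[21] == chain_id:
            new_number = int(line[22:26]) + offset
            if new_number > 9999:
                raise ValueError("residue number no longer fits in four columns")
            line = f"{line[:22]}{new_number:>4d}{line[26:]}"
        out.append(line)
    return "\n".join(out) + "\n"


def offset_light_chain(pdb_text, heavy="H", light="L"):
    """Shift the light chain so its numbering starts after the heavy chain ends."""
    return renumber_chain(pdb_text, light, max_residue_number(pdb_text, heavy))
```

With this scheme, a light chain that originally starts at residue 1 would start at one past the last heavy chain residue, so no residue number is shared between the two chains.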
As for the Fv structures, active residues were deemed to be the residues in the CDR loops. Residues in the CDR loops were programmatically detected using the ANARCI system. 29 This process returns the residue numbers, based on the Chothia numbering system, for the CDR1, CDR2, and CDR3 loops.
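Once a chain has been numbered, selecting CDR residues reduces to a range lookup on the numbered positions. The sketch below does not call ANARCI; it assumes the numbering is already available as a list of ((position, insertion code), amino acid) entries. The loop boundaries are commonly tabulated Chothia ranges and are an assumption here, since CDR boundaries differ between definitions and the ranges used in this study may not match them.

```python
# Assumed Chothia CDR boundaries (inclusive). These are illustrative defaults,
# not values confirmed by this study.
CHOTHIA_CDRS = {
    "H": {"CDR1": (26, 32), "CDR2": (52, 56), "CDR3": (95, 102)},
    "L": {"CDR1": (24, 34), "CDR2": (50, 56), "CDR3": (89, 97)},
}


def cdr_positions(numbered_residues, chain_type, cdr_ranges=CHOTHIA_CDRS):
    """Group numbered residues by CDR loop.

    numbered_residues: iterable of ((position, insertion_code), amino_acid);
    alignment gaps ('-') are skipped.
    """
    loops = {name: [] for name in cdr_ranges[chain_type]}
    for (position, insertion), amino_acid in numbered_residues:
        if amino_acid == "-":
            continue
        for name, (start, end) in cdr_ranges[chain_type].items():
            if start <= position <= end:
                # Keep the insertion code so 31 and 31A stay distinct.
                loops[name].append(f"{position}{insertion.strip()}")
    return loops


# Hypothetical fragment of a numbered heavy chain.
numbered = [
    ((25, " "), "S"), ((26, " "), "G"), ((31, " "), "S"), ((31, "A"), "Y"),
    ((33, " "), "A"), ((52, " "), "-"), ((53, " "), "D"), ((100, "A"), "F"),
]
heavy_loops = cdr_positions(numbered, "H")
```

The resulting position labels would still need to be mapped onto the residue numbers in each .pdb file before being passed to the docking software as active residues.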
The HADDOCK experiment files were programmatically generated using custom Python logic, which created directories for each of the antibodies and placed the required files in each directory. Then, each experiment was submitted to the HPC cluster to be run in parallel across the distributed compute nodes. Completing the 42 docking experiments took approximately 3 h on ten 36-core nodes. This scalable docking process closely follows methods reported in Tomezsko and Ford et al., 2023 30 and the published antibody docking protocol from the Bonvin Lab at Utrecht University.28,31,32
Also, PRODIGY, a tool to predict the binding affinity of protein-protein complexes, was used on each “best” structure for each complex in this study. 33 The predicted binding affinities are reported as Gibbs energy, shown as ΔG (in kcal/mol units).
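The directory-per-experiment layout described above can be sketched as follows. This is not the custom logic from the study's repository; the file names (`antibody.pdb`, `antigen.pdb`, `params.txt`) and the template placeholder are hypothetical and chosen only to show the pattern.

```python
from pathlib import Path
import shutil


def build_experiment_dirs(root, antibody_pdbs, antigen_pdb, param_template):
    """Create one directory per antibody and place the required input files in it.

    param_template is a string containing a '{name}' placeholder that is filled
    with the antibody file stem. All file names here are hypothetical.
    """
    root = Path(root)
    created = []
    for antibody_pdb in map(Path, antibody_pdbs):
        experiment_dir = root / antibody_pdb.stem
        experiment_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(antibody_pdb, experiment_dir / "antibody.pdb")
        shutil.copy(antigen_pdb, experiment_dir / "antigen.pdb")
        (experiment_dir / "params.txt").write_text(
            param_template.format(name=antibody_pdb.stem)
        )
        created.append(experiment_dir)
    return created
```

Because each directory is self-contained, every experiment can be submitted to the scheduler as an independent job, which is what allows the docking runs to proceed in parallel across nodes.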
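For readers who prefer dissociation constants, a predicted ΔG can be converted to Kd through the standard relation ΔG = RT ln(Kd). The sketch below is illustrative only: the function names are hypothetical, the default temperature of 25 °C is an assumption, and the example ΔG is a placeholder rather than a result from this study.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal, temperature_c=25.0):
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M)."""
    temperature_k = temperature_c + 273.15
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def binding_free_energy(kd_molar, temperature_c=25.0):
    """Inverse relation: dissociation constant (M) to free energy (kcal/mol)."""
    temperature_k = temperature_c + 273.15
    return R_KCAL * temperature_k * math.log(kd_molar)


# Placeholder value, not a reported result: -10 kcal/mol.
example_kd = dissociation_constant(-10.0)
```

More negative ΔG values correspond to smaller Kd values, that is, tighter predicted binding.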
Protein structures and complexes resulting from the diffusion/Thera-SAbDab procurement process and the HADDOCK docking processes, respectively, were visualized using PyMOL v2.4.1. 27 PyMOL was also used to help select active residues at the PD-1/PD-L1 interface from PDB: 3BIK and to evaluate the antibody active residues as selected by abYsis and manual selection. Then, interfacing residues were detected (polar contacts within 3.0 Å) between the PD-1 and Fv complexes that may indicate potential inhibition activity.
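A distance-based contact search of this kind can be approximated with a simple pairwise scan. The sketch below is a simplification and not PyMOL's polar contact detection: it treats any nitrogen or oxygen atom pair across the two partners that lies within the cutoff as a contact, ignores hydrogen bond geometry, and uses a hypothetical atom representation with made-up coordinates.

```python
import math


def polar_contacts(atoms_a, atoms_b, cutoff=3.0, polar_elements=("N", "O")):
    """Return (residue_a, residue_b, distance) for polar atom pairs within `cutoff` Å.

    Each atom is a dict with 'residue', 'element', and 'coord' (x, y, z) keys.
    """
    contacts = []
    for atom_a in atoms_a:
        if atom_a["element"] not in polar_elements:
            continue
        for atom_b in atoms_b:
            if atom_b["element"] not in polar_elements:
                continue
            distance = math.dist(atom_a["coord"], atom_b["coord"])
            if distance <= cutoff:
                contacts.append((atom_a["residue"], atom_b["residue"], round(distance, 2)))
    return contacts


# Made-up coordinates for illustration only.
pd1_atoms = [
    {"residue": "LYS78", "element": "N", "coord": (0.0, 0.0, 0.0)},
    {"residue": "THR76", "element": "C", "coord": (0.0, 0.0, 1.0)},
]
fv_atoms = [
    {"residue": "ASP101", "element": "O", "coord": (0.0, 0.0, 2.8)},
    {"residue": "TYR33", "element": "O", "coord": (0.0, 0.0, 5.0)},
]
found = polar_contacts(pd1_atoms, fv_atoms)
```

The scan is quadratic in the number of atoms, which is acceptable for a single interface; a spatial index would be preferable when screening many complexes.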
The results of these analyses, including the solvated and energy minimized PDB files and equilibrated metrics, are provided in the GitHub repository.
Results
Through protein diffusion, 9 antibody candidates were generated that bind similarly to other existing therapeutic antibodies. As the binding site on PD-1 was constrained to be the normal interface between PD-1 and PD-L1, which is also the binding site for other commercially available antibodies like pembrolizumab and nivolumab, the docking results show consistent interactions in this area. However, the binding orientation and angles of these diffused antibodies against PD-1 vary. These docking results are shown in Figure 1.

Predicted docking complexes of nine conditionally-diffused antibodies. The grey surface protein is PD-1 and the cartoon structures are the Fv portions of the antibodies. The docking location of pembrolizumab is shown in teal (predicted) and pink (actual, PDB: 5JXE) and the contested docking location of nivolumab is shown in orange (predicted) and light purple/purple (actual, PDBs: 5GGR and 5WT9, respectively).
As shown in Figure 2, some of the 9 diffused candidates bound to PD-1 with similar affinities, though not in all cases nor across all metrics.

Comparison of the docking metrics between existing and diffused antibodies. Pairwise comparisons are shown as p-values from the Wilcoxon signed-rank test. For all metrics except buried surface area, lower is likely indicative of better binding.
Container GitHub Repository: https://github.com/colbyford/HADDOCKer
Docker Hub Images: https://hub.docker.com/r/cford38/haddock
anti-parallel beta sheet between complementarity-determining region 3 (CDR3) and framework region 4 (FR4) as compared to other light chains in the reference set. The effect of these secondary structure differences is unknown, though the confidence in the protein folding prediction remains high in all areas except for the CDR loops, which is expected.
Of note, candidates TUPPD1-001, TUPPD1-002, and TUPPD1-009 showed the most promise, as their binding metrics were the strongest across multiple biochemical features. Each of these candidates exhibits favorable predicted van der Waals, electrostatic, and Gibbs energies as compared to the larger Thera-SAbDab reference set. See Table 3.
Comparison of the Three Top-Performing Diffused Antibodies Against Nivolumab and Pembrolizumab, Along with Group Averages.
Furthermore, molecular dynamics simulations show that the predicted HADDOCK output structures were in a nearly optimal energy-minimized state. For example, when aligning the predicted complexes from HADDOCK against the energy-minimized versions from OpenMM, the RMSD of each alignment was very low (approximately 1 Å or less), suggesting considerable stability and confidence in the best cluster of docked complexes.
RMSD of the antibody/PD-1 complexes (aligned before and after energy minimization in molecular dynamics simulations):
TUPPD1-001: 1.034 Å
TUPPD1-002: 0.970 Å
TUPPD1-009: 1.072 Å
Pembrolizumab: 0.906 Å
Nivolumab: 0.911 Å

Predicted docking interfaces of three conditionally-diffused antibodies, pembrolizumab, and nivolumab. The grey cartoon protein is PD-1 (with polar contacts shown in red) and the blue cartoon structures are the Fv portions of the diffused antibodies. The Fv structures of pembrolizumab and nivolumab are shown in teal and orange, respectively.
Discussion
Protein diffusion is an exciting new technology. Built on the progress in large language models (LLMs) for general artificial intelligence, protein diffusion models have the potential to revolutionize the drug development process and reduce the initial lab-based development workload in favor of in silico exploration.38,39 State-of-the-art protein generation models like EvoDiff from Microsoft Research 20 (used in this study), RFdiffusion 40 from the Baker Lab at the University of Washington, and Chroma 41 from Generate:Biomedicines have shown real promise in AI-based drug design. These models and frameworks offer a variety of methods for protein diffusion. For example, EvoDiff conditionally generates sequences that fold into the expected shape (such as antibody chains) through a simple alignment input. Other models, like Chroma, generate proteins based on their potential in a given complex (such as an antibody binding to PD-1). In future studies, we will explore the use of various diffusion models, weighing their respective capabilities for a larger set of diffusion-based in silico experiments. Recently, we have seen clinical trials of computationally-derived antibodies such as Aulos Bioscience's AU-007 antibody targeting IL-2 in patients with unresectable locally advanced or metastatic cancer. 42 Thus, the potential of in silico-based therapeutics in vivo is certainly nascent.
Conclusion
For anti-PD-1 therapeutics, here we have shown the utility of conditional diffusion in the guidance of amino acid sequence generation that mimics the biophysical features of other available antibodies that target PD-1.
The results presented here are limited in that they are in silico-based predictions. Additional testing will be performed in the future to empirically validate these findings. This will include antibody synthesis, testing on recombinant cell lines, and other lab-based assessments of binding affinity.
Furthermore, the diffusion process presented here only includes the creation of the Fv portion of the candidate structures. Thus, additional work remains to complete the full antibody structure, including the combination with the rest of the immunoglobulin G structure, any humanization or efforts to reduce immunogenicity, etc. Since it appears that some of the biochemical features of these diffused sequences may be unusual as compared to naturally-derived antibodies, it is still unknown how this will affect the viability of the candidates, both from a therapeutic perspective and from a manufacturability perspective. For example, from the abYsis analyses, we see that various residues are uncommon in certain positions, but we do not yet understand how those residues may impact an antibody's potential (positively or negatively) to be developed into a safe and effective drug.
Previous studies by our team30,43–45 and others46–48 have shown the utility of large-scale computational screens for understanding the interaction of antibodies (or other immunoproteins) with protein targets as well as the identification of therapeutic targets of interest. Combining this capability with protein diffusion, we can increase the throughput of computational drug design through high-performance computing and automated complex evaluation, as we’ve demonstrated in this preliminary work.
Data Availability Statement
All code, data, results, and additional analyses are openly available on GitHub at: https://github.com/tuplexyz/PD-1_Fab_Diffusion.
This repository includes the PDB files for all 42 antibody Fv structures (33 from Thera-SAbDab and the 9 diffusion-based structures), the Fv-PD-1 complexes from HADDOCK, all output metrics, experiment generation and data preparation logic, HPC submission scripts, and the code for post-processing analyses and generating figures.
Footnotes
Disclosures
AU-007 is a clinical antibody candidate owned by Aulos Bioscience.
Declaration of Conflicting Interests
The author declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Author CTF is the owner of Tuple, LLC, a biotechnology consulting firm.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
