Abstract
Deep learning has become a powerful paradigm in computational biology, offering data-driven models that map protein sequences to structure and function with high precision. In protein structure prediction and functional annotation, traditional sequence-alignment and template-based methods are often limited by low homology and structural diversity. To overcome these challenges, we propose a unified deep learning framework that integrates sequence modeling, structural representation, and functional inference into a single architecture. The system employs transformer-based encoders to extract contextual features from amino acid sequences and graph neural networks to capture spatial dependencies within predicted structures. A multi-task learning approach performs Cα backbone reconstruction and enzyme class prediction simultaneously. Joint training on sequence features and predicted inter-residue geometry improves generalization on rare or multifunctional proteins. Experiments on public benchmark datasets, including CASP14 and CAMEO-Hard, demonstrate a 12.4% reduction in backbone RMSD compared to AlphaFold2 and a 9.7% improvement in PR-AUC for contact prediction, validating the effectiveness of joint learning and structural integration. Compared to existing state-of-the-art baselines, including RoseTTAFold and trRosetta, our system achieves consistently superior accuracy across structure and function tasks. This work provides a modular, end-to-end solution for large-scale protein analysis and shows potential for extension to downstream tasks such as mutation effect prediction and protein–protein interaction modeling.
