Carbohydrate-protein recognition is central to many biological processes. For example: glucose transport and metabolism are fundamental to energy production in all major domains of life; proteins and lipids are often modified by glycosylation, which can influence their localisation, interactions and function; specific carbohydrate modifications on proteins and lipids can influence allergy and rejection responses by the immune system or, conversely, could be exploited immunotherapeutically to induce an immune response; and specific carbohydrates may be utilised by viruses to gain entry into cells.
Although the development of therapeutics targeting or exploiting carbohydrate-protein recognition could be accelerated using structure-based techniques, accurately predicting carbohydrate-binding modes and carbohydrate-binding sites on proteins is challenging for both computational and experimental methods. These difficulties arise mainly from the high flexibility of carbohydrates and the large number of potential hydrogen bonding arrangements available to carbohydrate-protein interactions. Knowledge-based potentials perform well in predicting drug-protein and protein-protein interactions, and may therefore be valuable for predicting carbohydrate-protein interactions; the body of solved carbohydrate-protein structural complexes is now large enough that such potentials could be derived to meaningfully investigate a wide variety of carbohydrate-protein interactions.
In this study, knowledge-based scoring functions for a wide variety of carbohydrates were determined by surveying carbohydrate-protein structural complexes, and validated against unbound protein crystal structures. In deriving these functions, trends in carbohydrate-protein recognition were also observed, including: a lower frequency of hydrogen bonding in non-terminal than in terminal residues, frequent utilisation of the protein backbone for carbohydrate binding, and aromatic residues as the major mediators of non-polar interactions with carbohydrates. The functions generated in this study are anticipated to be valuable for carbohydrate-protein docking, carbohydrate-binding site discovery on proteins, and the design of proteins that bind carbohydrates optimally.