Protein dynamics are essential for the interactions and functions of proteins [1,2]. Functionally important protein motions range from fast, small-scale fluctuations (sub-nanosecond) to slow (microsecond and longer), global changes such as loop rearrangements or domain reorganizations. Internal protein motions are readily related to the biochemical functions of proteins when one considers the basic principles of allostery or the folding-upon-binding mechanism described for certain protein interactions. Intrinsically disordered protein regions (IDRs) are probably the most representative examples of the key role of dynamics in protein function.
Intrinsically disordered proteins (IDPs)
IDPs are a relatively recently recognized class of proteins that challenge the long-standing structure-function paradigm of molecular biology [3]. They function as ensembles of conformations and lack a stable three-dimensional structure [3,4], yet they fulfill essential roles in many biological processes, especially those involving regulatory functions [5]. The residues of IDPs sample many different conformations; however, they do not behave as ideal random coils and can favor certain conformations over others, as determined by their sequence context [6]. Protein disorder is therefore related to dynamics, but its identification and interpretation still pose a significant challenge. This is mainly because the number of experimentally validated disordered regions [7] lags far behind the number predicted theoretically [8,9]; the validated regions might therefore not form a representative sample that is well suited to training disorder prediction methods.
DynaMine is a fast, sequence-based predictor of protein backbone dynamics. It was trained on residue-level measures of fast backbone motion, namely S2 order parameters [10], which were estimated with the RCI method [12] from a large set of carefully curated NMR chemical shifts from the BMRB database [11] (210,880 residues from 1,952 proteins; Figure 1). The predictor uses the linear regression algorithm of Weka 3.6.9 [13] with default parameters and takes into account the sequence context of each residue, given by the 25 residues preceding and following it in the protein sequence. DynaMine discriminates with high accuracy between protein regions with different levels of structural organization, such as folded domains and disordered segments of different sizes. Unlike other existing methods, it identifies disordered regions accurately without depending on prior knowledge of disorder or on three-dimensional structural information.
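The windowed linear model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DynaMine's implementation: it assumes a one-hot residue encoding, so that a linear model reduces to a bias plus one weight per (window offset, residue type) pair, and the weights passed in are hypothetical placeholders rather than the coefficients fitted by Weka. Only the prediction step is shown; fitting the weights is not reproduced. The names `context_window`, `predict_profile` and the `-` padding symbol are illustrative choices.

```python
from typing import Dict, List

HALF_WINDOW = 25  # residues taken on each side of the central residue
PAD = "-"         # hypothetical placeholder for positions beyond the chain ends


def context_window(sequence: str, index: int, half: int = HALF_WINDOW) -> List[str]:
    """Return the residues from index - half to index + half, padding the termini."""
    return [
        sequence[pos] if 0 <= pos < len(sequence) else PAD
        for pos in range(index - half, index + half + 1)
    ]


def predict_profile(
    sequence: str,
    weights: Dict[int, Dict[str, float]],
    bias: float,
    half: int = HALF_WINDOW,
) -> List[float]:
    """Score each residue as bias + sum of weights[offset][residue] over its window.

    With one-hot features this is exactly a linear model. Offsets or residue
    types missing from `weights` contribute 0. The weights are assumed inputs,
    not DynaMine's trained coefficients.
    """
    scores = []
    for i in range(len(sequence)):
        window = context_window(sequence, i, half)
        total = bias
        # Offset -half is the furthest preceding residue, +half the furthest following one.
        for offset, residue in zip(range(-half, half + 1), window):
            total += weights.get(offset, {}).get(residue, 0.0)
        scores.append(total)
    return scores


if __name__ == "__main__":
    # Toy, made-up weights: a proline at or next to a position lowers its score.
    toy_weights = {0: {"P": -0.2}, -1: {"P": -0.1}, 1: {"P": -0.1}}
    print(predict_profile("MKVPGE", toy_weights, bias=0.8))
```

With the default half-width each window spans 51 positions (25 + 1 + 25); in a real model the weights would be learned by regression against the S2 training data rather than set by hand.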
Features of the dynamics patterns produced by DynaMine
Another property that sets DynaMine apart from other existing methods is that its predicted values have a clear physical meaning. The S2 order parameter describes how restricted the rotational motion of a backbone N-H bond vector is (Figure 2), ranging from 0 for unrestricted motion to 1 for a completely rigid bond vector. DynaMine predicts these values directly from sequence without any further transformation or rescaling, so the larger the predicted value, the more rigid the backbone. Consequently, folded domains receive higher DynaMine scores (roughly above 0.8) than disordered segments (roughly below 0.69), and an intermediate "twilight zone" (0.69-0.8), in which the structural organization of the polypeptide chain depends on its context, can also be detected (see Examples). Because the output is not mapped onto a 0-1 scale, the method is linear, and it was trained on soluble proteins, regions with unusual sequence composition, such as transmembrane regions, can receive scores above 1. The predictions are nevertheless deliberately left unrescaled so that the values remain meaningful on an absolute scale. As a result, DynaMine predictions allow meaningful comparisons of dynamics between different residue ranges of the same protein or between different proteins.
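The score ranges quoted above can be applied to a predicted profile as in the following sketch. It assumes the per-residue DynaMine scores are already available as a list of floats; the thresholds are the approximate values given in the text, treated here as hard cut-offs purely for illustration, and the helper names `classify_score`, `positions_above_one` and `region_mean` are hypothetical. Because the scores are on an absolute scale, the mean over a residue range can be compared directly across regions or proteins.

```python
from typing import List

DOMAIN_THRESHOLD = 0.8      # approximate: scores above this suggest a folded domain
DISORDER_THRESHOLD = 0.69   # approximate: scores below this suggest disorder


def classify_score(score: float) -> str:
    """Map one predicted S2 value onto the approximate ranges from the text."""
    if score > DOMAIN_THRESHOLD:
        return "domain"
    if score < DISORDER_THRESHOLD:
        return "disordered"
    return "twilight"  # 0.69-0.8: context-dependent structural organization


def positions_above_one(scores: List[float]) -> List[int]:
    """1-based positions scoring above 1, e.g. possible transmembrane segments."""
    return [i + 1 for i, s in enumerate(scores) if s > 1.0]


def region_mean(scores: List[float], start: int, end: int) -> float:
    """Mean score over residues start..end (1-based, inclusive)."""
    if start < 1 or end < start or end > len(scores):
        raise ValueError("invalid residue range")
    segment = scores[start - 1:end]
    return sum(segment) / len(segment)


if __name__ == "__main__":
    profile = [0.55, 0.62, 0.74, 0.86, 0.91, 1.04, 0.88]  # made-up example values
    print([classify_score(s) for s in profile])
    print(positions_above_one(profile))
    print(region_mean(profile, 1, 3), region_mean(profile, 4, 7))
```

Scores above 1 still fall into the "domain" range in this sketch, so flagging them separately, as `positions_above_one` does, keeps unusual compositions from being read as exceptionally rigid domains.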
1. Dodson, G. & Verma, C. S. Protein flexibility: its role in structure and mechanism revealed by molecular simulations. Cell Mol Life Sci 63, 207-219 (2006).
2. Teilum, K. et al. Functional aspects of protein flexibility. Cell Mol Life Sci 66, 2231-2247 (2009).
3. Wright, P. E. & Dyson, H. J. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol 293, 321-331 (1999).
4. Tompa, P. Intrinsically unstructured proteins. Trends Biochem Sci 27, 527-533 (2002).
5. Dyson, H. J. & Wright, P. E. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6, 197-208 (2005).
6. Schweitzer-Stenner, R. Conformational propensities and residual structures in unfolded peptides and proteins. Mol Biosyst 8, 122-133 (2012).
7. Sickmeier, M. et al. DisProt: the Database of Disordered Proteins. Nucleic Acids Res 35, D786-793 (2007).
8. Dunker, A. K. et al. Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform 11, 161-171 (2000).
9. Romero, P. et al. Thousands of proteins likely to have long disordered regions. Pac Symp Biocomput, 437-448 (1998).
10. Dinkel, H. et al. ELM–the database of eukaryotic linear motifs. Nucleic Acids Res 40, D242-251 (2012).
11. Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res 36, D402-408 (2008).
12. Berjanskii, M. V. & Wishart, D. S. The RCI server: rapid and accurate calculation of protein flexibility using chemical shifts. Nucleic Acids Res 35, W531-537 (2007).
13. Hall, M. et al. The WEKA Data Mining Software: An Update. SIGKDD Explorations 11 (2009).