Bio2Byte Tools

Multiple sequence alignment (MSA) tutorial - Biophysical conservation

The selection and plot below are almost the same as for the single-sequence predictions; refer to the single-sequence tutorial for the main functionality. Hovering over the plot, however, displays the Gaussian Mixture Model (GMM) score for that residue. This score is based on an analysis of the 7-dimensional 'biophysical space' of that residue, relative to the other residues in the MSA column it occupies.
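The idea of scoring a residue against its MSA column can be illustrated with a minimal Python sketch. As a deliberate simplification, it fits a single Gaussian with diagonal covariance to each column instead of a full Gaussian Mixture Model, and uses each residue's log-density as its score: typical residues score high, unusual ones score low. The function name `column_log_likelihoods`, the seven-value input vectors and the toy numbers are all hypothetical; this is not the b2btools implementation.

```python
import math
from statistics import fmean, pvariance

def column_log_likelihoods(vectors, min_var=1e-6):
    """Score each residue vector in one MSA column (hypothetical helper).

    Simplified stand-in for a GMM: fits ONE Gaussian with a diagonal
    covariance to the whole column, then returns each vector's log-density.
    Higher = more typical for the column, lower = more unusual.
    """
    dims = len(vectors[0])
    # Per-dimension mean and population variance, floored to avoid division by zero
    means = [fmean(v[d] for v in vectors) for d in range(dims)]
    variances = [max(pvariance([v[d] for v in vectors], means[d]), min_var)
                 for d in range(dims)]
    scores = []
    for v in vectors:
        log_p = 0.0
        for d in range(dims):
            diff = v[d] - means[d]
            # Log of the 1-D normal density, summed over independent dimensions
            log_p += -0.5 * (math.log(2 * math.pi * variances[d])
                             + diff * diff / variances[d])
        scores.append(log_p)
    return scores

# Toy column: five aligned residues, seven made-up biophysical values each
column = [
    [0.85, 0.70, 0.20, 0.60, 0.10, 0.10, 0.30],
    [0.83, 0.72, 0.22, 0.58, 0.12, 0.08, 0.28],
    [0.86, 0.69, 0.19, 0.61, 0.11, 0.09, 0.31],
    [0.84, 0.71, 0.21, 0.59, 0.10, 0.11, 0.29],
    [0.60, 0.40, 0.60, 0.10, 0.30, 0.20, 0.05],  # deliberately unusual residue
]
scores = column_log_likelihoods(column)
print("most unusual residue index:", scores.index(min(scores)))
```

A real GMM (for example one fitted with scikit-learn) can model several clusters of "normal" behaviour per column, which a single Gaussian cannot.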

High scores indicate typical values; low (negative) scores indicate unusual values, meaning that the residue is unlike the other residues at that position in the alignment. Below the plot, you will also find a breakdown of the residues flagged as unusual by the GMM, using 5%, 1% and 0.1% cutoffs computed over the full MSA.
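One way to derive such cutoffs is to pool the scores of every residue in the MSA and take the lowest 5%, 1% and 0.1% tails. The sketch below assumes a simple nearest-rank percentile and a hypothetical input format (a dict mapping `(sequence, alignment position)` to a score); the function names are illustrative, and the tool may compute its cutoffs differently.

```python
import math

def nearest_rank_cutoff(scores, fraction):
    """Score at or below which the lowest `fraction` of all scores fall."""
    ordered = sorted(scores)
    k = max(1, math.ceil(fraction * len(ordered)))  # size of the low tail
    return ordered[k - 1]

def flag_unusual(residue_scores, fractions=(0.05, 0.01, 0.001)):
    """Map each cutoff fraction to the residues scoring at or below it.

    residue_scores: dict of (sequence name, alignment position) -> score
    (hypothetical format). Cutoffs are computed over the full MSA.
    """
    all_scores = list(residue_scores.values())
    flagged = {}
    for fraction in fractions:
        cutoff = nearest_rank_cutoff(all_scores, fraction)
        flagged[fraction] = sorted(key for key, s in residue_scores.items()
                                   if s <= cutoff)
    return flagged

# Toy data: 1000 residues of one made-up sequence with scores 0.0 .. 999.0
residue_scores = {("A", i): float(i) for i in range(1000)}
flagged = flag_unusual(residue_scores)
for fraction, residues in flagged.items():
    print(f"{fraction:.1%} cutoff: {len(residues)} residues flagged")
```

Stricter cutoffs flag subsets of the looser ones, so a residue listed at 0.1% is also listed at 1% and 5%.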

The plot below shows, for the protein you selected above, the variation in predicted biophysical parameters within the multiple sequence alignment (MSA) that you uploaded. This variation is displayed as simple box-plot statistics: the median, the first and third quartiles, and the outlier range of each distribution. MSA columns in which the selected protein has a gap are not shown here.

In other words, the plot shows how the biophysical prediction at each aligned position varies across all the proteins in the MSA. You can select the type of prediction to display in the selection box below the plot, and toggle each distribution statistic on or off by clicking its name. The 'prediction' field shows the same type of prediction as the top plot.
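The per-column statistics can be sketched in Python as follows. This assumes a hypothetical input format (a dict mapping each sequence name to its per-column prediction values, with `None` at gap positions) and uses the common Tukey rule of 1.5 × IQR for the outlier range, which may differ from the whisker definition the plot actually uses. Columns gapped in the selected protein are skipped, and the `in_quartiles` flag records whether the selected protein's value lies between the first and third quartiles. All numbers in the example are made up.

```python
from statistics import median, quantiles

def column_box_stats(msa_values, selected):
    """Box-plot statistics per MSA column, skipping columns gapped in `selected`.

    msa_values: dict of sequence name -> list of per-column values,
                with None where that sequence has a gap (hypothetical format).
    """
    stats = []
    for col, sel_value in enumerate(msa_values[selected]):
        if sel_value is None:
            continue  # selected protein is gapped here: column not shown
        values = [seq[col] for seq in msa_values.values() if seq[col] is not None]
        if len(values) < 2:
            continue  # quantiles() needs at least two data points
        q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
        iqr = q3 - q1
        stats.append({
            "column": col,
            "median": median(values),
            "q1": q1,
            "q3": q3,
            "low": q1 - 1.5 * iqr,    # assumed Tukey-style outlier bounds
            "high": q3 + 1.5 * iqr,
            "selected": sel_value,
            "in_quartiles": q1 <= sel_value <= q3,
        })
    return stats

# Made-up values for four aligned sequences over four MSA columns
msa = {
    "sTIM_11": [0.90, None, 0.70, 0.80],
    "Bs":      [0.80, 0.60, 0.75, 0.85],
    "Ec":      [0.85, 0.65, 0.72, 0.90],
    "Hu":      [0.82, 0.62, None, 0.88],
}
for row in column_box_stats(msa, "sTIM_11"):
    print(row["column"], round(row["median"], 3), row["in_quartiles"])
```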

The selected prediction below is visualized in the context of the full input MSA, so there may be gaps where the selected protein has no values. All the statistical fields can still be visualized at these gap positions in the alignment.

Now select each of the following characteristics in the dropdown box above, and compare the values for the natural TIM-barrel proteins with those of the de novo designed sTIM-11 protein and the misfolding OctaV1 protein (select these proteins at the top of the page):

backbone: The backbone dynamics are very similar to the MSA-based distributions for all proteins, with the red line mostly falling within the quartile ranges.

earlyFolding: The early folding predictions show some immediate differences: some peaks are higher in sTIM-11 and OctaV1, many are similar, and one is notably absent in OctaV1 (around A138). These differences suggest that, compared with the natural TIM-barrel proteins, some regions of both sTIM-11 and OctaV1 will start to fold earlier, but only a few will fold later.

helix: The helix propensities for sTIM-11 are similar to the MSA-based distributions, whilst OctaV1 has higher propensities overall.

sheet: Both proteins generally have lower beta-sheet propensities, but the pronounced beta-sheet propensity peaks are present in sTIM-11, whilst some notable ones are absent in OctaV1.

These changes can be linked to each other. For example, the missing early folding peak around A138 in OctaV1 coincides with a much reduced beta-sheet propensity in OctaV1 (outside the quartile range), whilst in sTIM-11 this region (around I128) is similar to the natural proteins. This might indicate that the region is important for correct folding, and so highlights positions where mutations could be explored.

Overall, this type of analysis can highlight interesting differences in the inherent biophysical characteristics encoded by protein sequences, and it requires only the protein sequences and a multiple sequence alignment.

Prediction	Interpretation
DynaMine backbone dynamics	Values above 0.8 indicate rigid conformations, values above 1.0 membrane-spanning regions, and values below 0.69 flexible regions. Values between 0.69 and 0.80 are context-dependent and can be either rigid or flexible.
DynaMine sidechain dynamics	Higher values indicate a more likely rigid side chain. These values depend strongly on the amino acid type (e.g. a Trp will be rigid, an Asp flexible).
DynaMine conformational propensities (sheet, helix, coil, ppII (polyproline II))	Higher values indicate higher propensities.
EFoldMine earlyFolding propensity	Values above 0.169 indicate residues that are likely to start the protein folding process, based only on local interactions with other amino acids.
DisoMine disorder	Values above 0.5 indicate that the residue is likely disordered.
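The thresholds in this table can be applied directly to raw per-residue values. The helpers below are hypothetical (they are not part of the b2btools API); only the threshold values come from the table, while the function names and label strings are illustrative.

```python
def classify_backbone(value):
    """Label a DynaMine backbone dynamics value using the table's thresholds."""
    if value > 1.0:
        return "membrane-spanning"
    if value > 0.8:
        return "rigid"
    if value < 0.69:
        return "flexible"
    return "context-dependent"  # 0.69-0.80: can be either rigid or flexible

def is_early_folding(value):
    """EFoldMine: values above 0.169 suggest an early folding residue."""
    return value > 0.169

def is_disordered(value):
    """DisoMine: values above 0.5 suggest a disordered residue."""
    return value > 0.5

# Made-up backbone values covering each category
labels = [classify_backbone(v) for v in (0.5, 0.75, 0.9, 1.1)]
print(labels)
```

Because the membrane-spanning range (above 1.0) also lies above the rigid threshold (0.8), the checks run from the highest threshold downwards.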

[Figures: per-residue prediction plots and GMM score analysis panels for the 12 aligned sequences Bs, Ch, Ec, Hu, Lm, OctaV1, Pf, Tb, Tm, Vm, Ye and sTIM_11]