

Protein backbone angle prediction has achieved significant accuracy improvement with the development of deep learning methods. Usually, the same deep learning model is used to make predictions for all residues regardless of the secondary structure categories they belong to. In this paper, we propose to train separate deep learning models for each category of secondary structures. Machine learning methods strive to achieve generality over the training examples and consequently lose accuracy. In this work, we explicitly exploit classification knowledge to restrict generalisation within the specific class of training examples. This is to compensate for the loss of generalisation by exploiting specialisation knowledge in an informed way. The new method, named SAP4SS, obtains mean absolute error (MAE) values of 15.59, 18.87, 6.03, and 21.71 respectively for the four types of backbone angles \(\phi\), \(\psi\), \(\theta\), and \(\tau\). Consequently, SAP4SS significantly outperforms the existing state-of-the-art methods SAP, OPUS-TASS, and SPOT-1D: the differences in MAE for all four types of angles range from 1.5 to 4.1% compared to the best known results. SAP4SS along with its data is available from.

Proteins comprise amino acid (AA) sequences and fold into three-dimensional (3D) structures. The native structure of a protein has the minimum free energy, and it determines the function of the protein. The protein structure prediction (PSP) problem is to determine the native structure of a protein from its AA sequence. The challenge comes from the astronomically large conformational search space and the unknown energy function involved in the folding process. Proteins have backbones or main chains comprising peptide bonds that connect the C and N atoms of successive AAs. All AAs have three common atoms: N, \(C^\alpha\), and C. Since multiple residues are needed to define \(\theta\) and \(\tau\), they could somewhat capture local structures.

In this work, we develop deep neural network (DNN) models to predict the backbone angles \(\phi\), \(\psi\), \(\theta\), and \(\tau\) for proteins. Protein backbone angle prediction (BAP) has achieved significant progress with the development of DNNs. Yet more accurate BAP is needed, since an error in any angle of a protein has a cascading effect on the entire protein structure. In BAP, DNN variants such as stacked sparse auto-encoder neural networks, long short-term memory (LSTM) bidirectional recurrent neural networks (BRNNs), Residual Networks (ResNets), and DNN ensembles or layered iterations have been used. Input features used in BAP include the very popular position-specific scoring matrices (PSSM) generated by PSI-BLAST; 7 physicochemical properties (7PCP) such as steric parameter (graph shape index), hydrophobicity, volume, polarisability, isoelectric point, helix probability, and sheet probability; predicted accessible surface area (ASA); hidden Markov model (HMM) profiles produced by HHBlits; contact maps; and PSP19.
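The idea of training a separate model for each secondary structure category can be illustrated with a minimal routing sketch. This is not the SAP4SS implementation: the function name `predict_by_class`, the `AngleModel` type, and the single-letter class labels are hypothetical and introduced here only for illustration. It assumes each residue already carries a secondary structure label and that one trained model per label is available.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a per-class model maps one residue's feature vector
# to four predicted angles (phi, psi, theta, tau).
AngleModel = Callable[[Sequence[float]], List[float]]


def predict_by_class(
    features: Sequence[Sequence[float]],
    ss_labels: Sequence[str],
    models: Dict[str, AngleModel],
) -> List[List[float]]:
    """Route each residue to the model trained for its secondary structure class."""
    if len(features) != len(ss_labels):
        raise ValueError("features and ss_labels must have the same length")

    # Group residue indices by class so each model sees only its own residues.
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(ss_labels):
        by_class[label].append(i)

    out: List[List[float]] = [[] for _ in features]
    for label, indices in by_class.items():
        model = models[label]  # raises KeyError if no model exists for this class
        for i in indices:
            out[i] = model(features[i])
    return out
```

In this sketch the specialised models never see residues from other classes, which mirrors the stated goal of restricting generalisation to one class of training examples. How the class labels are obtained at prediction time is left open here.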

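Because backbone angles are periodic, an MAE over angles is usually computed on the shorter arc between the predicted and the actual value. The text does not state how its MAE values are computed, so the following is only a sketch under that assumption, with angles taken to be in degrees; the function name `angular_mae` is hypothetical.

```python
from typing import Sequence


def angular_mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between angles in degrees, respecting 360-degree wraparound."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        d = abs(p - a) % 360.0
        # Take the shorter way around the circle (assumed convention).
        total += min(d, 360.0 - d)
    return total / len(predicted)
```

For example, a prediction of 179 against an actual value of -179 contributes an error of 2 rather than 358 under this convention.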