Machine learning methods for prediction of protein-protein interaction hot spot residues

Sitani, Divya; Carloni, Paolo; Zimmer-Bensch, Geraldine Marion
doi:HT030884506
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@PHDTHESIS{Sitani:989346,
      author       = {Sitani, Divya},
      othercontributors = {Zimmer-Bensch, Geraldine Marion and Carloni, Paolo},
      title        = {{M}achine learning methods for prediction of
                      protein-protein interaction hot spot residues},
      school       = {RWTH Aachen University},
      type         = {Dissertation},
      address      = {Aachen},
      publisher    = {RWTH Aachen University},
      reportid     = {RWTH-2024-06740},
      pages        = {1 online resource : illustrations},
      year         = {2024},
      note         = {Published on the publication server of RWTH Aachen
                      University; Dissertation, RWTH Aachen University, 2024},
      abstract     = {Protein–protein interactions (PPIs) form a vast and
                      intricate network of reactions important for the regulation
                      and execution of most biological processes [Rao+14]. PPIs
                      occur when two proteins make direct physical contact via
                      their surface residues and form an interface, which is a
                      non-uniform surface on a protein-protein complex [GS10].
                      Even though a protein interface may occupy a large area,
                      only a small subset of its buried residues plays a crucial
                      role in the binding free energy of the complex [BT98;
                      Jan95]. These energetically key residues are known as hot
                      spots. The standard experimental method for identifying
                      them is alanine scanning mutagenesis (ASM), in which each
                      interface residue is systematically mutated to alanine and
                      the resulting change in binding free energy,
                      $\Delta\Delta G_{\mathrm{binding}}$, between the wild-type
                      and mutant complexes is measured. If
                      $\Delta\Delta G_{\mathrm{binding}}$ is larger than a
                      threshold, typically 2 kcal/mol, the interface residue is
                      defined as a hot spot; otherwise it is considered a null
                      spot [MFR07; CW89; BT98]. Hot spot residues are frequently
                      enriched in disease-associated mutations [Ten+09]. Such
                      mutations often disrupt protein interactions or give rise
                      to aberrant ones, resulting in phenotypic changes that can
                      cause disease. Moreover, the discovery of hot spots in
                      protein-protein interfaces has made it possible to target a
                      broader range of PPIs with small-molecule drugs. The
                      identification of hot spots has helped
                      researchers to identify molecules that interact at these
                      sites, thus interfering with PPIs and the downstream
                      pathways they mediate [Pet+16a; Pet+16b; Sco+16]. Predicting
                      hot spots is therefore crucial both for understanding the
                      effect of disease-associated mutations on PPIs and for drug
                      discovery [Mur+17]. Although hot spots can be identified
                      experimentally with ASM, the method is costly and tedious,
                      which has motivated the use of computational methods to
                      predict hot spot residues. Earlier computational approaches
                      included molecular dynamics and knowledge-based methods
                      [GNS02; KB02; MK99; HMK02; GF08; Bre+09]. However, these
                      approaches were time-consuming and hence limited in the
                      number of hot spots they could predict. This has led to the
                      increasing use of machine learning (ML)-based methods for
                      hot spot prediction in recent years [DPM07; Den+13; CKL09a;
                      CKL09b; Ass+10]. Such ML approaches capitalize on the
                      availability
                      of experimental datasets containing protein-protein complex
                      structures and ASM-derived hot spot data. However, as is
                      common with biological data repositories, such hot spot
                      datasets often contain noise [Mor+17; KC21]. If ML
                      algorithms are trained and make predictions on such "noisy"
                      data, the resulting predictions will be inaccurate [GG19].
                      Earlier ML-based approaches to hot spot prediction did not
                      take this issue into account. In this thesis, I describe
                      the basic concepts and recent advances of machine learning
                      applications for finding protein–protein interaction hot
                      spots. To reduce the effect of noise on hot spot prediction,
                      I propose the method RBHS (Robust Principal Component
                      Analysis (RPCA)-based Prediction of Protein-Protein
                      Interaction Hot Spots) [Sit+21]. I apply RPCA [Can+11],
                      followed by feature selection with Extreme Gradient Boosting
                      (XGBoost) [CG16], to a data matrix of protein sequence- and
                      structure-based features computed for the interface
                      residues. I trained several popular machine learning
                      classifiers on the benchmark dataset HB-34 [LLD18] and
                      evaluated the performance of my proposed method on the
                      independent test set BID-18 [LLD18]. Through extensive
                      computational experiments and comparison with existing
                      state-of-the-art hot spot prediction approaches, I show that
                      my method is effective in identifying hot spot residues
                      crucial for protein-protein interactions. Finally, I discuss
                      the challenges and future directions in hot spot
                      prediction.},
      cin          = {164620 / 160000},
      ddc          = {570},
      cid          = {$I:(DE-82)164620_20181217$ / $I:(DE-82)160000_20140620$},
      typ          = {PUB:(DE-HGF)11},
      doi          = {10.18154/RWTH-2024-06740},
      url          = {https://publications.rwth-aachen.de/record/989346},
}
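The hot spot definition in the abstract reduces to a simple labelling rule on alanine-scanning measurements. The following Python sketch illustrates that rule only; it is not the RBHS implementation from the thesis. The names `AlanineScanResult`, `ddg_binding`, `label_residue` and `label_interface` and the residue identifiers are hypothetical, ΔΔG is assumed to be computed as mutant minus wild-type binding free energy, and the strict "larger than" comparison follows the abstract's wording.

```python
from typing import NamedTuple

# Cut-off taken from the abstract ("typically 2 kcal/mol"); kept as a
# parameter because datasets may use a different value.
DEFAULT_THRESHOLD_KCAL_PER_MOL = 2.0


class AlanineScanResult(NamedTuple):
    """Hypothetical record for one interface residue from an alanine scan."""
    residue_id: str
    dg_wild_type: float  # binding free energy of the wild-type complex (kcal/mol)
    dg_mutant: float     # binding free energy of the alanine mutant complex (kcal/mol)


def ddg_binding(result: AlanineScanResult) -> float:
    """Change in binding free energy caused by the alanine mutation.

    Assumed sign convention: binding free energies are negative for favourable
    binding, so a positive ddG means the mutation weakens the complex.
    """
    return result.dg_mutant - result.dg_wild_type


def label_residue(ddg: float, threshold: float = DEFAULT_THRESHOLD_KCAL_PER_MOL) -> str:
    """Return 'hot spot' if ddG is larger than the threshold, else 'null spot'."""
    return "hot spot" if ddg > threshold else "null spot"


def label_interface(results, threshold: float = DEFAULT_THRESHOLD_KCAL_PER_MOL) -> dict:
    """Map each residue identifier to its hot spot / null spot label."""
    return {r.residue_id: label_residue(ddg_binding(r), threshold) for r in results}


if __name__ == "__main__":
    # Made-up illustrative values, not measurements from the thesis.
    scan = [
        AlanineScanResult("A:TRP48", dg_wild_type=-12.0, dg_mutant=-8.5),
        AlanineScanResult("A:SER52", dg_wild_type=-12.0, dg_mutant=-11.6),
    ]
    print(label_interface(scan))
    # → {'A:TRP48': 'hot spot', 'A:SER52': 'null spot'}
```

With this strict comparison a residue at exactly 2 kcal/mol is labelled a null spot; a dataset that uses an inclusive cut-off would need `>=` instead.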