
Transfer learning workflow for I/O bandwidth prediction = Transfer Learning Workflow zur I/O-Leistungsvorhersage der Bandbreite



Statement of responsibility: Dmytro Povaliaiev

Imprint: Aachen : RWTH Aachen University, 2023

Extent: 1 online resource : illustrations, diagrams


Master's thesis, RWTH Aachen University, 2023

Published on the publication server of RWTH Aachen University


Approving faculty
Fak01

Main examiner/Reviewers
; ;

Date of oral examination/habilitation
2023-03-13

Online
DOI: 10.18154/RWTH-2023-05063
URL: https://publications.rwth-aachen.de/record/958007/files/958007.pdf

Institutions

  1. Lehrstuhl für Informatik 12 (Hochleistungsrechnen) (123010)
  2. Fachgruppe Informatik (120000)

Subject description (keywords)
HPC I/O (free) ; I/O bandwidth prediction (free) ; explainable AI (free) ; interpretable machine learning (free) ; machine learning (free) ; transfer learning (free) ; HPC (free) ; high-performance computing (free) ; I/O (free)

Subject classification
DDC: 004

Abstract
As the new generation of high-performance computing (HPC) systems reaches exascale performance for the first time, preventing underutilization due to I/O bottlenecks becomes even more critical. However, accurately predicting the I/O performance remains a challenging problem. The existing approaches [29] [37] [92] use a significant amount of data from a particular HPC cluster to create a suitable machine learning model. This is problematic due to the required timescale and I/O instrumentation infrastructure, especially in the case of the new filesystems that have not yet gained widespread adoption. To address this issue, I propose a transfer learning-based workflow for I/O bandwidth prediction that requires less data from the target cluster than the existing methods to produce a model of equivalent quality. As a proof-of-concept (POC), I use it to predict the I/O performance of CLAIX, the supercomputing cluster at RWTH Aachen University, employing data collected at the Blue Waters system of the University of Illinois for the initial training. Even in the POC form, the models produced by the workflow show a slight improvement of 1.08% average residual error over the current state of the art of 10% in bandwidth prediction on HPC clusters [37]. I further verify these results using cross-validation and analyze the models with the help of nine interpretable machine learning (also called explainable AI) techniques to provide insight into the features they consider to be the most important ones.

OpenAccess:
Download fulltext PDF
(additional files)

Document type
Master Thesis

Format
online

Language
English

Internal identifiers
RWTH-2023-05063
Record ID: 958007

Participating countries
Germany


The record appears in these collections:
Document types > Theses > Master Theses
Publication server / Open Access
Faculty of Computer Science (Fac.9)
Public records
Publications database
120000
123010

 Record created 2023-05-08, last modified 2025-10-20

