2023
Master's thesis, RWTH Aachen University, 2023
Published on the publication server of RWTH Aachen University
Approving faculty
Fak01
Main examiner/Reviewer
; ;
Date of oral examination/habilitation
2023-03-13
Online
DOI: 10.18154/RWTH-2023-05063
URL: https://publications.rwth-aachen.de/record/958007/files/958007.pdf
Institutions
Subject description (keywords)
HPC I/O (free) ; I/O bandwidth prediction (free) ; explainable AI (free) ; interpretable machine learning (free) ; machine learning (free) ; transfer learning (free) ; HPC (free) ; high-performance computing (free) ; I/O (free)
Subject classification
DDC: 004
Abstract
As the new generation of high-performance computing (HPC) systems reaches exascale performance for the first time, preventing underutilization due to I/O bottlenecks becomes even more critical. However, accurately predicting I/O performance remains a challenging problem. Existing approaches [29] [37] [92] use a significant amount of data from a particular HPC cluster to create a suitable machine learning model. This is problematic because of the timescale and I/O instrumentation infrastructure this requires, especially for new filesystems that have not yet gained widespread adoption. To address this issue, I propose a transfer learning-based workflow for I/O bandwidth prediction that requires less data from the target cluster than existing methods to produce a model of equivalent quality. As a proof of concept (POC), I use it to predict the I/O performance of CLAIX, the supercomputing cluster at RWTH Aachen University, employing data collected on the Blue Waters system at the University of Illinois for the initial training. Even in POC form, the models produced by the workflow show a slight improvement of 1.08% average residual error over the current state of the art of 10% in bandwidth prediction on HPC clusters [37]. I further verify these results using cross-validation and analyze the models with nine interpretable machine learning (also called explainable AI) techniques to provide insight into the features they consider most important.
OpenAccess: PDF (additional files)
Document type
Master Thesis
Format
online
Language
English
Internal identifiers
RWTH-2023-05063
Record ID: 958007
Participating countries
Germany