Analyzing memory accesses for performance and correctness of parallel programs

Cramer, Tim; Müller, Matthias S.; Katoen, Joost-Pieter

doi:10.18154/RWTH-2017-06527

Marc 21

001			695879
005			20251020082400.0
024	7	_	\|2 HBZ \|a HT019387391
024	7	_	\|2 datacite_doi \|a 10.18154/RWTH-2017-06527
024	7	_	\|2 Laufende Nummer \|a 36291
037	_	_	\|a RWTH-2017-06527
041	_	_	\|a English
082	_	_	\|a 004
100	1	_	\|0 P:(DE-82)IDM00436 \|a Cramer, Tim \|b 0 \|u rwth
245	_	_	\|a Analyzing memory accesses for performance and correctness of parallel programs \|c vorgelegt von Diplom-Ingenieur Tim Cramer \|h online
246	_	3	\|a Analyse von Speicherzugriffen für die Performance und Korrektheit von parallelen Programmen \|y German
260	_	_	\|a Aachen \|c 2017
300	_	_	\|a 1 online resource (viii, 151 pages) : illustrations, diagrams
336	7	_	\|2 DataCite \|a Output Types/Dissertation
336	7	_	\|2 ORCID \|a DISSERTATION
336	7	_	\|2 BibTeX \|a PHDTHESIS
336	7	_	\|0 2 \|2 EndNote \|a Thesis
336	7	_	\|0 PUB:(DE-HGF)11 \|2 PUB:(DE-HGF) \|a Dissertation / PhD Thesis \|b phd \|m phd
336	7	_	\|2 DRIVER \|a doctoralThesis
500	_	_	\|a Published on the publication server of RWTH Aachen University
502	_	_	\|a Dissertation, RWTH Aachen University, 2017 \|b Dissertation \|c RWTH Aachen University \|d 2017 \|g Fak01 \|o 2017-07-05
520	3	_	\|a Der stetig wachsende Bedarf an Rechenleistung im wissenschaftlichen Umfeld hat im laufenden Jahrzehnt sowohl zu einer weiten Verbreitung als auch hohen Akzeptanz von hochparallelen Computerarchitekturen geführt. Dieser Trend ist auch in der TOP500-Liste der leistungsfähigsten Supercomputer der Welt manifestiert, in welcher über 40% der Gesamt-Performance aus Akzelerator-basierten Systemen resultiert. Die Programmierung dieser Systeme erforderte in der Vergangenheit häufig zeitaufwändige Anpassungen der rechenintensiven Programmteile, bevor produktivere Ansätze wie OpenACC oder die Offloading-Direktiven in OpenMP aufkamen. Jedoch bleibt auch mit diesen nutzerfreundlicheren Ansätzen die Programmierung für heterogene Architekturen komplex und fehleranfällig und stellt viele Anforderungen an den Programmierer, der eine hohe Performance für seine Anwendung erreichen will. Eine Schlüsselrolle für das Verständnis der Performance und der Korrektheit eines parallelen Programms spiegelt sich in der Analyse der Speicherzugriffe wider. Diese Arbeit verfolgt einen ganzheitlichen Ansatz unter Berücksichtigung der Hardware-Eigenschaften, des Programmierparadigmas, der zugrundeliegenden Implementierung und der Schnittstelle für eine adäquate Tool-Unterstützung in Bezug auf beide Aspekte. Die Verbesserung der Performance und die Validierung einer Anwendung erfordern hierbei ein tiefgehendes Verständnis des dynamischen Laufzeitverhaltens. Hierbei ist das adäquate Platzieren der Daten und Threads essentiell für die Performance, und die Zugriffsreihenfolge essentiell für das deterministische Verhalten bzw. die Korrektheit einer Anwendung. Aus diesem Grund wird diese Arbeit zunächst eine systematische Methodik zur Bewertung von OpenMP Target-Devices, Muster für die effiziente Task-parallele Programmierung von Non-Uniform Memory Access (NUMA) Architekturen, sowie Verbesserungen für eine standardkonforme Tool-Unterstützung präsentieren. 
Basierend auf den gewonnenen Erkenntnissen, wird im Anschluss ein OpenMP Epochen-Modell für die Korrektheitsanalyse definiert, welches die Semantik inklusive des Laufzeit- und Speichermodells von OpenMP berücksichtigt. Die Evaluierung der entwickelten Konzepte erfolgt an Hand von relevanten Tools zur Performance- und Korrektheitsanalyse. \|l ger
520	_	_	\|a The demand for large compute capabilities in scientific computing has led to the wide use and acceptance of highly parallel computer architectures during the last decade. This trend is manifested in the TOP500 list of the fastest supercomputers in the world, in which about 40% of the performance share results from accelerator-based systems. Programming for these architectures in the past often required a time-consuming rewrite of the compute-intensive application parts, until more productive approaches like Open Accelerators (OpenACC) or the target offloading features of Open Multi-Processing (OpenMP) came into existence. However, parallel programming for heterogeneous architectures is still a complex and error-prone task, posing several challenges to the programmer who wants to achieve high application performance. One key factor for understanding the performance and the correctness of a parallel program is the analysis of its memory accesses. This work takes a holistic view of the hardware properties, the programming paradigm, its particular implementation and the interfaces for adequate tool support with respect to both aspects. Improving the performance and validating an application require a deep comprehension of the dynamic runtime behavior. Here, the appropriate data and thread placement is essential for the performance, and the order of the memory accesses is essential for the deterministic behavior, and thus the correctness, of the application. Therefore, this work first presents a systematic methodology for the assessment of OpenMP for target devices, patterns for the efficient usage of task-based programming on Non-Uniform Memory Access (NUMA) architectures, and improvements to standard-compliant tool support. Based on the gathered insights, an OpenMP epoch model for correctness checking is defined, which respects the OpenMP semantics including the runtime and memory model. 
The developed concepts are evaluated by applying them to real-world performance analysis and correctness checking tools. \|l eng
588	_	_	\|a Dataset connected to Lobid/HBZ
591	_	_	\|a Germany
650	_	7	\|x Diss.
700	1	_	\|0 P:(DE-82)IDM01074 \|a Müller, Matthias S. \|b 1 \|e Thesis advisor \|u rwth
700	1	_	\|0 P:(DE-82)IDM00048 \|a Katoen, Joost-Pieter \|b 2 \|e Thesis advisor \|u rwth
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879.pdf \|y OpenAccess
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879_source.zip \|y Restricted
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879.gif?subformat=icon \|x icon \|y OpenAccess
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879.jpg?subformat=icon-1440 \|x icon-1440 \|y OpenAccess
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879.jpg?subformat=icon-180 \|x icon-180 \|y OpenAccess
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879.jpg?subformat=icon-640 \|x icon-640 \|y OpenAccess
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879.jpg?subformat=icon-700 \|x icon-700 \|y OpenAccess
856	4	_	\|u https://publications.rwth-aachen.de/record/695879/files/695879.pdf?subformat=pdfa \|x pdfa \|y OpenAccess
909	C	O	\|o oai:publications.rwth-aachen.de:695879 \|p openaire \|p open_access \|p VDB \|p driver \|p dnbdelivery
910	1	_	\|0 I:(DE-588b)36225-6 \|6 P:(DE-82)IDM01074 \|a RWTH Aachen \|b 1 \|k RWTH
910	1	_	\|0 I:(DE-588b)36225-6 \|6 P:(DE-82)IDM00048 \|a RWTH Aachen \|b 2 \|k RWTH
914	1	_	\|y 2017
915	_	_	\|0 StatID:(DE-HGF)0510 \|2 StatID \|a OpenAccess
920	1	_	\|0 I:(DE-82)123010_20140620 \|k 123010 \|l Lehrstuhl für Informatik 12 (Hochleistungsrechnen) \|x 0
920	1	_	\|0 I:(DE-82)120000_20140620 \|k 120000 \|l Fachgruppe Informatik \|x 1
980	1	_	\|a FullTexts
980	_	_	\|a phd
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a I:(DE-82)123010_20140620
980	_	_	\|a I:(DE-82)120000_20140620

