High-performance tensor operations : tensor transpositions, spin summations, and tensor contractions

Springer, Paul; Bientinesi, Paolo; Wellein, Gerhard
doi:10.18154/RWTH-2019-01778
000755345 001__ 755345
000755345 005__ 20230408005846.0
000755345 0247_ $$2HBZ$$aHT019974024
000755345 0247_ $$2Laufende Nummer$$a37919
000755345 0247_ $$2datacite_doi$$a10.18154/RWTH-2019-01778
000755345 037__ $$aRWTH-2019-01778
000755345 041__ $$aEnglish
000755345 082__ $$a004
000755345 1001_ $$0P:(DE-588)1178542319$$aSpringer, Paul$$b0$$urwth
000755345 245__ $$aHigh-performance tensor operations : tensor transpositions, spin summations, and tensor contractions$$cvorgelegt von Paul Springer, Master of Science$$honline
000755345 260__ $$aAachen$$c2019
000755345 300__ $$a1 Online-Ressource (xiii, 169 Seiten) : Illustrationen
000755345 3367_ $$02$$2EndNote$$aThesis
000755345 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis$$bphd$$mphd
000755345 3367_ $$2BibTeX$$aPHDTHESIS
000755345 3367_ $$2DRIVER$$adoctoralThesis
000755345 3367_ $$2DataCite$$aOutput Types/Dissertation
000755345 3367_ $$2ORCID$$aDISSERTATION
000755345 500__ $$aVeröffentlicht auf dem Publikationsserver der RWTH Aachen University
000755345 502__ $$aDissertation, RWTH Aachen University, 2019$$bDissertation$$cRWTH Aachen University$$d2019$$gFak01$$o2019-01-07
000755345 5203_ $$aDiese Dissertation befasst sich mit der Entwicklung von neuartigen, hoch-performanten Algorithmen zur Ausführung von Tensor-Transpositionen, Spin-Summationen sowie Tensor-Kontraktionen. Eine zentrale Herausforderung, die diesen Operationen zugrunde liegt ist das komplexe Muster der Speicherzugriffe, welches aus der mehrdimensionalen Natur der Tensoren hervorgerufen wird; des Weiteren führen diese komplexen Speicherzugriffsmuster oftmals zu einer geringen Ausnutzung der CPU-eigenen Cachehierarchie und somit zu einer geringen Performanz. Um diese Ineffizienzen zu überkommen, werfen die entwickelten Algorithmen in dieser Dissertation einen speziellen Fokus auf die Ausnutzung der räumlichen sowie temporären Lokalität; dies führt zu strukturierten und vorteilhaften Speicherzugriffen und somit zu einer hohen Performanz. Da Tensor-Transpositionen, Spin-Summationen, und Tensor-Kontraktionen die haupt Performanz-Engpässe in vielen wissenschaftlichen Anwendungen darstellen, ist es das Ziel dieser Dissertation signifikante Beschleunigungen gegenüber hochmodernen Softwarelösungen für solche Operationen zu erzielen. Wir beschreiben einen Ansatz zu Tensor-Transpositionen, welcher nahezu die maximale Speicherbandbreite auf verschiedenen Rechnerarchitekturen erzielt. Des Weiteren präsentieren wir mehrere Algorithmen für Spin-Summationen aus dem Blickwinkel des hochperformanten Rechnens, welche sowohl die räumliche als auch die temporäre Lokalität der Spin-Summation ausnutzen. Darüber hinaus stellen wir eine neuartige GEMM-ähnliche Methodik für Tensor-Kontraktionen vor. Dieser Ansatz vermeidet die Nachteile vorheriger Verfahren—allem voran übermäßige Speicherzugriffe sowie ein erhöhter Speicherbedarf—und ist damit in der Lage, die Performanz-Kluft zwischen Tensor-Kontraktionen und hoch-performanten Matrix-Matrix Multiplikationen zu schließen.$$lger
000755345 520__ $$aThis dissertation is concerned with the development of novel high-performance algorithms for tensor transpositions, spin summations, and tensor contractions. A central challenge that is common to these operations is the complex memory access pattern, which is due to the multidimensional nature of tensors, and which often leads to a poor utilization of the CPU’s rich cache hierarchy and consequently to low performance. To overcome this inefficiency, the algorithms presented in this dissertation pay special attention to the exploitation of spatial as well as temporal locality, resulting in a preferable memory access pattern, and thus high performance. With tensor transpositions, spin summations, and tensor contractions being the major performance bottlenecks in many scientific applications, the goal of this dissertation is to provide significant speedups over other state-of-the-art software solutions for such operations. We describe an approach to tensor transpositions that is able to attain close-to-peak memory bandwidth across multiple architectures. We also present a high-performance perspective on spin summations and propose an algorithm that exploits both the spatial as well as temporal locality inherent to the problem. Finally, a novel GEMM-like methodology for tensor contractions is introduced; this approach avoids the drawbacks of previous approaches—namely excess memory accesses or an increased memory footprint—and is able to close the performance gap between tensor contractions and high-performance matrix-matrix multiplications.$$leng
000755345 588__ $$aDataset connected to Lobid/HBZ
000755345 591__ $$aGermany
000755345 653_7 $$aHPC
000755345 653_7 $$adense linear algebra
000755345 653_7 $$atensor
000755345 7001_ $$0P:(DE-82)IDM00518$$aBientinesi, Paolo$$b1$$eThesis advisor$$urwth
000755345 7001_ $$aWellein, Gerhard$$b2$$eThesis advisor
000755345 8564_ $$uhttps://publications.rwth-aachen.de/record/755345/files/755345.pdf$$yOpenAccess
000755345 8564_ $$uhttps://publications.rwth-aachen.de/record/755345/files/755345_source.tar.gz$$yRestricted
000755345 8564_ $$uhttps://publications.rwth-aachen.de/record/755345/files/755345.gif?subformat=icon$$xicon$$yOpenAccess
000755345 8564_ $$uhttps://publications.rwth-aachen.de/record/755345/files/755345.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
000755345 8564_ $$uhttps://publications.rwth-aachen.de/record/755345/files/755345.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000755345 8564_ $$uhttps://publications.rwth-aachen.de/record/755345/files/755345.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000755345 8564_ $$uhttps://publications.rwth-aachen.de/record/755345/files/755345.jpg?subformat=icon-700$$xicon-700$$yOpenAccess
000755345 909CO $$ooai:publications.rwth-aachen.de:755345$$popenaire$$popen_access$$pVDB$$pdriver$$pdnbdelivery
000755345 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-588)1178542319$$aRWTH Aachen$$b0$$kRWTH
000755345 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-82)IDM00518$$aRWTH Aachen$$b1$$kRWTH
000755345 9141_ $$y2019
000755345 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000755345 9201_ $$0I:(DE-82)123620_20140620$$k123620$$lLehr- und Forschungsgebiet für Algorithmen-Orientierte Code-Generierung für Hochleistungsrechnerarchitekturen$$x0
000755345 9201_ $$0I:(DE-82)120000_20140620$$k120000$$lFachgruppe Informatik$$x1
000755345 9201_ $$0I:(DE-82)080003_20140620$$k080003$$lAachen Institute for Advanced Study in Computational Engineering Science (AICES)$$x2
000755345 961__ $$c2019-03-01T17:01:39.075009$$x2019-02-16T07:42:23.993442$$z2019-03-01T17:01:39.075009
000755345 9801_ $$aFullTexts
000755345 980__ $$aI:(DE-82)080003_20140620
000755345 980__ $$aI:(DE-82)120000_20140620
000755345 980__ $$aI:(DE-82)123620_20140620
000755345 980__ $$aUNRESTRICTED
000755345 980__ $$aVDB
000755345 980__ $$aphd