Transformation of linearized computational graphs driven by the associativity of the chain rule

Mosenkis, Viktor; Steihaug, Trond; Naumann, Uwe

doi:10.18154/RWTH-2017-10276

000709945 001__ 709945
000709945 005__ 20230408005423.0
000709945 0247_ $$2HBZ$$aHT019534673
000709945 0247_ $$2Laufende Nummer$$a36814
000709945 0247_ $$2datacite_doi$$a10.18154/RWTH-2017-10276
000709945 037__ $$aRWTH-2017-10276
000709945 041__ $$aEnglish
000709945 082__ $$a004
000709945 1001_ $$0P:(DE-588)114929132X$$aMosenkis, Viktor$$b0$$urwth
000709945 245__ $$aTransformation of linearized computational graphs driven by the associativity of the chain rule$$cvorgelegt von Diplom-Mathematiker Viktor Mosenkis$$honline
000709945 260__ $$aAachen$$c2017
000709945 260__ $$c2018
000709945 300__ $$a1 Online-Ressource (180 Seiten) : Diagramme
000709945 3367_ $$02$$2EndNote$$aThesis
000709945 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis$$bphd$$mphd
000709945 3367_ $$2BibTeX$$aPHDTHESIS
000709945 3367_ $$2DRIVER$$adoctoralThesis
000709945 3367_ $$2DataCite$$aOutput Types/Dissertation
000709945 3367_ $$2ORCID$$aDISSERTATION
000709945 502__ $$aDissertation, RWTH Aachen University, 2017$$bDissertation$$cRWTH Aachen University$$d2017$$gFak01$$o2017-10-16
000709945 500__ $$aVeröffentlicht auf dem Publikationsserver der RWTH Aachen University 2018
000709945 5203_ $$aEﬃziente Berechnung von Ableitungen mathematischer Funktionen, die durch Implementierung in einer Programmiersprache gegeben sind, wird immer wichtiger in verschiedenen Bereichen von Industrie und Forschung. Nichtdestotrotz bleibt die eﬃziente und zuverlässige Berechnung von Ableitungen eine große Herausforderung. In der Vergangenheit wurden oft ﬁnite Diﬀerenzen zur Approximation von Sensitivitätsinformation verwendet. Dieses Verfahren hat jedoch zwei entscheidende Nachteile. Das erste Problem ist, dass in vielen Fällen die Annäherung der Tangente durch die Sekante nicht genau genug ist. Das zweite Problem ist, dass die Berechnungskosten für die Berechnung der dichtbesetzten Jacobi Matrix proportional zu der Anzahl der Eingaben dieser Funktion ist. Um beiden Problemen zu begegnen, wird deswegen Für die Berechnung der Sensitivitätsinformation wird daher immer öfter Algorithmisches (oder Automatisches) Diﬀerenzieren (AD) verwendet. AD ist eine Sammlung von Techniken, die eine exakte Berechnung von Ableitungsinformation der Funktion F, die durch ein Computer-Programm geben ist, erlauben. Die verschieden AD Techniken werden oft durch Eliminations-Methoden auf dem linearisierten gerichteten azyklischem Graphen (linearized directed acyclic graphs=l-DAGs) beschrieben. L-DAGs sind Berechnungsgraphen, bei denen die Kantengewichte den lokalen partiellen Ableitungen der zugehörigen Operationen entsprechen. Die am meisten verwendeten Eliminations-Methoden sind Kanten- und Knotenelimination. Für die Berechnung der Jacobi Matrix mit Hilfe der Knoten- oder Kanteneliminationen müssen sämtliche Zwischenknoten bzw. -kanten eliminiert werden, so dass ein bipartiter Graph entsteht. Beide Techniken basieren auf der lokalen Ausnutzung der Kettenregel. Die Knotenelimination kann durch eine Folge von Kanteneliminationen ausgedrückt werden, und ist somit ein Spezialfall der Kantenelimination. Assoziativität der Kettenregel erlaubt die Ausführung der Kanten- und Knotenelimination in verschiedenen Rheinfolgen. Als Kanten- bzw. Knoteneliminationsproblem (vertex elimination = VE bzw. edge elimination=EE) versteht man also die Suche nach einer Sequenz von Kanten- bzw. Knoteneliminationen, bei der die Anzahl der durchgeführten Multiplikationen minimal ist. Will man nur Jakobi-Matrix-Vektor-Produkte bestimmen, und nicht die gesamte Jakobi-Matrix, so sieht man sich mit dem Problem der minimalen Kantenanzahl konfrontiert (minimum edge count=MEC). Bei dem MEC Problem versucht man, eine Sequenz von Kanten- bzw. Knoteneliminationen zu ﬁnden, so dass die Anzahl der Kanten im resultierenden Graphen minimal ist. Es wird vermutet, dass alle oben genannten Probleme NP-Schwer sind. In der Praxis verwendet man daher Greedy-ähnliche Algorithmen, um schnell eine ”fast” optimale L¨osung zu ﬁnden. Branch-and-Bound Methoden werden verwendet, falls eine exakte L¨osung gebraucht wird. Das Ziel dieser Dissertation ist es, die Weiterentwicklung dieser Algorithmen voranzutreiben. Es wird daher zuerst versucht, die Struktur des Problems besser zu verstehen. Dafür werden einige Eigenschaften der l-DAGs und der Eliminationsmethoden bewiesen. Diese Eigenschaften bilden die Basis alle weiteren Ergebnisse, zusätzlich erlauben sie den Suchraum für die Branchand-Bound Algorithmen signiﬁkant zu verkleinern. Ein sehr großer Teil dieser Arbeit widmet sich der Entwicklung eines neuen Algorithmus für die Berechnung der unteren Schranken für die oben genannten Probleme. Es wird gezeigt, dass der neue Ansatz in den meisten Fällen bessere untere Schranken liefert, als die bisher existierenden Algorithmen. An dieser Stelle sei angemerkt, dass es die bisher einzigen Schranken für das MEC Problem sind. Ein weiteres Ergebnis dieser Dissertation sind Optimalität erhaltende Eliminationen. Bei Branch-and-Bound Algorithmen erlauben diese eine deutliche Reduzierung des Suchraums, auch die Greedy-ähnliche Algorithmen lassen sich dadurch verbessern. Ebenso lassen sich durch die Optimalität erhaltende Eliminationen Kantenelimination besser einsetzen. Aufgrund des deutlich größeren Suchraums, ﬁndet Kantenelimination bisher kaum Verwendung in der Praxis. In dieser Arbeit wird gezeigt, dass Verwendung von Kanteneliminationen dennoch sinnvoll ist: durch das Mischen von Kanten- und Knoteneliminationen können einerseits bessere Ergebnisse erzielt werden, d.h. berechnete Lösung benötigt weniger Multiplikationen, andererseits lässt sich so die Lösung schneller ﬁnden.$$lger
000709945 520__ $$aThe eﬃcient computation of derivatives of mathematical functions which are implemented as computer programs has become more important in various branches of industry and academia. While derivative information is highly desired, its eﬃcient and reliable computation remains a big challenge. In the past ﬁnite diﬀerences was a very popular approach to approximate derivatives This approach fails in more and more real world problems for two reasons. The ﬁrst one is that for many functions approximating tangents by secants does not provide satisfactory results. The second problem of using ﬁnite diﬀerences is that the computational costs for computing the full dense Jacobian grows linearly with the number of inputs of the function. Therefore Algorithmic (or Automatic) Diﬀerentiation (AD) becomes the method of choice for computing derivatives as it promises to address both problems. AD is a collection of techniques allowing computation of accurate derivative information of a function F : Rn →Rm that is implemented as a computer program in some high level programming language. Often the algorithms for AD are expressed as elimination methods on the linearized directed acyclic graphs or l-DAGs. L-DAGs are computational graphs, where the edge weights are equal to the local partial derivatives of the respective operations. The most popular elimination methods are edge and vertex elimination. To compute the Jacobian with the help of edge or vertex eliminations one need to transform the l-DAG into a bipartite graph. Both techniques are based on local exploitation of the chain rule. Vertex elimination is a special case of edge elimination as it can be expressed by a sequence of edge eliminations. Associativity of the chain rule allows to perform the edge and vertex eliminations in diﬀerent orders. The problem of ﬁnding an order of vertex [edge] eliminations that minimizes the number of multiplications involved in the computation of the Jacobian is known as vertex [edge] elimination problem (VE [EE]). Another closely related problem is the minimum edge count problem (MEC). This problem arises when Jacobian-vector products are needed and computation of full Jacobian is not required. The MEC problem aims to ﬁnd a sequence of vertex (edge) eliminations such that the resulting l-DAG has the smallest amount of edges. VE, EE and MEC problems are conjectured to be NP-hard. Therefore in practice the solution of these problems is approximated by greedy like heuristics or computed by branch and bound based methods if the exact solution is of interest. The main aim of this thesis is to support further development of algorithms for solving these problems. Therefore this thesis tries to provide a better understanding of the problem structure by proving several properties of the l-DAGs. These properties help to reduce the search space and establish the base for other outcomes of the thesis. Building blocks for new algorithms are provided by introducing a new approach for computing lower bounds and proving various optimality preserving eliminations for all three problems. It is shown that the lower bounds computed according to this new approach are in most cases superior to the existing lower bounds. For the MEC problem this is the ﬁrst algorithm for computing lower bounds so far. Better lower bounds allow to perform earlier cuts in the branch and bound type of algorithms and also help to understand the quality of the given solution. Optimality preserving eliminations is another useful technique for branch and bound and greedy like algorithms. They allow the search space for the branch and bound algorithm to be reduced and can also be used to improve the greedy heuristics. The underlying search space of the edge elimination is signiﬁcantly bigger compared to that of vertex elimination, therefore edge elimination is almost never used in applications. In this thesis several optimality preserving elimination are proved. The optimality preserving eliminations for edge eliminations are generalized versions of the results for vertex eliminations. Thus mixing of edge and vertex elimination in algorithms is suggested to improve the results for approximative algorithm and to reduce the corresponding search space if the exact solution is required, demonstrating that edge eliminations should not be neglected in praxis.$$leng
000709945 588__ $$aDataset connected to Lobid/HBZ
000709945 591__ $$aGermany
000709945 653_7 $$aAlgorithmic Differentiation
000709945 653_7 $$abranch and bound
000709945 653_7 $$aedge elimination
000709945 653_7 $$alower bounds
000709945 653_7 $$aminimal edge count problem
000709945 653_7 $$aoptimal Jacobian accumulation
000709945 653_7 $$avertex eliminaton
000709945 7001_ $$0P:(DE-82)IDM00960$$aNaumann, Uwe$$b1$$eThesis advisor$$urwth
000709945 7001_ $$aSteihaug, Trond$$b2$$eThesis advisor
000709945 8564_ $$uhttps://publications.rwth-aachen.de/record/709945/files/709945.pdf$$yOpenAccess
000709945 8564_ $$uhttps://publications.rwth-aachen.de/record/709945/files/709945_source.zip$$yRestricted
000709945 8564_ $$uhttps://publications.rwth-aachen.de/record/709945/files/709945.gif?subformat=icon$$xicon$$yOpenAccess
000709945 8564_ $$uhttps://publications.rwth-aachen.de/record/709945/files/709945.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
000709945 8564_ $$uhttps://publications.rwth-aachen.de/record/709945/files/709945.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000709945 8564_ $$uhttps://publications.rwth-aachen.de/record/709945/files/709945.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000709945 8564_ $$uhttps://publications.rwth-aachen.de/record/709945/files/709945.jpg?subformat=icon-700$$xicon-700$$yOpenAccess
000709945 909CO $$ooai:publications.rwth-aachen.de:709945$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000709945 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000709945 9141_ $$y2017
000709945 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-588)114929132X$$aRWTH Aachen$$b0$$kRWTH
000709945 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-82)IDM00960$$aRWTH Aachen$$b1$$kRWTH
000709945 9201_ $$0I:(DE-82)120000_20140620$$k120000$$lFachgruppe Informatik$$x1
000709945 9201_ $$0I:(DE-82)123120_20140620$$k123120$$lLehr- und Forschungsgebiet Informatik 12 (Software und Werkzeuge für Computational Engineering)$$x0
000709945 961__ $$c2018-02-14T09:56:00.718648$$x2017-12-04T12:01:29.219584$$z2018-02-14T09:56:00.718648
000709945 9801_ $$aFullTexts
000709945 980__ $$aI:(DE-82)120000_20140620
000709945 980__ $$aI:(DE-82)123120_20140620
000709945 980__ $$aUNRESTRICTED
000709945 980__ $$aVDB
000709945 980__ $$aphd

h1

h2

h3

h4

h5

h6

RWTH

Kontakt

RWTH Publications

Allgemeines