Transformation of linearized computational graphs driven by the associativity of the chain rule

Mosenkis, Viktor; Steihaug, Trond; Naumann, Uwe

doi:10.18154/RWTH-2017-10276

Transformation of linearized computational graphs driven by the associativity of the chain rule

Mosenkis, Viktor^RWTH*

2017 & 2018

Verantwortlichkeitsangabevorgelegt von Diplom-Mathematiker Viktor Mosenkis

ImpressumAachen 2017

Umfang1 Online-Ressource (180 Seiten) : Diagramme

Dissertation, RWTH Aachen University, 2017

Veröffentlicht auf dem Publikationsserver der RWTH Aachen University 2018

Genehmigende Fakultät
Fak01

Hauptberichter/Gutachter
Naumann, Uwe (Thesis advisor)^RWTH* ; Steihaug, Trond (Thesis advisor)

Tag der mündlichen Prüfung/Habilitation
2017-10-16

Online
DOI: 10.18154/RWTH-2017-10276
URL: https://publications.rwth-aachen.de/record/709945/files/709945.pdf

Einrichtungen

Inhaltliche Beschreibung (Schlagwörter)
Algorithmic Differentiation (frei) ; branch and bound (frei) ; edge elimination (frei) ; lower bounds (frei) ; minimal edge count problem (frei) ; optimal Jacobian accumulation (frei) ; vertex eliminaton (frei)

Thematische Einordnung (Klassifikation)
DDC: 004

Kurzfassung
Eﬃziente Berechnung von Ableitungen mathematischer Funktionen, die durch Implementierung in einer Programmiersprache gegeben sind, wird immer wichtiger in verschiedenen Bereichen von Industrie und Forschung. Nichtdestotrotz bleibt die eﬃziente und zuverlässige Berechnung von Ableitungen eine große Herausforderung. In der Vergangenheit wurden oft ﬁnite Diﬀerenzen zur Approximation von Sensitivitätsinformation verwendet. Dieses Verfahren hat jedoch zwei entscheidende Nachteile. Das erste Problem ist, dass in vielen Fällen die Annäherung der Tangente durch die Sekante nicht genau genug ist. Das zweite Problem ist, dass die Berechnungskosten für die Berechnung der dichtbesetzten Jacobi Matrix proportional zu der Anzahl der Eingaben dieser Funktion ist. Um beiden Problemen zu begegnen, wird deswegen Für die Berechnung der Sensitivitätsinformation wird daher immer öfter Algorithmisches (oder Automatisches) Diﬀerenzieren (AD) verwendet. AD ist eine Sammlung von Techniken, die eine exakte Berechnung von Ableitungsinformation der Funktion F, die durch ein Computer-Programm geben ist, erlauben. Die verschieden AD Techniken werden oft durch Eliminations-Methoden auf dem linearisierten gerichteten azyklischem Graphen (linearized directed acyclic graphs=l-DAGs) beschrieben. L-DAGs sind Berechnungsgraphen, bei denen die Kantengewichte den lokalen partiellen Ableitungen der zugehörigen Operationen entsprechen. Die am meisten verwendeten Eliminations-Methoden sind Kanten- und Knotenelimination. Für die Berechnung der Jacobi Matrix mit Hilfe der Knoten- oder Kanteneliminationen müssen sämtliche Zwischenknoten bzw. -kanten eliminiert werden, so dass ein bipartiter Graph entsteht. Beide Techniken basieren auf der lokalen Ausnutzung der Kettenregel. Die Knotenelimination kann durch eine Folge von Kanteneliminationen ausgedrückt werden, und ist somit ein Spezialfall der Kantenelimination. Assoziativität der Kettenregel erlaubt die Ausführung der Kanten- und Knotenelimination in verschiedenen Rheinfolgen. Als Kanten- bzw. Knoteneliminationsproblem (vertex elimination = VE bzw. edge elimination=EE) versteht man also die Suche nach einer Sequenz von Kanten- bzw. Knoteneliminationen, bei der die Anzahl der durchgeführten Multiplikationen minimal ist. Will man nur Jakobi-Matrix-Vektor-Produkte bestimmen, und nicht die gesamte Jakobi-Matrix, so sieht man sich mit dem Problem der minimalen Kantenanzahl konfrontiert (minimum edge count=MEC). Bei dem MEC Problem versucht man, eine Sequenz von Kanten- bzw. Knoteneliminationen zu ﬁnden, so dass die Anzahl der Kanten im resultierenden Graphen minimal ist. Es wird vermutet, dass alle oben genannten Probleme NP-Schwer sind. In der Praxis verwendet man daher Greedy-ähnliche Algorithmen, um schnell eine ”fast” optimale L¨osung zu ﬁnden. Branch-and-Bound Methoden werden verwendet, falls eine exakte L¨osung gebraucht wird. Das Ziel dieser Dissertation ist es, die Weiterentwicklung dieser Algorithmen voranzutreiben. Es wird daher zuerst versucht, die Struktur des Problems besser zu verstehen. Dafür werden einige Eigenschaften der l-DAGs und der Eliminationsmethoden bewiesen. Diese Eigenschaften bilden die Basis alle weiteren Ergebnisse, zusätzlich erlauben sie den Suchraum für die Branchand-Bound Algorithmen signiﬁkant zu verkleinern. Ein sehr großer Teil dieser Arbeit widmet sich der Entwicklung eines neuen Algorithmus für die Berechnung der unteren Schranken für die oben genannten Probleme. Es wird gezeigt, dass der neue Ansatz in den meisten Fällen bessere untere Schranken liefert, als die bisher existierenden Algorithmen. An dieser Stelle sei angemerkt, dass es die bisher einzigen Schranken für das MEC Problem sind. Ein weiteres Ergebnis dieser Dissertation sind Optimalität erhaltende Eliminationen. Bei Branch-and-Bound Algorithmen erlauben diese eine deutliche Reduzierung des Suchraums, auch die Greedy-ähnliche Algorithmen lassen sich dadurch verbessern. Ebenso lassen sich durch die Optimalität erhaltende Eliminationen Kantenelimination besser einsetzen. Aufgrund des deutlich größeren Suchraums, ﬁndet Kantenelimination bisher kaum Verwendung in der Praxis. In dieser Arbeit wird gezeigt, dass Verwendung von Kanteneliminationen dennoch sinnvoll ist: durch das Mischen von Kanten- und Knoteneliminationen können einerseits bessere Ergebnisse erzielt werden, d.h. berechnete Lösung benötigt weniger Multiplikationen, andererseits lässt sich so die Lösung schneller ﬁnden.

The eﬃcient computation of derivatives of mathematical functions which are implemented as computer programs has become more important in various branches of industry and academia. While derivative information is highly desired, its eﬃcient and reliable computation remains a big challenge. In the past ﬁnite diﬀerences was a very popular approach to approximate derivatives This approach fails in more and more real world problems for two reasons. The ﬁrst one is that for many functions approximating tangents by secants does not provide satisfactory results. The second problem of using ﬁnite diﬀerences is that the computational costs for computing the full dense Jacobian grows linearly with the number of inputs of the function. Therefore Algorithmic (or Automatic) Diﬀerentiation (AD) becomes the method of choice for computing derivatives as it promises to address both problems. AD is a collection of techniques allowing computation of accurate derivative information of a function F : Rn →Rm that is implemented as a computer program in some high level programming language. Often the algorithms for AD are expressed as elimination methods on the linearized directed acyclic graphs or l-DAGs. L-DAGs are computational graphs, where the edge weights are equal to the local partial derivatives of the respective operations. The most popular elimination methods are edge and vertex elimination. To compute the Jacobian with the help of edge or vertex eliminations one need to transform the l-DAG into a bipartite graph. Both techniques are based on local exploitation of the chain rule. Vertex elimination is a special case of edge elimination as it can be expressed by a sequence of edge eliminations. Associativity of the chain rule allows to perform the edge and vertex eliminations in diﬀerent orders. The problem of ﬁnding an order of vertex [edge] eliminations that minimizes the number of multiplications involved in the computation of the Jacobian is known as vertex [edge] elimination problem (VE [EE]). Another closely related problem is the minimum edge count problem (MEC). This problem arises when Jacobian-vector products are needed and computation of full Jacobian is not required. The MEC problem aims to ﬁnd a sequence of vertex (edge) eliminations such that the resulting l-DAG has the smallest amount of edges. VE, EE and MEC problems are conjectured to be NP-hard. Therefore in practice the solution of these problems is approximated by greedy like heuristics or computed by branch and bound based methods if the exact solution is of interest. The main aim of this thesis is to support further development of algorithms for solving these problems. Therefore this thesis tries to provide a better understanding of the problem structure by proving several properties of the l-DAGs. These properties help to reduce the search space and establish the base for other outcomes of the thesis. Building blocks for new algorithms are provided by introducing a new approach for computing lower bounds and proving various optimality preserving eliminations for all three problems. It is shown that the lower bounds computed according to this new approach are in most cases superior to the existing lower bounds. For the MEC problem this is the ﬁrst algorithm for computing lower bounds so far. Better lower bounds allow to perform earlier cuts in the branch and bound type of algorithms and also help to understand the quality of the given solution. Optimality preserving eliminations is another useful technique for branch and bound and greedy like algorithms. They allow the search space for the branch and bound algorithm to be reduced and can also be used to improve the greedy heuristics. The underlying search space of the edge elimination is signiﬁcantly bigger compared to that of vertex elimination, therefore edge elimination is almost never used in applications. In this thesis several optimality preserving elimination are proved. The optimality preserving eliminations for edge eliminations are generalized versions of the results for vertex eliminations. Thus mixing of edge and vertex elimination in algorithms is suggested to improve the results for approximative algorithm and to reduce the corresponding search space if the exact solution is required, demonstrating that edge eliminations should not be neglected in praxis.

OpenAccess:
PDF
(additional files)