%0 Thesis
%A Mayfrank, Daniel Georg
%T End-to-end reinforcement learning of Koopman model predictive control
%V 42 (2025)
%I Rheinisch-Westfälische Technische Hochschule Aachen
%V Dissertation
%C Aachen
%M RWTH-2025-09976
%B Aachener Verfahrenstechnik series - AVT.SVT - Systemverfahrenstechnik - Dissertationen
%P 1 online resource : illustrations
%D 2025
%Z Published on the publication server of RWTH Aachen University 2026
%Z Dissertation, Rheinisch-Westfälische Technische Hochschule Aachen, 2025
%X Model-based control methods such as Model Predictive Control (MPC) and variants thereof, e.g., economic nonlinear MPC (eNMPC), remain indispensable in the chemical industry. However, mechanistic models are often unavailable or too computationally expensive for use in (e)NMPC. Data-driven models, usually trained using system identification (SI) approaches, can serve as a computationally cheap alternative to mechanistic models. However, SI focuses narrowly on maximizing average prediction accuracy, which can result in suboptimal performance when the model is used as part of a policy. In contrast, recent research has explored training data-driven models end-to-end for optimal performance in predictive control policies using reinforcement learning (RL) approaches. This thesis contributes to this emerging research field by developing methods for RL-based end-to-end learning of Koopman models for (e)NMPC policies. Koopman models can accurately represent the dynamics of nonlinear systems while resulting in convex optimal control problems (OCPs) when used in (e)NMPC, thus striking a favorable balance between representational capacity and computational efficiency. By performing post-optimal sensitivity analysis on the resulting OCPs, we develop a method for constructing automatically differentiable Koopman-based (e)NMPC policies, which can be optimized via the learnable parameters of the Koopman model. We optimize the (e)NMPC policies for specific control tasks using the state-of-the-art actor-critic RL algorithm Proximal Policy Optimization (PPO). Assuming the availability of full state measurements, we demonstrate the effectiveness of our method in NMPC (setpoint tracking) and eNMPC (demand response) case studies. These are based on (i) a small continuous stirred-tank reactor model with two differential states and two control inputs and (ii) an air separation unit with 119 differential states and approximately 2300 algebraic states. The results show that the resulting policies achieve favorable control performance compared to traditional benchmarks, including neural network policies trained using RL and Koopman-based eNMPC policies trained via system identification. Furthermore, we show that, in contrast to the neural network policies, the (e)NMPC policies can react to certain changes in the control setting without retraining. However, we observe (i) convergence problems resulting from inaccurate policy gradient estimates and (ii) low sample efficiency. To address the former issue, we exploit the automatic differentiability of training environments based on mechanistic simulation models to aid the policy optimization, resulting in substantially improved convergence and control performance. Furthermore, we improve the sample efficiency of the learning process by integrating our method for RL-based training of Koopman (e)NMPC policies with Dyna-style model-based RL. We also show that, when leveraging model-based RL, sample efficiency can be increased further by utilizing partial prior knowledge about the system dynamics via physics-informed model learning. In sum, this thesis contributes to the field of data-driven control and opens avenues toward higher-performance, real-time-capable, data-driven (e)NMPC.
%F PUB:(DE-HGF)11 ; PUB:(DE-HGF)3
%9 Dissertation / PhD Thesis
%9 Book
%R 10.18154/RWTH-2025-09976
%U https://publications.rwth-aachen.de/record/1022363