TY  - THES
AU  - Frye, Maik
TI  - Recommending data preprocessing pipelines for machine learning applications in production; [corrected] 1st edition
VL  - 8/2023
PB  - RWTH Aachen University
VL  - Dissertation
CY  - Aachen
M1  - RWTH-2023-02364
T2  - Ergebnisse aus der Produktionstechnik
SP  - 1 online resource : illustrations, diagrams
PY  - 2023
N1  - Print edition: 2023. - Also published on the publication server of RWTH Aachen University. - Edition with corrected volume numbering. - Additional DOI: 10.18154/RWTH-2023-01401
N1  - Dissertation, RWTH Aachen University, 2022
AB  - The era of Industry 4.0 opens up the possibility of optimizing production systems in a data-driven way. To turn data into value, machine learning (ML) models are trained on production data with the aim of identifying patterns to optimize processes. A crucial prerequisite for achieving performant ML models is the availability of high-quality data. Since raw data generated in production exhibits multiple quality issues, data preprocessing (DPP) is required to increase the quality of the data. One of the key design decisions in any ML project is the choice of suitable DPP methods. The search space further increases when DPP methods are configured into DPP pipelines. Due to the high number of possible DPP pipelines, data scientists commonly select suitable pipelines manually and via trial and error. For these reasons, DPP nowadays accounts for approximately 80
LB  - PUB:(DE-HGF)11 ; PUB:(DE-HGF)3
DO  - DOI:10.18154/RWTH-2023-02364
UR  - https://publications.rwth-aachen.de/record/952983
ER  -