Automated experimentation, Bayesian statistics and machine learning for high-throughput bioprocess development

Helleckes, Laura Marie; Wiechert, Wolfgang; Matuszynska, Anna Barbara; Oldiges, Marco
doi:10.18154/RWTH-2024-12118
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@PHDTHESIS{Helleckes:999462,
      author       = {Helleckes, Laura Marie},
      othercontributors = {Oldiges, Marco and Matuszynska, Anna Barbara and Wiechert,
                          Wolfgang},
      title        = {{A}utomated experimentation, {B}ayesian statistics and
                      machine learning for high-throughput bioprocess development},
      school       = {RWTH Aachen University},
      type         = {Dissertation},
      address      = {Aachen},
      publisher    = {RWTH Aachen University},
      reportid     = {RWTH-2024-12118},
      pages        = {1 online resource : illustrations},
      year         = {2024},
      note         = {Published on the publication server of RWTH Aachen
                      University 2025; Dissertation, RWTH Aachen University, 2024},
      abstract     = {The transition to a sustainable, circular bioeconomy is
                      essential to tackle the socioecological crises of the 21st
                      century. Industrial biotechnology, a cornerstone of this
                      bioeconomy, leverages modern biofoundries that integrate
                      automation and high-throughput experimentation with the
                      Design-Build-Test-Learn (DBTL) cycle to streamline
                      bioprocess development. While advances in automated cloning
                      and genome editing have increased the availability of large
                      strain libraries for early-stage screening, several
                      limitations remain in the Test and Learn phases of DBTL.
                      This work combines automated experimentation, Bayesian
                      statistical modelling and machine learning to bridge the
                      remaining gaps towards autonomous bioprocess development.
                      This requires an $\textit{experiment-in-the-loop}$-approach,
                      where simulations are closely coupled with experiments on
                      automated microbioreactor platforms. Consequently, the
                      toolboxes for experimental workflow development and decision
                      making based on process models were extended in this thesis.
                      These improved tools were then applied to biotechnological
                      case studies, focusing on model-driven experimental design
                      and iterative screening. First, manual steps in microbial
                      screening, such as precultures in shake flasks, were
                      replaced by automated solutions. Existing automated
                      microbioreactor platforms were thus extended to enable
                      consecutive screening experiments without human
                      intervention. For example, an automated deep freezer was
                      seamlessly integrated, including the connection to the
                      existing process control infrastructure. Furthermore,
                      automated precultures and microtiter plate recycling were
                      achieved for the microbioreactor, leading to the
                      demonstration of a fully automated, iterative screening with
                      cutinase-secreting $\textit{Corynebacterium glutamicum}$
                      strains. With the gaps in automated experimentation closed,
                      the focus was shifted to high-throughput data analysis and
                      process modelling. A need was identified for the evaluation
                      of analytical calibration data, for example from
                      high-throughput enzymatic assays. This led to the
                      development of Bayesian calibration models for
                      biotechnological applications, which describe the
                      relationship between tested quantities and measured values,
                      including uncertainty. The open-source Python package
                      calibr8 was developed to help practitioners with little
                      programming experience to easily implement complex,
                      non-linear calibration models. It serves as a toolbox for
                      high-throughput analytical calibration, as well as a
                      starting point for advanced process models that account for
                      bias in measurement systems. Using calibration models as
                      likelihoods, Bayesian statistical models were developed to
                      represent the technical and biological parameters of a
                      screening process. For example, batch effects between
                      screening experiments were modelled to avoid a bias in the
                      final ranking of strains and conditions. The process models
                      were also used to derive key performance indicators with
                      uncertainties for decision making. In two application
                      studies, Bayesian hierarchical process models were combined
                      with Bayesian optimisation to efficiently design iterative
                      screening experiments. For example, the number of
                      experiments required to screen a strain library of
                      catalytically active $\textit{inclusion bodies}$ (CatIBs)
                      could be reduced by 25\%. At the same time, the
                      probabilistic approach to calibration and process modelling
                      makes it possible to identify major sources of uncertainty. This was
                      exploited to guide workflow development, e.g. leading to a
                      reduction of the relative standard deviation in the
                      automated CatIB purification and assay procedures from
                      11.4\% to only 1.9\% over 42 replicates. Finally, modern
                      machine learning tools were used to develop process models
                      and experimental designs for applications with limited
                      process understanding. The potential of horizontal knowledge
                      transfer for process models was explored, using data from
                      historical processes to improve predictions for new
                      processes. For example, Gaussian processes, popular machine
                      learning models for small data sets, were combined with
                      $\textit{meta learning}$ and benchmarked using in silico
                      cell culture data. In a final step, the established
                      knowledge transfer models identified optimal experimental
                      designs to characterise the behaviour of an unseen process,
                      a procedure called $\textit{calibration design}$. In
                      conclusion, this work intensifies bioprocess screening by
                      improving autonomous workflows on automated microbioreactor
                      systems. The close interaction between experiment and model
                      is crucial to achieve this goal, as is harnessing the power
                      of laboratory automation, computational tools and
                      interdisciplinary research. Overall, this thesis paves the
                      way for autonomous DBTL cycles, which are essential for a
                      sustainable bioeconomy in the future.},
      cin          = {162610 / 160000 / 165230 / 420410 / 057700},
      ddc          = {570},
      cid          = {$I:(DE-82)162610_20140620$ / $I:(DE-82)160000_20140620$ /
                      $I:(DE-82)165230_20220204$ / $I:(DE-82)420410_20140620$ /
                      $I:(DE-82)057700_20231115$},
      typ          = {PUB:(DE-HGF)11},
      doi          = {10.18154/RWTH-2024-12118},
      url          = {https://publications.rwth-aachen.de/record/999462},
}