Virtual patient modeling for heterogeneous intensive care unit data for the support of artificial intelligence

Sharafutdinov, Konstantin; Schuppert, Andreas; Mitsos, Alexander

doi:42252

TY - THES
AU - Sharafutdinov, Konstantin
TI - Virtual patient modeling for heterogeneous intensive care unit data for the support of artificial intelligence
PB - Rheinisch-Westfälische Technische Hochschule Aachen
VL - Dissertation
CY - Aachen
M1 - RWTH-2023-05200
SP - 1 online resource : illustrations, diagrams
PY - 2023
N1 - Published on the publication server of RWTH Aachen University
N1 - Dissertation, Rheinisch-Westfälische Technische Hochschule Aachen, 2023
AB - Artificial intelligence (AI) and machine learning (ML) technologies have already shown their power and applicability in multiple areas of healthcare, including the intensive care unit (ICU). However, the limited generalizability of ML models developed on single-center datasets, and the consequently impaired performance of such models in real-world settings, constitute a significant constraint on the widespread adoption of data-based approaches in clinical practice. Furthermore, data structures and patient cohorts may differ significantly between hospitals, introducing additional bias driven by data origin. These differences can be characterized as a dataset bias, which is characteristic of data acquired in the ICU and represents a major challenge for the application of ML methods in the ICU setting. In our study, we propose two frameworks to address the challenge of poor generalization of ML models. The first framework enables the quantitative assessment of dataset bias based on convex hull (CH) analysis and ML methods. It allows an a priori assessment of the generalizability of ML models to a new dataset based on the CH overlaps between the dataset used for model training and the new dataset. First, CH analysis is applied to find the mean CH coverage between the two datasets, based on the overlaps of CH projections onto the subspaces spanned by all combinations of 2 features; this coverage provides an upper bound for the generalization ability of an ML model. Second, 4 types of ML models are trained to classify the origin of a dataset, to assess whether it is possible to distinguish between patients from different hospitals. The performance of these ML models is evaluated to determine whether the hospitals' datasets differ in terms of their underlying data distributions. Combining the results of these 2 steps yields a complete picture of potential generalization issues.
The second contribution of our work is the development of a virtual patient (VP) modeling framework utilizing real-world ICU data pooled from different hospitals. VP models are computational models that simulate pathophysiological states. After being matched to real patient data, they allow the extraction of the core information describing a patient's status. Our VP modeling framework employs a mechanistic VP model of the cardiopulmonary system for data augmentation, identifying individualized model parameters that approximate the disease states of ICU patients. The parameters derived in the VP modeling framework are used as inputs for unsupervised ML methods, which characterize patient cohorts based on their mechanistic parametrization. Thus, a hybrid modeling framework for the analysis of large-scale ICU patient data is created. We show the advantages of this hybrid modeling framework compared with the direct use of the original ICU data in ML algorithms: model-derived data can be used to reduce dataset bias and to discover medically relevant patient subpopulations in heterogeneous ICU datasets. All in all, our novel frameworks, which integrate both mechanistic and data-driven models, represent a step towards the use of available real-world ICU data from heterogeneous sources, which offers numerous benefits for healthcare.
LB - PUB:(DE-HGF)11
DO - DOI:10.18154/RWTH-2023-05200
UR - https://publications.rwth-aachen.de/record/958236
ER -
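The first step of the dataset-bias framework described in the abstract (mean CH coverage over all two-feature projections) can be illustrated with a minimal, pure-Python sketch. It is not the thesis implementation. The abstract does not say how coverage is normalized, so the sketch assumes that the coverage for one feature pair is the area of the intersection of the two projected hulls divided by the area of the new dataset's projected hull. It also assumes that a zero-area projection scores 0. All function names (`convex_hull`, `polygon_area`, `clip_convex`, `pair_coverage`, `mean_ch_coverage`) are hypothetical. Hulls are built with Andrew's monotone chain and intersected with Sutherland-Hodgman clipping, which is valid here because both hulls are convex.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; returns 0.0 for degenerate polygons."""
    if len(poly) < 3:
        return 0.0
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, list(poly[1:]) + [poly[0]]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_convex(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersection of `subject` with a convex, CCW `clipper`."""
    def inside(p: Point, a: Point, b: Point) -> bool:
        return _cross(a, b, p) >= 0  # p is on or left of the edge a->b

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # p and q lie on opposite sides of line a-b, so d1 != d2.
        d1, d2 = _cross(a, b, p), _cross(a, b, q)
        t = d1 / (d1 - d2)
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    output = list(subject)
    for a, b in zip(clipper, list(clipper[1:]) + [clipper[0]]):
        if not output:
            break  # intersection is already empty
        inp, output = output, []
        prev = inp[-1]
        for cur in inp:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def pair_coverage(train, new, i: int, j: int) -> float:
    """Share of the new dataset's hull area (features i, j) inside the training hull."""
    h_train = convex_hull([(row[i], row[j]) for row in train])
    h_new = convex_hull([(row[i], row[j]) for row in new])
    a_new = polygon_area(h_new)
    if a_new == 0.0 or len(h_train) < 3:
        return 0.0  # assumption: zero-area projections score 0
    return polygon_area(clip_convex(h_new, h_train)) / a_new


def mean_ch_coverage(train, new) -> float:
    """Mean pairwise coverage over all combinations of 2 features."""
    pairs = list(combinations(range(len(train[0])), 2))
    return sum(pair_coverage(train, new, i, j) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    train = list(product([0.0, 2.0], repeat=3))  # corners of the cube [0, 2]^3
    new = list(product([1.0, 3.0], repeat=3))    # shifted cube [1, 3]^3
    print(mean_ch_coverage(train, new))  # prints 0.25
```

In the usage example, every two-feature projection of the shifted cube overlaps the training projection by one quarter of its area, so the mean coverage is 0.25. For real ICU data, the rows would hold the features shared by both hospitals. Because the pairwise computation loops over all feature pairs, its cost grows quadratically with the number of features.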
