High-dimensional logistic regression with fusion-type penalties

Kaufmann, Lea; Moustaki, Irini; Kamps, Udo; Kateri, Maria

doi:44254

%0 Thesis
%A Kaufmann, Lea
%T High-dimensional logistic regression with fusion-type penalties
%I RWTH Aachen University
%V Dissertation
%C Aachen
%M RWTH-2025-03939
%P 1 Online-Ressource : Illustrationen
%D 2025
%Z Veröffentlicht auf dem Publikationsserver der RWTH Aachen University
%Z Dissertation, RWTH Aachen University, 2025
%X In this thesis, penalized regression in the framework of logistic regression with categorical covariates (i.e. factors) is discussed. Providing an overview of existing penalized regression methods along with their characteristics, theoretical properties given in the literature for linear regression are transferred to the setting of logistic regression. First, the focus lies on penalized regression methods for levels fusion before those introduced for the purpose of factor selection are examined. Computational methods employed for obtaining the corresponding estimates by solving the resulting minimization problems are discussed. Finally, extensive simulation studies are conducted using the statistical software R, investigating the behavior of the presented methods in different simulation designs, showing the advantages and disadvantages of these methods. It turns out that there exists no penalty function so far, which simultaneously performs factor selection and levels fusion. To close this gap, a novel penalty function, called L0-Fused Group Lasso (L0-FGL) is introduced. The theoretical investigation of L0-FGL is obtained, showing valuable asymptotic properties. These properties justify that the new method is a suitable choice for the purpose of obtaining sparse models in penalized logistic regression with factors. Then, convenient algorithms to calculate the L0-FGL estimates are employed. The behavior ofL0-FGL is investigated in different simulation designs, showing that, on the one hand, L0-FGL is able to improve the factor selection performance of those penalties for levels fusion and, on the other hand, L0-FGL is able to perform both factor selection and levels fusion. Finally, statistical inference analysis for L0-FGL is provided. In particular, a two-stage method called two-stage L0-FGL is proposed, including a step for dimension reduction through factor selection and levels fusion, and an inferential step. Generally speaking, the two-stage method first reduces the dimension and, having that, those non-influential factors that are still included in the model are removed through statistical tests. Considering two different approaches for corrections for multiplicity of testing, a single and a multiple sample splitting approach is applied. Based on the asymptotic properties of L0-FGL, convenient asymptotic error control properties are shown for two-stage L0-FGL, yielding that this approach is a reasonable choice with a solid theoretical basis.
%F PUB:(DE-HGF)11
%9 Dissertation / PhD Thesis
%R 10.18154/RWTH-2025-03939
%U https://publications.rwth-aachen.de/record/1010220

h1

h2

h3

h4

h5

h6

RWTH

Kontakt

RWTH Publications

Allgemeines