Subspace clustering for complex data

Günnemann, Stephan; Seidl, Thomas
doi:4103
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@PHDTHESIS{Gnnemann:82818,
      author       = {Günnemann, Stephan},
      othercontributors = {Seidl, Thomas},
      title        = {{S}ubspace clustering for complex data},
      address      = {Aachen},
      publisher    = {Publikationsserver der RWTH Aachen University},
      reportid     = {RWTH-CONV-143177},
      pages        = {III, 304, XXVII S. : graph. Darst.},
      year         = {2012},
      note         = {Aachen, Techn. Hochsch., Diss., 2012},
      abstract     = {The increasing potential of storage technologies and
                      information systems has opened the possibility to
                      conveniently and affordably gather large amounts of complex
                      data. Going beyond simple descriptions of objects by some
                      few characteristics, such data sources range from
                      high-dimensional vector spaces over imperfect data containing
                      errors to network data describing relations between the
                      objects. Data Mining is the task of extracting previously
                      unknown and useful patterns from such data sources by using
                      automatic or semi-automatic algorithms. In this thesis, we
                      focus on the mining task of clustering, which aims at
                      grouping similar objects while separating dissimilar ones.
                      Since in today's applications usually many characteristics
                      for each object are recorded, one cannot expect to find
                      similar objects by considering all attributes together. In
                      contrast, valuable clusters are hidden in subspace
                      projections of the data. As a general solution to this
                      problem, the paradigm of subspace clustering has been
                      introduced, which aims at automatically determining for each
                      group of objects a set of relevant attributes these objects
                      are similar in. In this thesis, we introduce novel methods
                      for effective subspace clustering on various types of
                      complex data. Our methods tackle major open challenges for
                      clustering in subspace projections. We study the problem of
                      redundancy in subspace clustering results and propose models
                      whose solutions contain only non-redundant and, thus,
                      valuable clusters. Since different subspace projections
                      represent different views on the data, often several
                      groupings of the objects are reasonable. Thus, we propose
                      techniques that are not restricted to a single partitioning
                      of the objects but that enable the detection of multiple
                      clustering solutions. Besides tackling these challenges of
                      subspace clustering for the case of vector data, we study
                      the task of subspace clustering on two further data types:
                      imperfect data and network data in combination with vector
                      data. We propose integrated mining techniques directly
                      handling errors in the data and simultaneously mining
                      different information sources. In thorough experiments, we
                      demonstrate the strengths of our novel clustering
                      approaches. Overall, for the first time, meaningful subspace
                      clustering results can be obtained for these types of
                      complex data.},
      keywords     = {Data Mining (SWD) / Cluster-Analyse (SWD) / Cluster
                      <Datenanalyse> (SWD) / Wissensextraktion (SWD) /
                      Dichtebasiertes Clusterverfahren (SWD) / Algorithmus (SWD) /
                      Netzwerk (SWD) / Fehlende Daten (SWD) / Hochdimensionale
                      Daten (SWD)},
      cin          = {122510 / 120000},
      ddc          = {004},
      cid          = {$I:(DE-82)122510_20140620$ / $I:(DE-82)120000_20140620$},
      shelfmark    = {H.2.8 * I.5.3},
      typ          = {PUB:(DE-HGF)11},
      urn          = {urn:nbn:de:hbz:82-opus-41038},
      url          = {https://publications.rwth-aachen.de/record/82818},
}