Efficient Binaural Sound Localization in Noisy and Reverberant Environments

Goeckel, Tom; Wagner, Hermann; Lakemeyer, Gerhard
doi:HT019089442
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@PHDTHESIS{Goeckel:565602,
      author       = {Goeckel, Tom},
      othercontributors = {Lakemeyer, Gerhard and Wagner, Hermann},
      title        = {{E}fficient {B}inaural {S}ound {L}ocalization in {N}oisy
                      and {R}everberant {E}nvironments},
      school       = {RWTH Aachen University},
      type         = {Dissertation},
      address      = {Aachen},
      reportid     = {RWTH-2015-07917},
      pages        = {1 online resource (X, 139 pages) : illustrations,
                      diagrams},
      year         = {2015},
      note         = {Published on the publication server of RWTH Aachen
                      University 2016; Dissertation, RWTH Aachen University, 2015},
      abstract     = {Localizing the origin of a sound source is a useful but
                      demanding task. It provides valuable information about both
                      the nature of the sound source and the environment it is
                      situated in, and it helps us to avoid dangers, improve
                      communication with other people, and get a better sense of
                      our surroundings. We aimed at providing robotic devices with
                      a similar sense of sound localization, with a focus on noisy
                      and reverberant conditions. We drew our inspiration from
                      barn owls, which are nocturnal hunters and, as such,
                      hearing specialists. Our localization system consisted of a
                      microphone mount with two receivers and the “binaural”
                      software to determine the direction of a sound source by
                      comparing both recorded signals. We determined the
                      difference in the arrival times of the signal to infer
                      its direction of incidence. The model was based on
                      a basic cross-correlation algorithm that was extended by a
                      model of the precedence effect to reduce the impact of
                      reverberations and noise on the localization performance.
                      The model established a probability density function
                      containing the time differences of a single time window of
                      the input signal, each weighted by the amount of
                      interaural correlation at that sample. We added a time
                      integration step, which took both the energy of the signal
                      and the interaural correlation into account, to further
                      reduce noise in the output of the algorithm. Peaks in the
                      final distribution, which covered the whole time difference
                      range, were then taken as the most likely interaural time
                      differences, and these could be translated directly into
                      angles of incidence of sound sources with respect to the
                      position of a listening device.
                      We successfully tested the system in simulated environments
                      and in different office rooms. Even with additional
                      uncorrelated background noise, the localization error was on
                      average below 1 degree in the horizontal plane. During
                      development, we also put a strong emphasis on the efficiency
                      of our algorithm, as the system should be able to run in
                      real time. We added tracking of sources over longer periods
                      of time and signal detection to make it a more generally
                      applicable solution. In addition, we tested the system in
                      combination with face recognition software to provide a
                      simple telepresence system that is able to follow an ongoing
                      discussion. This was evaluated with a setup containing a
                      simulated discussion between three speakers in an office
                      room. Another goal of this thesis was to determine the
                      amount of side peak suppression in a Jeffress-based
                      interaural time difference model as a function of bandwidth
                      and center frequency of the input signal. We were able to
                      predict side peak suppression for any Jeffress-based model
                      as a function of these parameters and could show that barn
                      owls exhibited a very similar side peak suppression pattern.
                      This led us to the conclusion that our assumed linear
                      frequency integration comes very close to the population
                      data of interaural time difference tuning curves that can be
                      measured in the external nucleus of the inferior colliculus
                      of the barn owl auditory midbrain. Thus, it is likely that
                      frequency integration in the auditory pathway of barn owls
                      is also a linear process.},
      cin          = {121920 / 120000},
      ddc          = {004},
      cid          = {$I:(DE-82)121920_20140620$ / $I:(DE-82)120000_20140620$},
      typ          = {PUB:(DE-HGF)11},
      urn          = {urn:nbn:de:hbz:82-rwth-2015-079175},
      doi          = {10.18154/RWTH-2015-07917},
      url          = {https://publications.rwth-aachen.de/record/565602},
}