The representation of speech and its processing in the human brain and deep neural networks

Odette Scharenborg*

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

For most languages in the world and for speech that deviates from the standard pronunciation, not enough (annotated) speech data is available to train an automatic speech recognition (ASR) system. Moreover, human intervention is needed to adapt an ASR system to a new language or type of speech. Human listeners, on the other hand, are able to quickly adapt to nonstandard speech and can learn the sound categories of a new language without having been explicitly taught to do so. In this paper, I present comparisons between human speech processing and deep neural network (DNN)-based ASR and argue that the cross-fertilisation of the two research fields can provide valuable information for the development of ASR systems that can flexibly adapt to any type of speech in any language. Specifically, I present results of several experiments carried out on both human listeners and DNN-based ASR systems on the representation of speech and lexically-guided perceptual learning, i.e., the ability to adapt a sound category on the basis of new incoming information, resulting in improved processing of subsequent speech. The results show that DNNs, without being explicitly trained to do so, appear to learn structures that humans use to process speech, and that, similar to humans, DNN systems learn speaker-adapted phone category boundaries from a few labelled examples. These results are the first steps towards building ASR systems inspired by human speech processing that, similar to human listeners, can adjust flexibly and quickly to all kinds of new speech.

Original language: English
Title of host publication: Speech and Computer
Subtitle of host publication: 21st International Conference, SPECOM 2019, Proceedings
Editors: Albert Ali Salah, Alexey Karpov, Rodmonga Potapova
Place of Publication: Cham
Publisher: Springer
Pages: 1-8
Number of pages: 8
ISBN (Electronic): 978-3-030-26061-3
ISBN (Print): 978-3-030-26060-6
Publication status: Published - 2019
Event: SPECOM 2019: The 21st International Conference on Speech and Computer - Istanbul, Turkey
Duration: 20 Aug 2019 – 25 Aug 2019
Conference number: 21st

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11658 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: SPECOM 2019
Country/Territory: Turkey
City: Istanbul
Period: 20/08/19 – 25/08/19

Keywords

  • Adaptation
  • Deep neural networks
  • Human speech processing
  • Non-standard speech
  • Perceptual learning
  • Speech representations
