In this paper, we investigate the connection between how people understand speech and how a deep neural network (DNN) understands it. A naïve, general feed-forward DNN was trained for vowel/consonant classification. We then visualized the representations of the speech signal in the DNN's hidden layers. These visualizations let us study the distances between the representations of different types of input frames and observe the clusters the representations form. In each visualization, the input frames were labeled with a different linguistic category: phoneme class, manner of articulation, or place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories. We observe evidence that the DNN does appear to learn structures that humans use to understand speech, without being explicitly trained to do so.
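The abstract does not specify the network configuration, the frame features, or the visualization method. The following minimal sketch assumes a ReLU feed-forward network written in plain Python. It records each hidden layer's activations for an input frame and summarizes how tightly frames that share a linguistic label cluster. The summary is the ratio of the mean within-label Euclidean distance to the mean between-label distance. This ratio, the helper names (`hidden_representations`, `within_between_ratio`, `random_layer`), the random untrained weights, and the toy two-class data are hypothetical illustrations. They are not the authors' method, which relies on visualizing the representations.

```python
import math
import random
from itertools import combinations


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Affine layer: one weight row and one bias term per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def random_layer(n_in, n_out, rng):
    """Hypothetical stand-in for a trained layer: scaled Gaussian weights, zero bias."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def hidden_representations(frame, layers):
    """Return the activation vector produced by each hidden layer for one frame."""
    reps, h = [], frame
    for weights, bias in layers:
        h = relu(dense(h, weights, bias))
        reps.append(h)
    return reps


def within_between_ratio(vectors, labels):
    """Mean within-label distance divided by mean between-label distance.

    Values well below 1 mean frames sharing a label lie close together
    relative to frames carrying different labels.
    """
    within, between = [], []
    for (va, la), (vb, lb) in combinations(zip(vectors, labels), 2):
        (within if la == lb else between).append(math.dist(va, vb))
    if not within or not between:
        raise ValueError("need two or more labels and a label with two or more frames")
    mean_between = sum(between) / len(between)
    if mean_between == 0:
        raise ValueError("all between-label distances are zero")
    return (sum(within) / len(within)) / mean_between


if __name__ == "__main__":
    rng = random.Random(0)
    dims = [20, 32, 32, 16]  # hypothetical input size and hidden-layer sizes
    layers = [random_layer(a, b, rng) for a, b in zip(dims, dims[1:])]

    # Toy "frames": two hypothetical classes whose feature means differ.
    frames, labels = [], []
    for label, offset in (("vowel", 1.0), ("consonant", -1.0)):
        for _ in range(10):
            frames.append([offset + rng.gauss(0.0, 0.5) for _ in range(dims[0])])
            labels.append(label)

    per_frame = [hidden_representations(f, layers) for f in frames]
    for depth in range(len(layers)):
        layer_vectors = [reps[depth] for reps in per_frame]
        ratio = within_between_ratio(layer_vectors, labels)
        print(f"hidden layer {depth + 1}: within/between ratio = {ratio:.3f}")
```

In a real analysis, the weights would come from the trained vowel/consonant classifier. The labels could then be swapped for phoneme class, manner of articulation, or place of articulation to compare how strongly each layer groups frames by each category.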

Original language: English
Title of host publication: MultiMedia Modeling
Subtitle of host publication: 25th International Conference, MMM 2019, Proceedings
Editors: Ioannis Kompatsiaris, Benoit Huet, Vasileios Mezaris, Cathal Gurrin, Wen-Huang Cheng, Stefanos Vrochidis
Place of Publication: Cham
Publisher: Springer Verlag
Pages: 194-205
Number of pages: 12
Edition: Part II
ISBN (Electronic): 978-3-030-05716-9
ISBN (Print): 978-3-030-05715-2
Publication status: Published - 2019
Event: 25th International Conference on MultiMedia Modeling, MMM 2019 - Thessaloniki, Greece
Duration: 8 Jan 2019 to 11 Jan 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11296 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on MultiMedia Modeling, MMM 2019
Country: Greece
City: Thessaloniki
Period: 8/01/19 to 11/01/19

Research areas

• Deep neural networks, Speech representations, Visualizations

ID: 51320036