Abstract
For many of the world's languages, not enough (annotated) speech data is available to train an ASR system. Recently, we proposed a cross-language method for training an ASR system using linguistic knowledge and semi-supervised training. Here, we apply this approach to the low-resource language Mboshi. Starting from an ASR system trained on Dutch, Mboshi acoustic units were first created through cross-language initialization of the phoneme vectors in the output layer. Subsequently, this adapted system was retrained using Mboshi self-labels. Two training methods were investigated: retraining only the output layer and retraining the full deep neural network (DNN). The resulting Mboshi system was analyzed by investigating per-phoneme accuracies and phoneme confusions, and by visualizing the hidden layers of the DNNs before and after retraining with the self-labels. Results showed fairly similar performance for the two training methods but a better phoneme representation for the fully retrained DNN.
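The cross-language initialization described in the abstract can be sketched as follows: rows of the source-language (Dutch) output-layer weight matrix are copied to initialize rows for the target-language (Mboshi) phonemes, using a knowledge-based phoneme mapping. This is a minimal illustrative sketch; the phoneme inventories, the mapping, and all variable names are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical sketch of cross-language output-layer initialization.
# Each row of W_dutch is the output-layer weight vector of one Dutch
# phoneme; each Mboshi phoneme row is initialized by copying the row
# of a linguistically mapped Dutch phoneme. Inventories and mapping
# here are purely illustrative, not the paper's actual choices.

rng = np.random.default_rng(0)
hidden_dim = 8  # size of the last hidden layer (illustrative)

dutch_phonemes = ["a", "e", "o", "p", "b", "m"]
W_dutch = rng.standard_normal((len(dutch_phonemes), hidden_dim))

# Knowledge-based mapping: Mboshi phoneme -> nearest Dutch phoneme.
mboshi_to_dutch = {"a": "a", "e": "e", "o": "o", "mb": "m", "b": "b"}

mboshi_phonemes = list(mboshi_to_dutch)
W_mboshi = np.stack(
    [W_dutch[dutch_phonemes.index(mboshi_to_dutch[p])]
     for p in mboshi_phonemes]
)

# The adapted output layer now has one row per Mboshi phoneme, each
# starting from the weights of its mapped Dutch counterpart; these
# rows would subsequently be retrained with Mboshi self-labels.
print(W_mboshi.shape)
```

After this initialization step, the two retraining variants in the paper differ only in which parameters are updated: the output-layer-only variant keeps the hidden layers frozen, while the full-DNN variant updates all weights.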
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU) |
Subtitle of host publication | 29-31 August 2018, Gurugram, India |
Place of Publication | New Delhi, India |
Publisher | ISCA |
Pages | 167-171 |
Number of pages | 5 |
Publication status | Published - 2018 |
Event | 6th Workshop on Spoken Language Technologies for Under-resourced Languages - New Delhi, India. Duration: 29 Aug 2018 → 31 Aug 2018 |
Workshop
Workshop | 6th Workshop on Spoken Language Technologies for Under-resourced Languages |
---|---|
Abbreviated title | SLTU |
Country/Territory | India |
City | New Delhi |
Period | 29/08/18 → 31/08/18 |
Keywords
- Low-resource automatic speech recognition
- Cross-language adaptation
- Semi-supervised training