Are Nearby Neighbors Relatives? Testing Deep Music Embeddings

Jaehun Kim; Julián Urbano; Cynthia C.S. Liem; Alan Hanjalic

doi:10.3389/fams.2019.00053

Research output: Contribution to journal › Article › Scientific › peer-review

4 Citations (Scopus)

76 Downloads (Pure)

Abstract

Deep neural networks have frequently been used to directly learn representations useful for a given task from raw input data. In terms of overall performance metrics, machine learning solutions employing deep representations frequently have been reported to greatly outperform those using hand-crafted feature representations. At the same time, they may pick up on aspects that are predominant in the data, yet not actually meaningful or interpretable. In this paper, we therefore propose a systematic way to test the trustworthiness of deep music representations, considering musical semantics. The underlying assumption is that in case a deep representation is to be trusted, distance consistency between known related points should be maintained both in the input audio space and corresponding latent deep space. We generate known related points through semantically meaningful transformations, both considering imperceptible and graver transformations. Then, we examine within- and between-space distance consistencies, both considering audio space and latent embedded space, the latter either being a result of a conventional feature extractor or a deep encoder. We illustrate how our method, as a complement to task-specific performance, provides interpretable insight into what a network may have captured from training data signals.
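The consistency idea in the abstract can be illustrated with a small sketch. This is not the authors' procedure: the function names, the use of Euclidean distance, and the two scores below are assumptions chosen only to make the notions of within-space and between-space consistency concrete. Points are plain tuples of floats standing in for audio frames or embedding vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_space_consistency(originals, transformed):
    """Hypothetical within-space score: the fraction of transformed
    points whose nearest original is the one they were derived from."""
    hits = 0
    for i, t in enumerate(transformed):
        dists = [euclidean(t, o) for o in originals]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if nearest == i:
            hits += 1
    return hits / len(transformed)


def between_space_agreement(audio_orig, audio_trans, latent_orig, latent_trans):
    """Hypothetical between-space score: the fraction of item pairs whose
    original-to-transformed distances are ordered the same way in the
    audio space and in the latent space."""
    d_audio = [euclidean(a, b) for a, b in zip(audio_orig, audio_trans)]
    d_latent = [euclidean(a, b) for a, b in zip(latent_orig, latent_trans)]
    agree = 0
    total = 0
    for i in range(len(d_audio)):
        for j in range(i + 1, len(d_audio)):
            total += 1
            # Same sign of the difference means the same ordering.
            if (d_audio[i] - d_audio[j]) * (d_latent[i] - d_latent[j]) > 0:
                agree += 1
    return agree / total if total else 0.0
```

Under these assumptions, a within-space score near 1 would mean that a transformed clip stays closest to its own source, and a between-space score near 1 would mean that a transformation that moves a clip further in audio space also moves it further in the latent space.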

Original language	English
Article number	53
Pages (from-to)	1-17
Number of pages	17
Journal	Frontiers in Applied Mathematics and Statistics
Volume	5
DOIs	https://doi.org/10.3389/fams.2019.00053
Publication status	Published - 2019

Keywords

MFCC
evaluation
music information retrieval
neural network
representation learning

Access to Document

10.3389/fams.2019.00053

fams-05-00053: Final published version, 3.61 MB, Licence: CC BY

Cite this

@article{a119583d7c814ef7996695d74905bd49,

title = "Are Nearby Neighbors Relatives? Testing Deep Music Embeddings",

abstract = "Deep neural networks have frequently been used to directly learn representations useful for a given task from raw input data. In terms of overall performance metrics, machine learning solutions employing deep representations frequently have been reported to greatly outperform those using hand-crafted feature representations. At the same time, they may pick up on aspects that are predominant in the data, yet not actually meaningful or interpretable. In this paper, we therefore propose a systematic way to test the trustworthiness of deep music representations, considering musical semantics. The underlying assumption is that in case a deep representation is to be trusted, distance consistency between known related points should be maintained both in the input audio space and corresponding latent deep space. We generate known related points through semantically meaningful transformations, both considering imperceptible and graver transformations. Then, we examine within- and between-space distance consistencies, both considering audio space and latent embedded space, the latter either being a result of a conventional feature extractor or a deep encoder. We illustrate how our method, as a complement to task-specific performance, provides interpretable insight into what a network may have captured from training data signals.",

keywords = "MFCC, evaluation, music information retrieval, neural network, representation learning",

author = "Kim, Jaehun and Urbano, Juli{\'a}n and Liem, {Cynthia C.S.} and Hanjalic, Alan",

year = "2019",

doi = "10.3389/fams.2019.00053",

language = "English",

volume = "5",

pages = "1--17",

journal = "Frontiers in Applied Mathematics and Statistics",

issn = "2297-4687",

publisher = "Frontiers Media",

}

TY - JOUR

T1 - Are Nearby Neighbors Relatives?

T2 - Testing Deep Music Embeddings

AU - Kim, Jaehun

AU - Urbano, Julián

AU - Liem, Cynthia C.S.

AU - Hanjalic, Alan

PY - 2019

Y1 - 2019

N2 - Deep neural networks have frequently been used to directly learn representations useful for a given task from raw input data. In terms of overall performance metrics, machine learning solutions employing deep representations frequently have been reported to greatly outperform those using hand-crafted feature representations. At the same time, they may pick up on aspects that are predominant in the data, yet not actually meaningful or interpretable. In this paper, we therefore propose a systematic way to test the trustworthiness of deep music representations, considering musical semantics. The underlying assumption is that in case a deep representation is to be trusted, distance consistency between known related points should be maintained both in the input audio space and corresponding latent deep space. We generate known related points through semantically meaningful transformations, both considering imperceptible and graver transformations. Then, we examine within- and between-space distance consistencies, both considering audio space and latent embedded space, the latter either being a result of a conventional feature extractor or a deep encoder. We illustrate how our method, as a complement to task-specific performance, provides interpretable insight into what a network may have captured from training data signals.

AB - Deep neural networks have frequently been used to directly learn representations useful for a given task from raw input data. In terms of overall performance metrics, machine learning solutions employing deep representations frequently have been reported to greatly outperform those using hand-crafted feature representations. At the same time, they may pick up on aspects that are predominant in the data, yet not actually meaningful or interpretable. In this paper, we therefore propose a systematic way to test the trustworthiness of deep music representations, considering musical semantics. The underlying assumption is that in case a deep representation is to be trusted, distance consistency between known related points should be maintained both in the input audio space and corresponding latent deep space. We generate known related points through semantically meaningful transformations, both considering imperceptible and graver transformations. Then, we examine within- and between-space distance consistencies, both considering audio space and latent embedded space, the latter either being a result of a conventional feature extractor or a deep encoder. We illustrate how our method, as a complement to task-specific performance, provides interpretable insight into what a network may have captured from training data signals.

KW - MFCC

KW - evaluation

KW - music information retrieval

KW - neural network

KW - representation learning

UR - http://www.scopus.com/inward/record.url?scp=85077583064&partnerID=8YFLogxK

U2 - 10.3389/fams.2019.00053

DO - 10.3389/fams.2019.00053

M3 - Article

SN - 2297-4687

VL - 5

SP - 1

EP - 17

JO - Frontiers in Applied Mathematics and Statistics

JF - Frontiers in Applied Mathematics and Statistics

M1 - 53

ER -
