From intra-modal to inter-modal space: Multi-task learning of shared representations for cross-modal retrieval

Jaeyoung Choi, Martha Larson, Gerald Friedland, Alan Hanjalic

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

1 Citation (Scopus)

Abstract

Learning a robust shared representation space is critical for effective multimedia retrieval, and is increasingly important as multimodal data grows in volume and diversity. The labeled datasets necessary for learning such a space are limited in size and also in coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality than across modalities. We propose a two-stage shared representation learning framework with intra-modal optimization and subsequent cross-modal transfer learning of semantic structure that produces a robust shared representation space. We integrate multi-task learning into each step, making it possible to leverage multiple datasets, annotated with different concepts, as if they were one large dataset. Large-scale systematic experiments demonstrate improvements over previously reported state-of-the-art methods on cross-modal retrieval tasks.
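The two-stage idea in the abstract — first optimize each modality's space on its own, then transfer semantic structure across modalities, pooling multiple datasets as if they were one — can be illustrated with a minimal numpy sketch. This is not the paper's actual method: the toy data, feature dimensions, and the whitening/least-squares stand-ins for the two stages are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired image/text features from two hypothetical datasets annotated
# with different concepts; multi-task pooling below treats them as one.
X_a = rng.normal(size=(50, 8))                        # "image" features, dataset A
X_b = rng.normal(size=(30, 8))                        # "image" features, dataset B
W_true = rng.normal(size=(8, 6))                      # hidden cross-modal relation
Y_a = X_a @ W_true + 0.01 * rng.normal(size=(50, 6))  # paired "text" features
Y_b = X_b @ W_true + 0.01 * rng.normal(size=(30, 6))

def whiten(X):
    """Stage 1 stand-in: optimize the intra-modal space on its own
    (here simply centering + whitening so its geometry is well-conditioned)."""
    X = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
    return X @ eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T

# Multi-task pooling: stack both datasets as if they were one large dataset.
X = whiten(np.vstack([X_a, X_b]))
Y = np.vstack([Y_a, Y_b])
Y = Y - Y.mean(axis=0)

# Stage 2 stand-in: cross-modal transfer via a least-squares linear map from
# the intra-modally optimized image space into the text space.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
proj = X @ W

# Cross-modal retrieval check: each projected image should retrieve its own text.
nn = np.argmin(((proj[:, None, :] - Y[None, :, :]) ** 2).sum(-1), axis=1)
accuracy = float((nn == np.arange(len(Y))).mean())
print(f"retrieval accuracy on toy data: {accuracy:.2f}")
```

On this toy data the linear map recovers the pairing almost exactly; the point of the sketch is only the ordering of the steps — intra-modal conditioning first, cross-modal alignment second, with both datasets pooled throughout.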

Original language: English
Title of host publication: Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781728155272
DOIs
Publication status: Published - 1 Sept 2019
Event: 5th IEEE International Conference on Multimedia Big Data, BigMM 2019 - Singapore, Singapore
Duration: 11 Sept 2019 - 13 Sept 2019

Publication series

Name: Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019

Conference

Conference: 5th IEEE International Conference on Multimedia Big Data, BigMM 2019
Country/Territory: Singapore
City: Singapore
Period: 11/09/19 - 13/09/19

Keywords

  • Cross-modal retrieval
  • Image retrieval
  • Multi-task learning
  • Video retrieval

