Radial Graph Convolutional Network for Visual Question Generation

Xing Xu; Tan Wang; Yang Yang; Alan Hanjalic; Heng Tao Shen

doi:10.1109/TNNLS.2020.2986029

Intelligent Systems

Research output: Contribution to journal › Article › Scientific › peer-review

35 Citations (Scopus)

Abstract

In this article, we address the problem of visual question generation (VQG), a challenge in which a computer is required to generate meaningful questions about an image targeting a given answer. The existing approaches typically treat the VQG task as a reversed visual question answer (VQA) task, requiring the exhaustive match among all the image regions and the given answer. To reduce the complexity, we propose an innovative answer-centric approach termed radial graph convolutional network (Radial-GCN) to focus on the relevant image regions only. Our Radial-GCN method can quickly find the core answer area in an image by matching the latent answer with the semantic labels learned from all image regions. Then, a novel sparse graph of the radial structure is naturally built to capture the associations between the core node (i.e., answer area) and peripheral nodes (i.e., other areas); the graphic attention is subsequently adopted to steer the convolutional propagation toward potentially more relevant nodes for final question generation. Extensive experiments on three benchmark data sets show the superiority of our approach compared with the reference methods. Even in the unexplored challenging zero-shot VQA task, the synthesized questions by our method remarkably boost the performance of several state-of-the-art VQA methods from 0% to over 40%. The implementation code of our proposed method and the successfully generated questions are available at https://github.com/Wangt-CN/VQG-GCN.
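The pipeline described in the abstract can be read as three steps: pick the core answer region, connect it to every other region in a star-shaped (radial) graph, and aggregate peripheral information into the core with attention weights. The following is only a minimal plain-Python sketch of that reading, not the authors' implementation (which is at the GitHub URL above). It assumes cosine similarity for answer matching, uses raw region feature vectors in place of the learned semantic labels, and uses dot-product attention; none of these choices is stated in the abstract, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_core(region_feats, answer_vec):
    """Index of the region most similar to the answer vector (assumed matching rule)."""
    scores = [cosine(r, answer_vec) for r in region_feats]
    return max(range(len(scores)), key=scores.__getitem__)


def radial_edges(num_regions, core):
    """Star-shaped edge list: the core node links to every peripheral node."""
    return [(core, j) for j in range(num_regions) if j != core]


def softmax(xs):
    """Numerically stable softmax over a non-empty list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_to_core(region_feats, core):
    """One attention-weighted aggregation step from peripheral nodes into the core."""
    edges = radial_edges(len(region_feats), core)
    core_feat = region_feats[core]
    if not edges:  # a single region has no neighbours to aggregate
        return list(core_feat), []
    # Assumed scoring: dot product between the core and each peripheral node.
    logits = [sum(a * b for a, b in zip(core_feat, region_feats[j])) for _, j in edges]
    weights = softmax(logits)
    agg = [0.0] * len(core_feat)
    for w, (_, j) in zip(weights, edges):
        for k, value in enumerate(region_feats[j]):
            agg[k] += w * value
    # Residual-style update: core feature plus the attended neighbourhood summary.
    return [c + a for c, a in zip(core_feat, agg)], weights


# Toy example with three 2-D region features and a 2-D answer vector.
regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
answer = [0.0, 2.0]
core = find_core(regions, answer)
updated_core, attention = propagate_to_core(regions, core)
```

The radial graph has one edge per peripheral region rather than one per region pair, which is the sparsity the abstract refers to: an image with n regions yields n - 1 edges instead of n(n - 1)/2.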

Original language	English
Article number	9079208
Pages (from-to)	1654 - 1667
Number of pages	14
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	32 (2021)
Issue number	4
DOIs	https://doi.org/10.1109/TNNLS.2020.2986029
Publication status	Published - 2020

Keywords

Cross-media understanding
graph convolutional network (GCN)
visual question generation (VQG)

Cite this

@article{b1ce36f46037473a9c13d61dd659e851,
  title = "Radial Graph Convolutional Network for Visual Question Generation",
  abstract = "In this article, we address the problem of visual question generation (VQG), a challenge in which a computer is required to generate meaningful questions about an image targeting a given answer. The existing approaches typically treat the VQG task as a reversed visual question answer (VQA) task, requiring the exhaustive match among all the image regions and the given answer. To reduce the complexity, we propose an innovative answer-centric approach termed radial graph convolutional network (Radial-GCN) to focus on the relevant image regions only. Our Radial-GCN method can quickly find the core answer area in an image by matching the latent answer with the semantic labels learned from all image regions. Then, a novel sparse graph of the radial structure is naturally built to capture the associations between the core node (i.e., answer area) and peripheral nodes (i.e., other areas); the graphic attention is subsequently adopted to steer the convolutional propagation toward potentially more relevant nodes for final question generation. Extensive experiments on three benchmark data sets show the superiority of our approach compared with the reference methods. Even in the unexplored challenging zero-shot VQA task, the synthesized questions by our method remarkably boost the performance of several state-of-the-art VQA methods from 0\% to over 40\%. The implementation code of our proposed method and the successfully generated questions are available at https://github.com/Wangt-CN/VQG-GCN.",
  keywords = "Cross-media understanding, graph convolutional network (GCN), visual question generation (VQG)",
  author = "Xing Xu and Tan Wang and Yang Yang and Alan Hanjalic and Shen, {Heng Tao}",
  year = "2020",
  doi = "10.1109/TNNLS.2020.2986029",
  language = "English",
  volume = "32",
  pages = "1654--1667",
  journal = "IEEE Transactions on Neural Networks and Learning Systems",
  issn = "2162-237X",
  publisher = "IEEE Computational Intelligence Society",
  number = "4",
}

TY  - JOUR
T1  - Radial Graph Convolutional Network for Visual Question Generation
AU  - Xu, Xing
AU  - Wang, Tan
AU  - Yang, Yang
AU  - Hanjalic, Alan
AU  - Shen, Heng Tao
PY  - 2020
Y1  - 2020
AB  - In this article, we address the problem of visual question generation (VQG), a challenge in which a computer is required to generate meaningful questions about an image targeting a given answer. The existing approaches typically treat the VQG task as a reversed visual question answer (VQA) task, requiring the exhaustive match among all the image regions and the given answer. To reduce the complexity, we propose an innovative answer-centric approach termed radial graph convolutional network (Radial-GCN) to focus on the relevant image regions only. Our Radial-GCN method can quickly find the core answer area in an image by matching the latent answer with the semantic labels learned from all image regions. Then, a novel sparse graph of the radial structure is naturally built to capture the associations between the core node (i.e., answer area) and peripheral nodes (i.e., other areas); the graphic attention is subsequently adopted to steer the convolutional propagation toward potentially more relevant nodes for final question generation. Extensive experiments on three benchmark data sets show the superiority of our approach compared with the reference methods. Even in the unexplored challenging zero-shot VQA task, the synthesized questions by our method remarkably boost the performance of several state-of-the-art VQA methods from 0% to over 40%. The implementation code of our proposed method and the successfully generated questions are available at https://github.com/Wangt-CN/VQG-GCN.
KW  - Cross-media understanding
KW  - graph convolutional network (GCN)
KW  - visual question generation (VQG)
UR  - http://www.scopus.com/inward/record.url?scp=85084045827&partnerID=8YFLogxK
U2  - 10.1109/TNNLS.2020.2986029
DO  - 10.1109/TNNLS.2020.2986029
M3  - Article
AN  - SCOPUS:85084045827
SN  - 2162-237X
VL  - 32
SP  - 1654
EP  - 1667
JO  - IEEE Transactions on Neural Networks and Learning Systems
JF  - IEEE Transactions on Neural Networks and Learning Systems
IS  - 4
M1  - 9079208
ER  - 
