Traditional retrieval models such as BM25 or language models were engineered based on search heuristics that were later formalized into axioms. The axiomatic approach to information retrieval (IR) has shown that the effectiveness of a retrieval method is connected to its fulfillment of axioms. This approach has enabled researchers to identify shortcomings in existing approaches and “fix” them. With the new wave of neural-network-based approaches to IR, a theoretical analysis of those retrieval models is no longer feasible, as they potentially contain millions of parameters. In this paper, we propose a pipeline to create diagnostic datasets for IR, each engineered to fulfill one axiom. We execute our pipeline on the recently released large-scale question answering dataset WikiPassageQA (which contains over 4000 topics) and create diagnostic datasets for four axioms. We empirically evaluate to what extent well-known deep IR models are able to realize the axiomatic pattern underlying the datasets. Our evaluation shows that there is indeed a positive relation between the performance of neural approaches on the diagnostic datasets and their retrieval effectiveness. Based on these findings, we argue that diagnostic datasets grounded in axioms are a good approach to diagnosing neural IR models.
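The idea of a diagnostic dataset built around a single axiom can be illustrated with a small sketch. The abstract does not name the four axioms, so the example below assumes one well-known term-frequency constraint from the axiomatic IR literature (often called TFC1): for a single-term query and two documents of equal length, the document containing more occurrences of the query term should score higher. All names here (`tf_scorer`, `make_tfc1_pair`, `axiom_agreement`) are hypothetical and are not taken from the paper; the scorer is a toy stand-in for a retrieval model, not one of the deep IR models evaluated by the authors.

```python
from collections import Counter
from typing import Callable, List, Tuple

Doc = List[str]                       # a document as a list of tokens
Scorer = Callable[[str, Doc], float]  # (query term, document) -> score


def tf_scorer(term: str, doc: Doc) -> float:
    """Toy stand-in for a retrieval model: raw frequency of the query term."""
    return float(Counter(doc)[term])


def make_tfc1_pair(term: str, filler: Doc, k_low: int, k_high: int) -> Tuple[Doc, Doc]:
    """Build two equal-length documents that differ only in how often `term` occurs.

    Assumption: `filler` does not contain `term`, so the counts are exactly
    k_high and k_low.
    """
    if not (0 <= k_low < k_high <= len(filler)):
        raise ValueError("need 0 <= k_low < k_high <= len(filler)")
    if term in filler:
        raise ValueError("filler must not contain the query term")

    def build(k: int) -> Doc:
        # Overwrite the first k filler tokens with the query term.
        return [term] * k + filler[k:]

    return build(k_high), build(k_low)


def axiom_agreement(scorer: Scorer, term: str, pairs: List[Tuple[Doc, Doc]]) -> float:
    """Fraction of (higher-tf, lower-tf) pairs that the scorer orders as the axiom demands."""
    agree = sum(scorer(term, hi) > scorer(term, lo) for hi, lo in pairs)
    return agree / len(pairs)


filler = ["a", "b", "c", "d", "e"]
pairs = [
    make_tfc1_pair("cat", filler, 1, 3),
    make_tfc1_pair("cat", filler, 0, 2),
]
agreement = axiom_agreement(tf_scorer, "cat", pairs)
```

A model's agreement rate on such pairs can then be compared with its retrieval effectiveness on the original collection, which is the kind of relation the abstract reports.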

Original language: English
Title of host publication: Advances in Information Retrieval
Subtitle of host publication: 41st European Conference on IR Research, ECIR 2019, Proceedings Part 1
Editors: Leif Azzopardi, Benno Stein, Norbert Fuhr, Philipp Mayr, Claudia Hauff, Djoerd Hiemstra
Place of Publication: Cham
Publisher: Springer Verlag
Pages: 489-503
Number of pages: 15
ISBN (Electronic): 978-3-030-15712-8
ISBN (Print): 978-3-030-15711-1
Publication status: Published - 2019
Event: 41st European Conference on Information Retrieval, ECIR 2019 - Cologne, Germany
Duration: 14 Apr 2019 to 18 Apr 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11437 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 41st European Conference on Information Retrieval, ECIR 2019
Country: Germany
City: Cologne
Period: 14/04/19 to 18/04/19

ID: 53627755