Stochastic Simulation of Test Collections: Evaluation Scores

Julián Urbano, Thomas Nagler

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Part of Information Retrieval evaluation research is limited by the fact that we do not know the distributions of system effectiveness over the populations of topics and, by extension, their true mean scores. The usual workaround consists of resampling topics from an existing collection and approximating the statistics of interest with the observations made between random subsamples, as if one represented the population and the other a random sample. However, this methodology is clearly limited by the availability of data, the inability to control the properties of these data, and the fact that we do not really measure what we intend to measure. To overcome these limitations, we propose a method based on vine copulas for stochastic simulation of evaluation results where the true system distributions are known upfront. In the basic use case, it takes the scores from an existing collection to build a semi-parametric model representing the set of systems and the population of topics, which can then be used to make realistic simulations of the scores of the same systems on random new topics. Our ability to simulate this kind of data not only removes the current limitations, but also offers new opportunities for research. As an example, we show the benefits of this approach in two sample applications replicating typical experiments found in the literature. We provide a full R package to simulate new data following the proposed method, which can also be used to fully reproduce the results in this paper.
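To make the basic use case more concrete, the following is a minimal sketch of copula-based score simulation. It is not the authors' method or the API of their R package: it swaps the vine copula for a much simpler Gaussian copula, and it uses plain empirical margins rather than the semi-parametric margins the abstract mentions. All function names (`normal_scores`, `fit`, `simulate`) and the example scores are hypothetical, chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def normal_scores(scores):
    """Rank-transform scores to (0, 1), then map them to normal quantiles."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return z


def correlation(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0:
                    raise ValueError("correlation matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def fit(scores_by_system):
    """Assumed model: empirical margins plus a Gaussian copula (not a vine)."""
    z = [normal_scores(s) for s in scores_by_system]
    corr = [[correlation(a, b) for b in z] for a in z]
    return {"margins": [sorted(s) for s in scores_by_system],
            "chol": cholesky(corr)}


def simulate(model, n_topics, rng):
    """Draw scores of the same systems on n_topics simulated new topics."""
    chol, margins = model["chol"], model["margins"]
    topics = []
    for _ in range(n_topics):
        e = [rng.gauss(0.0, 1.0) for _ in margins]
        row = []
        for i, margin in enumerate(margins):
            z = sum(chol[i][k] * e[k] for k in range(i + 1))  # correlated normal
            u = STD_NORMAL.cdf(z)                             # copula sample in [0, 1]
            row.append(margin[min(int(u * len(margin)), len(margin) - 1)])
        topics.append(row)
    return topics


# Hypothetical per-topic scores for three systems on eight topics.
scores = [
    [0.12, 0.35, 0.50, 0.08, 0.77, 0.41, 0.66, 0.29],
    [0.15, 0.45, 0.55, 0.10, 0.60, 0.30, 0.70, 0.25],
    [0.40, 0.22, 0.31, 0.05, 0.52, 0.61, 0.18, 0.47],
]
model = fit(scores)
new_topics = simulate(model, 5, random.Random(0))
```

Each row of `new_topics` holds one simulated topic with one score per system. Because the margins here are purely empirical, simulated scores are always values already observed for that system; a smoothed or parametric margin would be needed to generate unseen score values, and a vine copula would be needed to capture dependence structures beyond what a single correlation matrix can express.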
Original language: English
Title of host publication: SIGIR'18
Subtitle of host publication: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval
Place of Publication: New York, NY
Publisher: Association for Computing Machinery (ACM)
Pages: 695-704
Number of pages: 10
ISBN (Electronic): 978-1-4503-5657-2
Publication status: Published - 2018
Event: SIGIR 2018: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval - Ann Arbor, United States
Duration: 8 Jul 2018 to 12 Jul 2018
Conference number: 41st

Conference

Conference: SIGIR 2018
Abbreviated title: SIGIR'18
Country/Territory: United States
City: Ann Arbor
Period: 8/07/18 to 12/07/18

Bibliographical note

Accepted author manuscript

Keywords

  • Evaluation
  • Test Collection
  • Simulation
  • Distribution
  • Copula
