Abstract
Comparisons of ontology matching systems are typically performed by averaging their performance over multiple datasets. However, since averaging is statistically unsafe and inappropriate, this paper examines alignment systems using statistical inference. Statistical tests for the comparison of two or multiple alignment systems are reviewed theoretically and empirically. For the comparison of two systems, the Wilcoxon signed-rank test and McNemar's mid-p and asymptotic tests are recommended for their robustness and statistical safety under different circumstances. For the comparison of multiple systems, the Friedman and Quade tests with their corresponding post-hoc procedures are studied, and their advantages and disadvantages are discussed. The statistical methods are then applied to the benchmark and multifarm tracks of the Ontology Alignment Evaluation Initiative (OAEI) 2015, and the results are reported and visualized with critical difference diagrams.
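The tests the abstract recommends are widely available; as a minimal sketch, the Wilcoxon signed-rank test (two systems) and the Friedman test (multiple systems) can be run on paired per-task scores with SciPy. The F-measures below are hypothetical placeholders, not OAEI 2015 results:

```python
# Hedged sketch: comparing alignment systems by their per-task F-measures
# rather than by averaging, using the non-parametric tests named above.
from scipy import stats

# Hypothetical F-measures of two systems over the same eight matching tasks.
system_a = [0.61, 0.72, 0.58, 0.80, 0.67, 0.74, 0.69, 0.63]
system_b = [0.55, 0.70, 0.52, 0.78, 0.60, 0.71, 0.66, 0.59]

# Wilcoxon signed-rank test: paired, non-parametric comparison of two systems.
w_stat, w_p = stats.wilcoxon(system_a, system_b)
print(f"Wilcoxon: statistic={w_stat}, p={w_p:.4f}")

# Friedman test: omnibus comparison of three or more systems on the same tasks;
# a significant result would then be followed by post-hoc procedures.
system_c = [0.60, 0.69, 0.57, 0.75, 0.64, 0.70, 0.65, 0.61]
f_stat, f_p = stats.friedmanchisquare(system_a, system_b, system_c)
print(f"Friedman: statistic={f_stat:.2f}, p={f_p:.4f}")
```

A small p-value from the omnibus Friedman test only indicates that some systems differ; identifying which pairs differ requires the post-hoc procedures (e.g. Nemenyi, Holm, Shaffer) discussed in the paper.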
| Original language | English |
|---|---|
| Pages (from-to) | 1-14 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| DOIs | |
| Publication status | Published - 2018 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' – Taverne project https://www.openaccess.nl/en/you-share-we-take-care. Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.

Keywords
- Benchmark testing
- Bergmann
- Friedman
- Geoscience
- Holm
- McNemar
- Nemenyi
- Ontologies
- Ontology alignment evaluation
- paired t-test
- post-hoc
- Quade
- Robustness
- Shaffer
- Statistical analysis
- Task analysis
- Wilcoxon signed-rank