A benchmark and comparison of active learning for logistic regression

Yazhou Yang; Marco Loog

doi:10.1016/j.patcog.2018.06.004

Yazhou Yang^*, Marco Loog

^*Corresponding author for this work

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

91 Citations (Scopus)

109 Downloads (Pure)

Abstract

Logistic regression is by far the most widely used classifier in real-world applications. In this paper, we benchmark the state-of-the-art active learning methods for logistic regression and discuss and illustrate their underlying characteristics. Experiments are carried out on three synthetic datasets and 44 real-world datasets, providing insight into the behaviors of these active learning methods with respect to the area of the learning curve (which plots classification accuracy as a function of the number of queried examples) and their computational costs. Surprisingly, one of the earliest and simplest suggested active learning methods, i.e., uncertainty sampling, performs exceptionally well overall. Another remarkable finding is that random sampling, which is the rudimentary baseline to improve upon, is not overwhelmed by individual active learning techniques in many cases.
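The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' experimental code: it assumes a tiny pure-Python logistic regression fitted by gradient descent, a two-blob synthetic dataset, and a trapezoidal area over the accuracy values as one plausible reading of "area of the learning curve". All names (`make_blobs`, `fit_logreg`, `learning_curve`, `area`) are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_blobs(n, seed=0):
    # Hypothetical synthetic data: two 2-D Gaussian blobs, labels alternate 0/1.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


def fit_logreg(X, y, steps=200, lr=0.5, l2=0.01):
    # L2-regularised logistic regression via batch gradient descent
    # (an assumption of this sketch; any solver could be swapped in).
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def accuracy(model, X, y):
    hits = sum((predict_proba(model, x) >= 0.5) == (t == 1) for x, t in zip(X, y))
    return hits / len(X)


def learning_curve(X_pool, y_pool, X_test, y_test, n_queries, strategy, seed=0):
    # Returns test accuracy after each queried label.
    rng = random.Random(seed)
    # Assumption: start from one labelled example per class.
    labelled = [y_pool.index(0), y_pool.index(1)]
    unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

    def refit():
        return fit_logreg([X_pool[i] for i in labelled],
                          [y_pool[i] for i in labelled])

    model = refit()
    curve = []
    for _ in range(n_queries):
        if strategy == "uncertainty":
            # Uncertainty sampling: query the point whose predicted
            # probability is closest to 0.5.
            pick = min(unlabelled,
                       key=lambda i: abs(predict_proba(model, X_pool[i]) - 0.5))
        else:
            # Random sampling baseline.
            pick = rng.choice(unlabelled)
        unlabelled.remove(pick)
        labelled.append(pick)
        model = refit()
        curve.append(accuracy(model, X_test, y_test))
    return curve


def area(curve):
    # Trapezoidal area over the accuracy values (unit spacing between queries).
    return sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))


if __name__ == "__main__":
    X, y = make_blobs(200, seed=1)
    X_pool, y_pool, X_test, y_test = X[:100], y[:100], X[100:], y[100:]
    for strategy in ("uncertainty", "random"):
        curve = learning_curve(X_pool, y_pool, X_test, y_test, 20, strategy)
        print(strategy, round(area(curve), 3))
```

On a toy problem like this the two strategies may score similarly, so the printed areas should not be read as reproducing the paper's results.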

Original language	English
Pages (from-to)	401-415
Number of pages	15
Journal	Pattern Recognition
Volume	83
DOIs	https://doi.org/10.1016/j.patcog.2018.06.004
Publication status	Published - 2018

Bibliographical note

Accepted Author Manuscript

Keywords

Active learning
Benchmark
Experimental design
Logistic regression
Preference maps

Access to Document

10.1016/j.patcog.2018.06.004

47332005 - benchmark_AL
Accepted author manuscript, 782 KB
Licence: CC BY-NC-ND

Cite this

@article{8716e1c86c3b46178163eb7e2519a44f,

title = "A benchmark and comparison of active learning for logistic regression",

abstract = "Logistic regression is by far the most widely used classifier in real-world applications. In this paper, we benchmark the state-of-the-art active learning methods for logistic regression and discuss and illustrate their underlying characteristics. Experiments are carried out on three synthetic datasets and 44 real-world datasets, providing insight into the behaviors of these active learning methods with respect to the area of the learning curve (which plots classification accuracy as a function of the number of queried examples) and their computational costs. Surprisingly, one of the earliest and simplest suggested active learning methods, i.e., uncertainty sampling, performs exceptionally well overall. Another remarkable finding is that random sampling, which is the rudimentary baseline to improve upon, is not overwhelmed by individual active learning techniques in many cases.",

keywords = "Active learning, Benchmark, Experimental design, Logistic regression, Preference maps",

author = "Yazhou Yang and Marco Loog",

note = "Accepted Author Manuscript",

year = "2018",

doi = "10.1016/j.patcog.2018.06.004",

language = "English",

volume = "83",

pages = "401--415",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier",

}

TY - JOUR

T1 - A benchmark and comparison of active learning for logistic regression

AU - Yang, Yazhou

AU - Loog, Marco

N1 - Accepted Author Manuscript

PY - 2018

Y1 - 2018

AB - Logistic regression is by far the most widely used classifier in real-world applications. In this paper, we benchmark the state-of-the-art active learning methods for logistic regression and discuss and illustrate their underlying characteristics. Experiments are carried out on three synthetic datasets and 44 real-world datasets, providing insight into the behaviors of these active learning methods with respect to the area of the learning curve (which plots classification accuracy as a function of the number of queried examples) and their computational costs. Surprisingly, one of the earliest and simplest suggested active learning methods, i.e., uncertainty sampling, performs exceptionally well overall. Another remarkable finding is that random sampling, which is the rudimentary baseline to improve upon, is not overwhelmed by individual active learning techniques in many cases.

KW - Active learning

KW - Benchmark

KW - Experimental design

KW - Logistic regression

KW - Preference maps

UR - http://www.scopus.com/inward/record.url?scp=85048759279&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2018.06.004

DO - 10.1016/j.patcog.2018.06.004

M3 - Article

AN - SCOPUS:85048759279

SN - 0031-3203

VL - 83

SP - 401

EP - 415

JO - Pattern Recognition

JF - Pattern Recognition

ER -
