Large-scale Author Verification: Temporal and Topical Influences

M. van Dam; C. Hauff

doi:10.1145/2600428.2609504

Web Information Systems

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

9 Citations (Scopus)

Abstract

The task of author verification is concerned with the question whether or not someone is the author of a given piece of text. Algorithms that extract writing style features from texts are used to determine how close in style different documents are. Currently, evaluations of author verification algorithms are restricted to small-scale corpora with usually less than one hundred test cases. In this work, we present a methodology to derive a large-scale author verification corpus based on Wikipedia Talkpages. We create a corpus based on English Wikipedia which is significantly larger than existing corpora. We investigate two dimensions on this corpus which so far have not received sufficient attention: the influence of topic and the influence of time on author verification accuracy.

Original language	English
Title of host publication	Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval
Pages	1039-1042
Number of pages	4
DOIs	https://doi.org/10.1145/2600428.2609504
Publication status	Published - 2014
Event	SIGIR '14: 37th international ACM SIGIR conference on Research and development in information retrieval, Gold Coast, Australia. Duration: 6 Jul 2014 → 11 Jul 2014

Conference

Conference	SIGIR '14: 37th international ACM SIGIR conference on Research and development in information retrieval
Country/Territory	Australia
City	Gold Coast
Period	6 Jul 2014 → 11 Jul 2014

Cite this

@inproceedings{dcf13f4827374cf59fbded67f0eb81ee,

title = "Large-scale Author Verification: Temporal and Topical Influences",

abstract = "The task of author verification is concerned with the question whether or not someone is the author of a given piece of text. Algorithms that extract writing style features from texts are used to determine how close in style different documents are. Currently, evaluations of author verification algorithms are restricted to small-scale corpora with usually less than one hundred test cases. In this work, we present a methodology to derive a large-scale author verification corpus based on Wikipedia Talkpages. We create a corpus based on English Wikipedia which is significantly larger than existing corpora. We investigate two dimensions on this corpus which so far have not received sufficient attention: the influence of topic and the influence of time on author verification accuracy.",

author = "van Dam, M. and Hauff, C.",

year = "2014",

doi = "10.1145/2600428.2609504",

language = "English",

isbn = "978-1-4503-2257-7",

pages = "1039--1042",

booktitle = "Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval",

note = "SIGIR '14: 37th international ACM SIGIR conference on Research and development in information retrieval ; Conference date: 06-07-2014 Through 11-07-2014",

}

van Dam, M & Hauff, C 2014, Large-scale Author Verification: Temporal and Topical Influences. in Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. pp. 1039-1042, SIGIR '14: 37th international ACM SIGIR conference on Research and development in information retrieval, Gold Coast, Australia, 6/07/14. https://doi.org/10.1145/2600428.2609504

TY - CONF

T1 - Large-scale Author Verification: Temporal and Topical Influences

AU - van Dam, M.

AU - Hauff, C.

PY - 2014

Y1 - 2014

N2 - The task of author verification is concerned with the question whether or not someone is the author of a given piece of text. Algorithms that extract writing style features from texts are used to determine how close in style different documents are. Currently, evaluations of author verification algorithms are restricted to small-scale corpora with usually less than one hundred test cases. In this work, we present a methodology to derive a large-scale author verification corpus based on Wikipedia Talkpages. We create a corpus based on English Wikipedia which is significantly larger than existing corpora. We investigate two dimensions on this corpus which so far have not received sufficient attention: the influence of topic and the influence of time on author verification accuracy.

U2 - 10.1145/2600428.2609504

DO - 10.1145/2600428.2609504

M3 - Conference contribution

SN - 978-1-4503-2257-7

SP - 1039

EP - 1042

BT - Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval

T2 - SIGIR '14: 37th international ACM SIGIR conference on Research and development in information retrieval

Y2 - 6 July 2014 through 11 July 2014

ER -
