Characterising and Mitigating Aggregation-Bias in Crowdsourced Toxicity Annotations

Agathe Balayn, Panagiotis Mavridis, Alessandro Bozzon, Benjamin Timmermans, Zoltán Szlávik

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

3 Citations (Scopus)
156 Downloads (Pure)

Abstract

Training machine learning (ML) models for natural language processing usually requires large amounts of data, often acquired through crowdsourcing. The way this data is collected and aggregated can affect the outputs of the trained model, for instance by ignoring labels that differ from the majority. In this paper we investigate how label aggregation can bias the ML results towards certain data samples, and we propose a methodology to highlight and mitigate this bias. Although our work is applicable to any kind of label aggregation for data subject to multiple interpretations, we focus on the effects of the bias introduced by majority voting on toxicity prediction over sentences. Our preliminary results indicate that we can mitigate the majority bias and obtain higher prediction accuracy for minority opinions if we take the individual annotators' labels into account when training adapted models, rather than relying on the aggregated labels.
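A minimal, hypothetical sketch (not the authors' code) of the contrast the abstract describes: aggregating crowd labels by majority vote, which discards minority opinions, versus keeping one training example per annotator judgement so that adapted models can still learn minority interpretations. The sentences, annotator IDs, and labels below are invented for illustration.

```python
# Hypothetical sketch: majority-vote aggregation vs. disaggregated annotator labels.
from collections import Counter

# Toy crowdsourced toxicity annotations: sentence -> {annotator_id: label}
# (1 = toxic, 0 = non-toxic; all values are made up for illustration)
annotations = {
    "you are a genius":      {"a1": 0, "a2": 0, "a3": 0},
    "shut up, nobody asked": {"a1": 1, "a2": 1, "a3": 0},
    "that's a dumb take":    {"a1": 1, "a2": 0, "a3": 0},
}

def majority_vote(labels):
    """Aggregate labels by majority voting; minority opinions are discarded."""
    return Counter(labels.values()).most_common(1)[0][0]

# Standard aggregated dataset: one (sentence, label) pair per sentence.
aggregated = [(s, majority_vote(labs)) for s, labs in annotations.items()]

# Disaggregated dataset: one (sentence, annotator, label) triple per annotation,
# usable to train per-annotator or annotator-aware models.
disaggregated = [(s, ann, lab)
                 for s, labs in annotations.items()
                 for ann, lab in labs.items()]

print(aggregated)     # the toxic judgement on "that's a dumb take" is lost
print(disaggregated)  # every individual judgement is preserved
```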
Original language: English
Title of host publication: Proceedings of the 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, and Short Paper Proceedings of the 1st Workshop on Disentangling the Relation Between Crowdsourcing and Bias Management
Editors: Lora Aroyo, Anca Dumitrache, Praveen Paritosh, Alex Quinn, Chris Welty, Alessandro Checco, Gianluca Demartini, Ujwal Gadiraju, Cristina Sarasua
Publisher: CEUR-WS
Pages: 67-71
Number of pages: 5
Volume: 2276
Publication status: Published - 2018
Event: 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, and 1st Workshop on Disentangling the Relation Between Crowdsourcing and Bias Management - University of Zurich, Zurich, Switzerland
Duration: 5 Jul 2018 - 5 Jul 2018
https://sites.google.com/view/crowdbias

Publication series

Name: CEUR Workshop Proceedings
Volume: 2276
ISSN (Electronic): 1613-0073

Conference

Conference: 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, and 1st Workshop on Disentangling the Relation Between Crowdsourcing and Bias Management
Abbreviated title: SAD2018 CrowdBias2018
Country/Territory: Switzerland
City: Zurich
Period: 5/07/18 - 5/07/18
Internet address: https://sites.google.com/view/crowdbias

Bibliographical note

Accepted Author Manuscript

Keywords

  • dataset bias
  • Machine Learning fairness
  • crowdsourcing
  • annotation aggregation
