Evaluating Neural Text Simplification in the Medical Domain

Laurens van den Bercken; Robert-Jan Sips; Christoph Lofi

doi:10.1145/3308558.3313630

Web Information Systems

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Health literacy, i.e., the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard for the general population to grasp, as they are targeted at highly skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification that makes text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora that contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel, simple, language-independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model and compare it to a model trained on a general simplification dataset, using both an automatic evaluation and an extensive human-expert evaluation.

Original language	English
Title of host publication	WWW'19 The World Wide Web Conference (WWW)
Place of Publication	New York
Publisher	Association for Computing Machinery (ACM)
Pages	3286-3292
Number of pages	7
ISBN (Print)	978-1-4503-6674-8
DOIs	https://doi.org/10.1145/3308558.3313630
Publication status	Published - May 2019
Event	WWW 2019: The Web Conference 2019, 30 years of the web, San Francisco, CA, United States. Duration: 13 May 2019 → 17 May 2019. Conference number: 30

Conference

Conference	WWW 2019
Abbreviated title	WWW'19
Country/Territory	United States
City	San Francisco, CA
Period	13/05/19 → 17/05/19

Keywords

Medical Text Simplification
Test and Training Data Generation
Monolingual Neural Machine Translation

Access to Document

10.1145/3308558.3313630

p3286-bercken: Final published version, 454 KB. Licence: CC BY

Cite this

@inproceedings{a9d4cbd3b7a849f181c6174864aa8aa6,

title = "Evaluating Neural Text Simplification in the Medical Domain",

abstract = "Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification making text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora which contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel simple, language independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model, and compare it to a model trained on a general simplification dataset using an automatic evaluation, and an extensive human-expert evaluation. ",

keywords = "Medical Text Simplification, Test and Training Data Generation, Monolingual Neural Machine Translation",

author = "{van den Bercken}, Laurens and Robert-Jan Sips and Christoph Lofi",

year = "2019",

month = may,

doi = "10.1145/3308558.3313630",

language = "English",

isbn = "978-1-4503-6674-8",

pages = "3286--3292",

booktitle = "WWW'19 The World Wide Web Conference (WWW)",

publisher = "Association for Computing Machinery (ACM)",

address = "New York",

note = "WWW 2019 : The Web Conference 2019, 30 years of the web, WWW'19 ; Conference date: 13-05-2019 Through 17-05-2019",

}

TY - GEN

T1 - Evaluating Neural Text Simplification in the Medical Domain

AU - van den Bercken, Laurens

AU - Sips, Robert-Jan

AU - Lofi, Christoph

N1 - Conference code: 30

PY - 2019/5

Y1 - 2019/5

N2 - Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification making text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora which contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel simple, language independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model, and compare it to a model trained on a general simplification dataset using an automatic evaluation, and an extensive human-expert evaluation.

AB - Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification making text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora which contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel simple, language independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model, and compare it to a model trained on a general simplification dataset using an automatic evaluation, and an extensive human-expert evaluation.

KW - Medical Text Simplification

KW - Test and Training Data Generation

KW - Monolingual Neural Machine Translation

U2 - 10.1145/3308558.3313630

DO - 10.1145/3308558.3313630

M3 - Conference contribution

SN - 978-1-4503-6674-8

SP - 3286

EP - 3292

BT - WWW'19 The World Wide Web Conference (WWW)

PB - Association for Computing Machinery (ACM)

CY - New York

T2 - WWW 2019

Y2 - 13 May 2019 through 17 May 2019

ER -
