Semantic Annotation of Data Processing Pipelines in Scientific Publications

Sepideh Mesbah; Kyriakos Fragkeskos; Christoph Lofi; Alessandro Bozzon; Geert-Jan Houben

doi:10.1007/978-3-319-58068-5_20

Web Information Systems

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

13 Citations (Scopus)

Abstract

Data processing pipelines are a core object of interest for data scientists and practitioners operating in a variety of data-related application domains. To effectively capitalise on the experience gained in the creation and adoption of such pipelines, the need arises for mechanisms able to capture knowledge about datasets of interest, data processing methods designed to achieve a given goal, and the performance achieved when applying such methods to the considered datasets. However, due to its distributed and often unstructured nature, this knowledge is not easily accessible. In this paper, we use (scientific) publications as a source of knowledge about Data Processing Pipelines. We describe a method designed to classify sentences according to the nature of the contained information (i.e. scientific objective, dataset, method, software, result), and to extract relevant named entities. The extracted information is then semantically annotated and published as linked data in open knowledge repositories according to the DMS ontology for data processing metadata. To demonstrate the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on four different conference series.

Original language	English
Title of host publication	The Semantic Web
Subtitle of host publication	14th International Conference, ESWC 2017, Proceedings Part 1
Editors	Eva Blomqvist, Diana Maynard, Aldo Gangemi, Rinke Hoekstra, Pascal Hitzler, Olaf Hartig
Place of Publication	Cham
Publisher	Springer
Pages	321-336
Number of pages	16
ISBN (Electronic)	978-3-319-58068-5
ISBN (Print)	978-3-319-58067-8
DOIs	https://doi.org/10.1007/978-3-319-58068-5_20
Publication status	Published - 16 May 2017
Event	Extended Semantic Web Conference, Portorož, Slovenia. Duration: 28 May 2017 → 1 Jun 2017. Conference number: 14. http://2017.eswc-conferences.org/

Publication series

Name	Lecture Notes in Computer Science
Volume	10249
ISSN (Print)	0302-9743

Conference

Conference	Extended Semantic Web Conference
Abbreviated title	ESWC 2017
Country/Territory	Slovenia
City	Portorož
Period	28/05/17 → 1/06/17
Internet address	http://2017.eswc-conferences.org/

Cite this

Mesbah, S., Fragkeskos, K., Lofi, C., Bozzon, A., & Houben, G.-J. (2017). Semantic Annotation of Data Processing Pipelines in Scientific Publications. In E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler, & O. Hartig (Eds.), The Semantic Web: 14th International Conference, ESWC 2017, Proceedings Part 1 (pp. 321-336). (Lecture Notes in Computer Science; Vol. 10249). Springer. https://doi.org/10.1007/978-3-319-58068-5_20

Mesbah, Sepideh ; Fragkeskos, Kyriakos ; Lofi, Christoph et al. / Semantic Annotation of Data Processing Pipelines in Scientific Publications. The Semantic Web: 14th International Conference, ESWC 2017, Proceedings Part 1. editor / Eva Blomqvist ; Diana Maynard ; Aldo Gangemi ; Rinke Hoekstra ; Pascal Hitzler ; Olaf Hartig. Cham : Springer, 2017. pp. 321-336 (Lecture Notes in Computer Science).

@inproceedings{0f278790e7f4469c911a541f63ff4e01,
  title = "Semantic Annotation of Data Processing Pipelines in Scientific Publications",
  abstract = "Data processing pipelines are a core object of interest for data scientists and practitioners operating in a variety of data-related application domains. To effectively capitalise on the experience gained in the creation and adoption of such pipelines, the need arises for mechanisms able to capture knowledge about datasets of interest, data processing methods designed to achieve a given goal, and the performance achieved when applying such methods to the considered datasets. However, due to its distributed and often unstructured nature, this knowledge is not easily accessible. In this paper, we use (scientific) publications as a source of knowledge about Data Processing Pipelines. We describe a method designed to classify sentences according to the nature of the contained information (i.e. scientific objective, dataset, method, software, result), and to extract relevant named entities. The extracted information is then semantically annotated and published as linked data in open knowledge repositories according to the DMS ontology for data processing metadata. To demonstrate the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on four different conference series.",
  author = "Sepideh Mesbah and Kyriakos Fragkeskos and Christoph Lofi and Alessandro Bozzon and Geert-Jan Houben",
  year = "2017",
  month = may,
  day = "16",
  doi = "10.1007/978-3-319-58068-5_20",
  language = "English",
  isbn = "978-3-319-58067-8",
  series = "Lecture Notes in Computer Science",
  volume = "10249",
  publisher = "Springer",
  address = "Cham",
  pages = "321--336",
  editor = "Eva Blomqvist and Diana Maynard and Aldo Gangemi and Rinke Hoekstra and Pascal Hitzler and Olaf Hartig",
  booktitle = "The Semantic Web",
  note = "Extended Semantic Web Conference, ESWC 2017 ; Conference date: 28-05-2017 Through 01-06-2017",
  url = "http://2017.eswc-conferences.org/",
}

Mesbah, S, Fragkeskos, K, Lofi, C, Bozzon, A & Houben, G-J 2017, Semantic Annotation of Data Processing Pipelines in Scientific Publications. in E Blomqvist, D Maynard, A Gangemi, R Hoekstra, P Hitzler & O Hartig (eds), The Semantic Web: 14th International Conference, ESWC 2017, Proceedings Part 1. Lecture Notes in Computer Science, vol. 10249, Springer, Cham, pp. 321-336, Extended Semantic Web Conference, Portorož, Slovenia, 28/05/17. https://doi.org/10.1007/978-3-319-58068-5_20

Semantic Annotation of Data Processing Pipelines in Scientific Publications. / Mesbah, Sepideh; Fragkeskos, Kyriakos; Lofi, Christoph et al.
The Semantic Web: 14th International Conference, ESWC 2017, Proceedings Part 1. ed. / Eva Blomqvist; Diana Maynard; Aldo Gangemi; Rinke Hoekstra; Pascal Hitzler; Olaf Hartig. Cham: Springer, 2017. p. 321-336 (Lecture Notes in Computer Science; Vol. 10249).

TY  - GEN
T1  - Semantic Annotation of Data Processing Pipelines in Scientific Publications
AU  - Mesbah, Sepideh
AU  - Fragkeskos, Kyriakos
AU  - Lofi, Christoph
AU  - Bozzon, Alessandro
AU  - Houben, Geert-Jan
N1  - Conference code: 14
PY  - 2017/5/16
Y1  - 2017/5/16
AB  - Data processing pipelines are a core object of interest for data scientists and practitioners operating in a variety of data-related application domains. To effectively capitalise on the experience gained in the creation and adoption of such pipelines, the need arises for mechanisms able to capture knowledge about datasets of interest, data processing methods designed to achieve a given goal, and the performance achieved when applying such methods to the considered datasets. However, due to its distributed and often unstructured nature, this knowledge is not easily accessible. In this paper, we use (scientific) publications as a source of knowledge about Data Processing Pipelines. We describe a method designed to classify sentences according to the nature of the contained information (i.e. scientific objective, dataset, method, software, result), and to extract relevant named entities. The extracted information is then semantically annotated and published as linked data in open knowledge repositories according to the DMS ontology for data processing metadata. To demonstrate the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on four different conference series.
U2  - 10.1007/978-3-319-58068-5_20
DO  - 10.1007/978-3-319-58068-5_20
M3  - Conference contribution
SN  - 978-3-319-58067-8
T3  - Lecture Notes in Computer Science
VL  - 10249
SP  - 321
EP  - 336
BT  - The Semantic Web
A2  - Blomqvist, Eva
A2  - Maynard, Diana
A2  - Gangemi, Aldo
A2  - Hoekstra, Rinke
A2  - Hitzler, Pascal
A2  - Hartig, Olaf
PB  - Springer
CY  - Cham
T2  - Extended Semantic Web Conference
Y2  - 28 May 2017 through 1 June 2017
ER  -

Mesbah S, Fragkeskos K, Lofi C, Bozzon A, Houben GJ. Semantic Annotation of Data Processing Pipelines in Scientific Publications. In Blomqvist E, Maynard D, Gangemi A, Hoekstra R, Hitzler P, Hartig O, editors, The Semantic Web: 14th International Conference, ESWC 2017, Proceedings Part 1. Cham: Springer. 2017. p. 321-336. (Lecture Notes in Computer Science). doi: 10.1007/978-3-319-58068-5_20
