Standard

Describing Data Processing Pipelines in Scientific Publications for Big Data Injection. / Mesbah, Sepideh; Bozzon, Alessandro; Lofi, Christoph; Houben, Geert-Jan.

Workshop on Scholarly Web Mining. Cambridge, 2017.

Research output: Scientific - peer-review › Conference contribution

Harvard

Mesbah, S, Bozzon, A, Lofi, C & Houben, G-J 2017, Describing Data Processing Pipelines in Scientific Publications for Big Data Injection. in Workshop on Scholarly Web Mining. Cambridge, Workshop on Scholarly Web Mining, Cambridge, United Kingdom, 10 February.

Author

Mesbah, Sepideh; Bozzon, Alessandro; Lofi, Christoph; Houben, Geert-Jan / Describing Data Processing Pipelines in Scientific Publications for Big Data Injection.

Workshop on Scholarly Web Mining. Cambridge, 2017.

Research output: Scientific - peer-review › Conference contribution

BibTeX

@inbook{71c4c6f4a5b54b28a4b5fe59b8adb674,
title = "Describing Data Processing Pipelines in Scientific Publications for Big Data Injection",
keywords = "Ontology, Digital Libraries",
author = "Sepideh Mesbah and Alessandro Bozzon and Christoph Lofi and Geert-Jan Houben",
year = "2017",
month = "2",
booktitle = "Workshop on Scholarly Web Mining",

}

RIS

TY - CHAP

T1 - Describing Data Processing Pipelines in Scientific Publications for Big Data Injection

AU - Mesbah, Sepideh

AU - Bozzon, Alessandro

AU - Lofi, Christoph

AU - Houben, Geert-Jan

PY - 2017/2/10

Y1 - 2017/2/10

N2 - The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing the integration into domain-specific applications and systems of insights and knowledge extracted from external big data sets. The effective "injection" of external Big Data demands an understanding of the properties of available data sets, and expertise on the available and most suitable methods for data collection, enrichment and analysis. A prominent knowledge source is scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is however not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling properties of data processing pipelines, and their related artifacts, as described in scientific publications. The ontology is the result of a requirement analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.

AB - The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing the integration into domain-specific applications and systems of insights and knowledge extracted from external big data sets. The effective "injection" of external Big Data demands an understanding of the properties of available data sets, and expertise on the available and most suitable methods for data collection, enrichment and analysis. A prominent knowledge source is scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is however not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling properties of data processing pipelines, and their related artifacts, as described in scientific publications. The ontology is the result of a requirement analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.

KW - Ontology

KW - Digital Libraries

UR - https://ornlcda.github.io/SWM2017/

M3 - Conference contribution

BT - Workshop on Scholarly Web Mining

ER -

ID: 18491789
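
Note: the record above only summarizes the paper's contribution; the abstract does not spell out the ontology's classes or properties. Purely as an illustrative sketch of the kind of modeling it describes, the following Python/rdflib snippet builds a toy pipeline ontology. Every class, property, and instance name (DataProcessingPipeline, hasStep, usesDataset, the example.org namespace, and so on) is an assumption made here for illustration and is not taken from the publication.

# Hypothetical sketch only: all vocabulary terms below are assumptions,
# not the ontology proposed in the paper. Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS

EX = Namespace("http://example.org/pipeline-ontology#")  # placeholder namespace

g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# Assumed core classes: a processing pipeline, its steps, and the artifacts
# (datasets, methods, evaluation results) a publication typically describes.
for cls in (EX.DataProcessingPipeline, EX.ProcessingStep, EX.Dataset,
            EX.Method, EX.EvaluationResult, EX.Publication):
    g.add((cls, RDF.type, OWL.Class))

# Assumed object properties linking pipelines to steps and steps to artifacts.
g.add((EX.hasStep, RDF.type, OWL.ObjectProperty))
g.add((EX.hasStep, RDFS.domain, EX.DataProcessingPipeline))
g.add((EX.hasStep, RDFS.range, EX.ProcessingStep))

g.add((EX.usesDataset, RDF.type, OWL.ObjectProperty))
g.add((EX.usesDataset, RDFS.domain, EX.ProcessingStep))
g.add((EX.usesDataset, RDFS.range, EX.Dataset))

g.add((EX.describedIn, RDF.type, OWL.ObjectProperty))
g.add((EX.describedIn, RDFS.domain, EX.DataProcessingPipeline))
g.add((EX.describedIn, RDFS.range, EX.Publication))

# Example instance: one (fictional) pipeline annotated against a publication.
g.add((EX.examplePipeline, RDF.type, EX.DataProcessingPipeline))
g.add((EX.examplePipeline, RDFS.label, Literal("Example data processing pipeline")))

print(g.serialize(format="turtle"))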