Standard

Data as a language : A novel approach to data integration. / Koutras, Christos.

2019. Abstract from 2019 International Conference on Very Large Database PhD Workshop, VLDB-PhD 2019, Los Angeles, United States.

Research output: Contribution to conference › Abstract › Scientific

Harvard

Koutras, C 2019, 'Data as a language: A novel approach to data integration', 2019 International Conference on Very Large Database PhD Workshop, VLDB-PhD 2019, Los Angeles, United States, 26/08/19 - 30/08/19.

APA

Koutras, C. (2019). Data as a language: A novel approach to data integration. Abstract from 2019 International Conference on Very Large Database PhD Workshop, VLDB-PhD 2019, Los Angeles, United States.

Vancouver

Koutras C. Data as a language: A novel approach to data integration. 2019. Abstract from 2019 International Conference on Very Large Database PhD Workshop, VLDB-PhD 2019, Los Angeles, United States.

Author

Koutras, Christos. / Data as a language : A novel approach to data integration. Abstract from 2019 International Conference on Very Large Database PhD Workshop, VLDB-PhD 2019, Los Angeles, United States. 4 p.

BibTeX

@conference{ff3b02d82e454498b2a971059cfb2fe9,
title = "Data as a language: A novel approach to data integration",
abstract = "In modern enterprises, both operational and organizational data is typically spread across multiple heterogeneous systems, databases and file systems. Recognizing the value of their data assets, companies and institutions construct data lakes, storing disparate datasets from different departments and systems. However, for those datasets to become useful, they need to be cleaned and integrated. Data can be well documented, structured and encoded in different schemata, but also unstructured with implicit, human-understandable semantics. Due to the sheer scale of the data itself but also the multitude of representations and schemata, data integration techniques need to scale without relying heavily on human labor. Existing integration approaches fail to address hidden semantics without human input or some form of ontology, making large scale integration a daunting task. The goal of my doctoral work is to devise scalable data integration methods, employing modern machine learning to exploit semantics and facilitate discovery of novel relationship types. In order to capture semantics with minimal human intervention, we propose a new approach which we call Data as a Language (DaaL). By leveraging embeddings from the Natural Language Processing (NLP) literature, DaaL aims at extracting semantics from structured and semi-structured data, allowing the exploration of relevance and similarity among different data sources. This paper discusses existing data integration mechanisms and elaborates on how NLP techniques can be used in data integration, alongside challenges and research directions.",
author = "Christos Koutras",
year = "2019",
language = "English",
note = "2019 International Conference on Very Large Database PhD Workshop, VLDB-PhD 2019 ; Conference date: 26-08-2019 Through 30-08-2019",

}

RIS

TY - CONF

T1 - Data as a language

T2 - 2019 International Conference on Very Large Database PhD Workshop, VLDB-PhD 2019

AU - Koutras, Christos

PY - 2019

Y1 - 2019

N2 - In modern enterprises, both operational and organizational data is typically spread across multiple heterogeneous systems, databases and file systems. Recognizing the value of their data assets, companies and institutions construct data lakes, storing disparate datasets from different departments and systems. However, for those datasets to become useful, they need to be cleaned and integrated. Data can be well documented, structured and encoded in different schemata, but also unstructured with implicit, human-understandable semantics. Due to the sheer scale of the data itself but also the multitude of representations and schemata, data integration techniques need to scale without relying heavily on human labor. Existing integration approaches fail to address hidden semantics without human input or some form of ontology, making large scale integration a daunting task. The goal of my doctoral work is to devise scalable data integration methods, employing modern machine learning to exploit semantics and facilitate discovery of novel relationship types. In order to capture semantics with minimal human intervention, we propose a new approach which we call Data as a Language (DaaL). By leveraging embeddings from the Natural Language Processing (NLP) literature, DaaL aims at extracting semantics from structured and semi-structured data, allowing the exploration of relevance and similarity among different data sources. This paper discusses existing data integration mechanisms and elaborates on how NLP techniques can be used in data integration, alongside challenges and research directions.

AB - In modern enterprises, both operational and organizational data is typically spread across multiple heterogeneous systems, databases and file systems. Recognizing the value of their data assets, companies and institutions construct data lakes, storing disparate datasets from different departments and systems. However, for those datasets to become useful, they need to be cleaned and integrated. Data can be well documented, structured and encoded in different schemata, but also unstructured with implicit, human-understandable semantics. Due to the sheer scale of the data itself but also the multitude of representations and schemata, data integration techniques need to scale without relying heavily on human labor. Existing integration approaches fail to address hidden semantics without human input or some form of ontology, making large scale integration a daunting task. The goal of my doctoral work is to devise scalable data integration methods, employing modern machine learning to exploit semantics and facilitate discovery of novel relationship types. In order to capture semantics with minimal human intervention, we propose a new approach which we call Data as a Language (DaaL). By leveraging embeddings from the Natural Language Processing (NLP) literature, DaaL aims at extracting semantics from structured and semi-structured data, allowing the exploration of relevance and similarity among different data sources. This paper discusses existing data integration mechanisms and elaborates on how NLP techniques can be used in data integration, alongside challenges and research directions.

UR - http://www.scopus.com/inward/record.url?scp=85070913982&partnerID=8YFLogxK

M3 - Abstract

AN - SCOPUS:85070913982

Y2 - 26 August 2019 through 30 August 2019

ER -

ID: 57715614