Large-scale data stream processing systems

Paris Carbone; Gábor E. Gévay; Gábor Hermann; Asterios Katsifodimos; Juan Soto; Volker Markl; Seif Haridi

doi:10.1007/978-3-319-49340-4_7

Paris Carbone^*, Gábor E. Gévay, Gábor Hermann, Asterios Katsifodimos, Juan Soto, Volker Markl, Seif Haridi

^*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Chapter › Scientific

9 Citations (Scopus)

Abstract

In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially, both in terms of expressive programming model support and in terms of efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details on how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models, further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large-scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables from a systemic point of view.

Original language	English
Title of host publication	Handbook of Big Data Technologies
Editors	A.Y. Zomaya, S. Sakr
Place of Publication	Cham
Publisher	Springer
Pages	219-260
Number of pages	42
Edition	1
ISBN (Electronic)	978-3-319-49340-4
ISBN (Print)	978-3-319-49339-8
DOIs	https://doi.org/10.1007/978-3-319-49340-4_7
Publication status	Published - 25 Feb 2017
Externally published	Yes

Keywords

Harness

Cite this

@inbook{d1474663f41c43699fa417da3a51f210,

title = "Large-scale data stream processing systems",

abstract = "In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially both in terms of expressive programming model support and also efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details on how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables from a systemic point of view.",

keywords = "Harness",

author = "Paris Carbone and G{\'e}vay, {G{\'a}bor E.} and G{\'a}bor Hermann and Asterios Katsifodimos and Juan Soto and Volker Markl and Seif Haridi",

year = "2017",

month = feb,

day = "25",

doi = "10.1007/978-3-319-49340-4_7",

language = "English",

isbn = "978-3-319-49339-8",

pages = "219--260",

editor = "A.Y. Zomaya and S. Sakr",

booktitle = "Handbook of Big Data Technologies",

publisher = "Springer",

edition = "1",

}

TY - CHAP

T1 - Large-scale data stream processing systems

AU - Carbone, Paris

AU - Gévay, Gábor E.

AU - Hermann, Gábor

AU - Katsifodimos, Asterios

AU - Soto, Juan

AU - Markl, Volker

AU - Haridi, Seif

PY - 2017/2/25

Y1 - 2017/2/25

N2 - In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially both in terms of expressive programming model support and also efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details on how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables from a systemic point of view.

AB - In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially both in terms of expressive programming model support and also efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details on how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables from a systemic point of view.

KW - Harness

UR - http://www.scopus.com/inward/record.url?scp=85019960984&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-49340-4_7

DO - 10.1007/978-3-319-49340-4_7

M3 - Chapter

AN - SCOPUS:85019960984

SN - 978-3-319-49339-8

SP - 219

EP - 260

BT - Handbook of Big Data Technologies

A2 - Zomaya, A.Y.

A2 - Sakr, S.

PB - Springer

CY - Cham

ER -
