Standard

A Coflow-based Co-optimization Framework for High-performance Data Analytics. / Cheng, Long; Wang, Ying; Pei, Yulong; Epema, Dick.

Proceedings - 46th International Conference on Parallel Processing, ICPP 2017. Los Alamitos, CA : IEEE Computer Society, 2017. p. 392-401.

Research output: Scientific - peer-review - Conference contribution

Harvard

Cheng, L, Wang, Y, Pei, Y & Epema, D 2017, A Coflow-based Co-optimization Framework for High-performance Data Analytics. in Proceedings - 46th International Conference on Parallel Processing, ICPP 2017. IEEE Computer Society, Los Alamitos, CA, pp. 392-401, ICPP 2017, Bristol, United Kingdom, 14/08/17. DOI: 10.1109/ICPP.2017.48

APA

Cheng, L., Wang, Y., Pei, Y., & Epema, D. (2017). A Coflow-based Co-optimization Framework for High-performance Data Analytics. In Proceedings - 46th International Conference on Parallel Processing, ICPP 2017 (pp. 392-401). Los Alamitos, CA: IEEE Computer Society. DOI: 10.1109/ICPP.2017.48

Vancouver

Cheng L, Wang Y, Pei Y, Epema D. A Coflow-based Co-optimization Framework for High-performance Data Analytics. In Proceedings - 46th International Conference on Parallel Processing, ICPP 2017. Los Alamitos, CA: IEEE Computer Society. 2017. p. 392-401. Available from, DOI: 10.1109/ICPP.2017.48

Author

Cheng, Long ; Wang, Ying ; Pei, Yulong ; Epema, Dick. / A Coflow-based Co-optimization Framework for High-performance Data Analytics. Proceedings - 46th International Conference on Parallel Processing, ICPP 2017. Los Alamitos, CA : IEEE Computer Society, 2017. pp. 392-401

BibTeX

@inbook{4ffef8f85ca34a47a321933a23ee0282,
title = "A Coflow-based Co-optimization Framework for High-performance Data Analytics",
abstract = "Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming increasingly important, and also challenging current techniques. Significant performance improvements have been achieved by using state-of-the-art methods, such as reducing network traffic designed in the data management domain, and data flow scheduling in the data communications domain. However, the proposed techniques in both fields just view each other as a black box, and performance gains from a co-optimization perspective have not yet been explored. In this paper, based on current research in coflow scheduling, we propose a novel Coflow-based Co-optimization Framework (CCF), which can co-optimize application-level data movement and network-level data communications for distributed operators, and consequently contribute to their performance in large distributed environments. We present the detailed design and implementation of CCF, and conduct an experimental evaluation of CCF using large-scale simulations on large data joins. Our results demonstrate that CCF can always perform faster than current approaches on network communications in large-scale distributed scenarios.",
keywords = "big data, coflow scheduling, distributed joins, network communications, data-intensive applications",
author = "Long Cheng and Ying Wang and Yulong Pei and Dick Epema",
year = "2017",
doi = "10.1109/ICPP.2017.48",
pages = "392--401",
booktitle = "Proceedings - 46th International Conference on Parallel Processing, ICPP 2017",
publisher = "IEEE Computer Society",
address = "United States",

}

RIS

TY - CHAP

T1 - A Coflow-based Co-optimization Framework for High-performance Data Analytics

AU - Cheng,Long

AU - Wang,Ying

AU - Pei,Yulong

AU - Epema,Dick

PY - 2017

Y1 - 2017

N2 - Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming increasingly important, and also challenging current techniques. Significant performance improvements have been achieved by using state-of-the-art methods, such as reducing network traffic designed in the data management domain, and data flow scheduling in the data communications domain. However, the proposed techniques in both fields just view each other as a black box, and performance gains from a co-optimization perspective have not yet been explored. In this paper, based on current research in coflow scheduling, we propose a novel Coflow-based Co-optimization Framework (CCF), which can co-optimize application-level data movement and network-level data communications for distributed operators, and consequently contribute to their performance in large distributed environments. We present the detailed design and implementation of CCF, and conduct an experimental evaluation of CCF using large-scale simulations on large data joins. Our results demonstrate that CCF can always perform faster than current approaches on network communications in large-scale distributed scenarios.

AB - Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming increasingly important, and also challenging current techniques. Significant performance improvements have been achieved by using state-of-the-art methods, such as reducing network traffic designed in the data management domain, and data flow scheduling in the data communications domain. However, the proposed techniques in both fields just view each other as a black box, and performance gains from a co-optimization perspective have not yet been explored. In this paper, based on current research in coflow scheduling, we propose a novel Coflow-based Co-optimization Framework (CCF), which can co-optimize application-level data movement and network-level data communications for distributed operators, and consequently contribute to their performance in large distributed environments. We present the detailed design and implementation of CCF, and conduct an experimental evaluation of CCF using large-scale simulations on large data joins. Our results demonstrate that CCF can always perform faster than current approaches on network communications in large-scale distributed scenarios.

KW - big data

KW - coflow scheduling

KW - distributed joins

KW - network communications

KW - data-intensive applications

UR - http://resolver.tudelft.nl/uuid:4ffef8f8-5ca3-4a47-a321-933a23ee0282

U2 - 10.1109/ICPP.2017.48

DO - 10.1109/ICPP.2017.48

M3 - Conference contribution

SP - 392

EP - 401

BT - Proceedings - 46th International Conference on Parallel Processing, ICPP 2017

PB - IEEE Computer Society

ER -
