
Sparse Matrix Vector Multiplication (SpMV) is a key kernel in various domains that is known to be difficult to parallelize efficiently due to the low spatial locality of its data accesses. This is problematic for large-scale SpMV, both because of limited cache sizes and because it hinders speedups through parallel execution. To address these issues, we present 1) sparstition, a novel partitioning scheme that enables computing SpMV without any major post-processing steps, and 2) a corresponding HLS-based hardware design that performs large-scale SpMV efficiently. The design is pipelined so that the matrix size is limited only by the size of the off-chip memory (DRAM) and not by the available on-chip memory (BRAMs). Our experimental results, obtained on a ZedBoard, show a computational throughput of up to 300 MFLOPS in single precision and 108 MFLOPS in double precision, an improvement of 2.6X on average over current state-of-the-art HLS results. Finally, we predict that sparstition can boost the computational throughput of HLS-based SpMV kernels to over 10 GFLOPS when using High Bandwidth Memory.
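To illustrate the memory-access behavior the abstract refers to, here is a minimal baseline SpMV over the common CSR (Compressed Sparse Row) storage format. This is a generic sketch for context only, not the paper's sparstition-partitioned design; the function and variable names are illustrative. Note the indirect gather into `x`, which is the source of the low spatial locality described above.

```c
#include <stddef.h>

/* Baseline CSR SpMV: computes y = A*x.
   row_ptr has n_rows + 1 entries; col_idx and vals hold the nonzeros
   of each row contiguously, rows delimited by row_ptr. */
void spmv_csr(size_t n_rows, const size_t *row_ptr,
              const size_t *col_idx, const float *vals,
              const float *x, float *y)
{
    for (size_t i = 0; i < n_rows; i++) {
        float acc = 0.0f;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* Indirect access x[col_idx[k]]: irregular, cache-unfriendly. */
            acc += vals[k] * x[col_idx[k]];
        }
        y[i] = acc;
    }
}
```

In an HLS setting, the outer loop is typically pipelined, and the irregular reads of `x` are what a partitioning scheme such as sparstition must tame so that the vector fits into fast on-chip buffers.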

Original language: English
Title of host publication: 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Subtitle of host publication: Proceedings
Publisher: IEEE
Pages: 51-58
Number of pages: 8
ISBN (Electronic): 978-1-7281-1601-3
ISBN (Print): 978-1-7281-1602-0
DOIs
Publication status: Published - 2019
Event: 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019 - New York, United States
Duration: 15 Jul 2019 - 17 Jul 2019

Conference

Conference: 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019
Country: United States
City: New York
Period: 15/07/19 - 17/07/19

Research areas

• Accelerator, FPGA, High-Level Synthesis, Partitioning, SMVM, SpMV
