The growing demand of processing power is being satisfied mainly by an increase in the number of homogeneous and heterogeneous computing cores in a system. Efficient utilization of these architectures demands analysis of memory-access behaviour of applications and perform data-communication aware mapping of applications on these architectures. Appropriate tools are required to highlight memory-access patterns and provide detailed intra-application data-communication information to assist developers in porting existing sequential applications efficiently to these architectures. In this work, we present the design of an open-source tool which provides such a detailed profile for C/C++ applications. In contrast to prior work, our tool not only reports detailed information, but also generates this information with manageable overheads for realistic workloads. Comparison with the state-of-the-art shows that the proposed profiler has, on the average, an order of magnitude less overhead as compared to the state-of-the-art data-communication profilers for a wide range of benchmarks. The experimental results show that our proposed tool generated profiling information for image processing applications which assisted in achieving a speed-up of 6.14× and 2.75× for heterogeneous multi-core platforms containing an FPGA and a GPU as accelerators, respectively.
Original languageEnglish
Pages (from-to)934-948
Number of pages15
JournalIEEE Transactions on Computers
Issue number7
Publication statusPublished - 2018

    Research areas

  • Tools, Computer architecture, Field programmable gate arrays, Graphics processing units, Instruments, Acceleration, Open source software

ID: 37936803