Optimization of massive data applications on heterogeneous architectures

Author:
  1. Romero Moreno, José Carlos
Supervised by:
  1. Rafael Asenjo Plaza (Supervisor)
  2. Andrés Rodríguez Moreno (Co-supervisor)

Defending institution: Universidad de Málaga

Defense date: 15 September 2022

Examination committee:
  1. Nicolas Guil Matas (Chair)
  2. José Carlos Cabaleiro Domínguez (Secretary)
  3. Jose Luis Nuñez Yañez (Member)

Type: Doctoral thesis

Teseo: 745735 | DIALNET | RIUMA (open access)

Abstract

In the last few years, heterogeneous architectures have become dominant in every segment of the computing industry: from GPU accelerators joining multi-core CPUs within the same chip, to Systems on Chip that integrate DSPs or […]. The main motivation of this thesis is that no implementation provides an optimal solution on heterogeneous architectures for two complex, real-life, massive-data problems widely used in big data: Time Series and the Skyline problem.

First, we focus on the motif/discord discovery problem for Time Series, taking as a starting point the state-of-the-art algorithm, the Matrix Profile. We present the first heterogeneous implementations of the Matrix Profile computation for CPU + GPU architectures and for CPU + FPGA architectures, the latter using a high-performance FPGA with integrated High Bandwidth Memory (HBM). We propose Fastfit, a hierarchical scheduler that efficiently balances the workload between the FPGA and the CPU cores and computes an even partition so that all FPGA IPs complete their assignments at the same time. We validate the accuracy of our models and find that Fastfit outperforms previous state-of-the-art schedulers, achieving up to 99.4% of ideal performance.

Second, we tackle the problem of computing the Skyline operator over a stream of independent data queries on a heterogeneous CPU + GPU architecture. We contribute a novel heterogeneous implementation, based on oneAPI, of the state-of-the-art SkyAlign algorithm. We design a graph-based engine, SkyFlow, and propose two heterogeneous approaches for Skyline computation over a stream of data queries: the first runs two Skyline computations in parallel, one per device, and the second splits a single Skyline computation between the CPU and the GPU.
Our experimental results show that our heterogeneous CPU + GPU approaches always outperform state-of-the-art CPU-only and GPU-only implementations, by up to 6.86x and 5.19x respectively, and stay within 6% of ideal peak performance.
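As a point of reference for the Matrix Profile, the following Python sketch computes it by brute force, assuming the usual definition: for every subsequence of length m, the z-normalized Euclidean distance to its nearest non-trivial match, plus the position of that match (the index profile). It is not the thesis's implementation. The function names and the exclusion-zone default of ceil(m/4) are illustrative assumptions, and state-of-the-art algorithms avoid this O(n^2 * m) cost by reusing dot products along the diagonals of the distance matrix. The lowest profile values mark a motif pair; the highest marks a discord.

```python
import math
from typing import List, Optional, Tuple


def znormalize(seq: List[float]) -> List[float]:
    """Return seq rescaled to zero mean and unit (population) standard deviation."""
    m = len(seq)
    mean = sum(seq) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / m)
    if std == 0:
        # Constant subsequences have no defined z-normalization.
        raise ValueError("constant subsequence: z-normalization undefined")
    return [(x - mean) / std for x in seq]


def matrix_profile(
    ts: List[float], m: int, exclusion: Optional[int] = None
) -> Tuple[List[float], List[int]]:
    """Brute-force Matrix Profile (distances) and index profile of ts for window m."""
    n = len(ts) - m + 1  # number of subsequences of length m
    if m < 2 or n < 2:
        raise ValueError("need m >= 2 and at least two subsequences")
    if exclusion is None:
        # Hypothetical default: skip trivial matches closer than ceil(m / 4).
        exclusion = max(1, math.ceil(m / 4))
    subs = [znormalize(ts[i:i + m]) for i in range(n)]
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        # The distance is symmetric, so each pair (i, j) with j - i >= exclusion
        # is evaluated once and used to update both i and j.
        for j in range(i + exclusion, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
            if d < profile[j]:
                profile[j], index[j] = d, i
    return profile, index
```

For example, matrix_profile([0, 1, 2, 1, 0, 5, 3, 7, 2, 0, 1, 2, 1, 0], 5) reports a distance of 0.0 between the identical subsequences starting at positions 0 and 9, which form the motif pair of that series.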
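The Skyline operator can likewise be illustrated with a minimal block-nested-loop sketch in Python. It assumes smaller values are better in every dimension: a point p dominates q when p is no worse than q in all dimensions and strictly better in at least one, and the Skyline is the set of points that no other point dominates. This textbook baseline is chosen for clarity; it is not SkyAlign or the oneAPI implementation described in the thesis, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p dominates q, assuming minimization in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop Skyline: keep a window of currently non-dominated points."""
    window: List[Point] = []
    for p in points:
        # Discard p if a point already in the window dominates it. By
        # transitivity, this also covers points that were discarded earlier.
        if any(dominates(w, p) for w in window):
            continue
        # p survives: evict the window points that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window
```

For example, skyline([(1, 5), (2, 2), (3, 1), (4, 4), (5, 0.5), (2, 3)]) keeps (1, 5), (2, 2), (3, 1) and (5, 0.5), since (4, 4) and (2, 3) are both dominated by (2, 2). In a stream, each query is an independent input of this kind, which is what makes it possible to process two queries at once, one per device.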