Abstract: In this paper, we investigate a joint task offloading, deep neural network (DNN) model pruning, and edge computing resource allocation (JOPA) problem for supporting a fault detection service ...
To effectively utilize heterogeneous specialized hardware units in modern GPUs, such as TensorCores and Tensor Memory Accelerators, this paper introduces PipeThreader, a new DNN compiler. PipeThreader ...
Project Brainwave is a deep learning platform for real-time AI inference in the cloud and on the edge. A soft Neural Processing Unit (NPU), based on a high-performance field-programmable gate array ...
Abstract: Recently, various pipeline parallelism strategies have been proposed to tackle the scalability problem of training a large DNN model on a distributed system. However, most of these works focus on ...