Hardware Acceleration of Explainable Machine Learning using Tensor Processing Units

Zhixin Pan, Prabhat Mishra

DOI: https://doi.org/10.48550/arXiv.2103.11927

2021-03-22

Abstract: Machine learning (ML) is successful in achieving human-level performance in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While existing explainable ML is promising, almost all of these methods focus on formatting interpretability as an optimization problem. Such a mapping leads to numerous iterations of time-consuming complex computations, which limits their applicability in real-time applications. In this paper, we propose a novel framework for accelerating explainable ML using Tensor Processing Units (TPUs). The proposed framework exploits the synergy between matrix convolution and Fourier transform, and takes full advantage of TPU's natural ability in accelerating matrix computations. Specifically, this paper makes three important contributions. (1) To the best of our knowledge, our proposed work is the first attempt in enabling hardware acceleration of explainable ML using TPUs. (2) Our proposed approach is applicable across a wide variety of ML algorithms, and effective utilization of TPU-based acceleration can lead to real-time outcome interpretation. (3) Extensive experimental results demonstrate that our proposed approach can provide an order-of-magnitude speedup in both classification time (25x on average) and interpretation time (13x on average) compared to state-of-the-art techniques.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses two limitations that machine learning (ML) faces in practical applications: long running time and lack of transparency. Although ML techniques have succeeded in many fields, their black-box nature makes a model's predictions difficult to explain, which limits their use in scenarios that require real-time responses or high security. In safety-critical applications, for example, users want to understand why a model made a specific decision so that they can trust its predictions. Existing explainable ML methods are promising, but most of them treat explanation as an optimization problem. This requires many iterations of expensive computation and makes the methods hard to deploy in real-time applications.

In response, the paper proposes a hardware-acceleration framework for explainable ML based on Tensor Processing Units (TPUs). The framework exploits the synergy between matrix convolution and the Fourier transform and takes full advantage of the TPU's natural strength in accelerating matrix operations. With this approach, the authors aim to speed up both classification and outcome interpretation, improving the applicability and transparency of ML models, especially in real-time applications. The paper claims three contributions: it is, to the authors' knowledge, the first attempt to use TPUs for hardware acceleration of explainable ML; the approach applies to a wide variety of ML algorithms; and experiments show an order-of-magnitude speedup over state-of-the-art techniques in both classification time (25x on average) and interpretation time (13x on average).
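The synergy between convolution and the Fourier transform mentioned above can be illustrated with the convolution theorem: a circular convolution in the original domain equals an element-wise product in the Fourier domain. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It uses one-dimensional sequences, a naive discrete Fourier transform, and hypothetical helper names (`dft`, `circular_convolve`, `fourier_convolve`) that do not come from the paper. Note that the DFT itself is a matrix-vector product, which is the kind of operation a matrix accelerator handles well.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        # Each output is a dot product with one row of the DFT matrix.
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_convolve(x, h):
    """Direct circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]


def fourier_convolve(x, h):
    """Same result via the convolution theorem: transform, multiply, invert."""
    product = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(product, inverse=True)]


# Hypothetical example data, chosen only for illustration.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 0.0, 0.0]

direct = circular_convolve(signal, kernel)
via_fourier = fourier_convolve(signal, kernel)

# Both routes agree up to floating-point rounding.
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, via_fourier))
```

Replacing the sliding-window sum with transforms and an element-wise product is what lets convolution-heavy workloads be expressed as dense matrix operations. Whether and how the paper applies this to two-dimensional inputs is not shown here.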

Accelerating MRI Reconstruction on TPUs

TPU as Cryptographic Accelerator

GPTPU: Accelerating Applications using Edge Tensor Processing Units

High-resolution imaging on TPUs

Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units

Hardware-Enabled Efficient Data Processing with Tensor-Train Decomposition

Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning

High-Performance Tensor Learning Primitives Using GPU Tensor Cores

Exploration of TPUs for AI Applications

Simulation of quantum physics with Tensor Processing Units: brute-force computation of ground states and time evolution

H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Hardware-Efficient Mixed-Precision CP Tensor Decomposition

Deep Learning on Edge TPUs

Perun: Secure Multi-Stakeholder Machine Learning Framework with GPU Support

High-Performance Tensor-Train Primitives Using GPU Tensor Cores

PyTPU: Migration of Python Code for Heterogenous Acceleration with Automated Test Generation

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

TPU Based Deep Learning Image Enhancement for Real-Time Point-of-Care Ultrasound