Abstract: Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference is performed directly within Web browsers. The actual performance of in-browser inference and its impact on the quality of experience (QoE) remain unexplored, and new QoE measurements are urgently needed beyond traditional ones, which mainly focus on page load time. To bridge this gap, we make the first comprehensive performance measurement of in-browser inference to date. Our approach proposes new metrics to measure in-browser inference: responsiveness, smoothness, and inference accuracy. Our extensive analysis involves 9 representative DL models across Web browsers of 50 popular PC devices and 20 mobile devices. The results reveal that in-browser inference exhibits a substantial latency gap, averaging 16.9 times slower on CPU and 4.9 times slower on GPU compared to native inference on PC devices. The gap on mobile CPU and mobile GPU is 15.8 times and 7.8 times, respectively. Furthermore, we identify contributing factors to this latency gap, including underutilized hardware instruction sets, inherent overhead in the runtime environment, resource contention within the browser, and inefficiencies in software libraries and GPU abstractions. Additionally, in-browser inference imposes significant memory demands, at times exceeding 334.6 times the size of the DL models themselves, partly attributable to suboptimal memory management. We also observe that in-browser inference leads to a 67.2% increase in the time it takes for GUI components to render within Web browsers, significantly affecting the overall user QoE of Web applications reliant on this technology.

What problem does this paper attempt to address?

The paper "Anatomizing Deep Learning Inference in Web Browsers" addresses the following key problems:

1. **Measuring and analyzing deep learning inference performance within browsers**
   - **Background**: As Web applications increasingly adopt deep learning (DL) techniques, in-browser inference, that is, executing DL model inference directly in the Web browser, has become increasingly common. However, the actual performance of in-browser inference and its impact on the quality of experience (QoE) have not been fully studied.
   - **Problem**: Existing research mainly measures DL inference performance (latency, memory footprint, and so on) in native environments and overlooks what is unique about the browser. The browser acts as an intermediary that isolates Web applications from the underlying operating system, which limits how fully an application can use the hardware and makes the browser environment behave very differently from a native one.
2. **Proposing new QoE metrics**
   - **Background**: Existing Web QoE measurements mainly focus on traditional indicators such as page load time, which cannot capture the full impact of in-browser inference on the user experience.
   - **Problem**: Understanding that impact requires new QoE metrics covering responsiveness, smoothness, and inference accuracy.
3. **Identifying factors affecting in-browser inference performance**
   - **Background**: In-browser inference shows a significant latency gap. On average, on PC devices it is 16.9 times slower than native inference on CPU and 4.9 times slower on GPU; on mobile devices it is 15.8 times slower on CPU and 7.8 times slower on GPU.
   - **Problem**: The causes of these latency gaps need to be identified. They include underutilized hardware instruction sets, the inherent overhead of the runtime environment, resource contention within the browser, and inefficiencies in software libraries and GPU abstractions.
4. **Optimizing the performance and QoE of in-browser inference**
   - **Background**: In-browser inference has large memory demands, at times exceeding 334.6 times the size of the model itself, and increases the rendering time of GUI components by 67.2%, which significantly affects the overall user experience of Web applications.
   - **Problem**: How can the performance and QoE of in-browser inference be optimized to provide a better user experience and service quality?

### Summary

Through a detailed analysis of the performance and QoE of in-browser inference, the paper proposes new metrics, explores the key factors affecting performance, and provides improvement suggestions for browser vendors, inference framework developers, and Web application developers.
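
The figures quoted above (latency gap, memory footprint relative to model size, and percentage increase in GUI rendering time) are simple ratios over raw measurements. The following is a minimal Python sketch of how such ratios could be computed; it is not the authors' tooling. The function names and sample numbers are hypothetical, and it assumes the latency gap is the ratio of mean in-browser latency to mean native latency, which may differ from how the paper aggregates across models and devices.

```python
from statistics import mean


def latency_gap(browser_ms, native_ms):
    """Return how many times slower in-browser inference is than native.

    Assumption: the gap is the ratio of the two mean latencies.
    """
    return mean(browser_ms) / mean(native_ms)


def memory_ratio(peak_bytes, model_bytes):
    """Return peak inference memory as a multiple of the model size."""
    return peak_bytes / model_bytes


def relative_increase(baseline, observed):
    """Return the percentage increase of `observed` over `baseline`."""
    return (observed - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up sample values for illustration only; not data from the paper.
    browser_latencies_ms = [120.0, 130.0, 125.0]
    native_latencies_ms = [10.0, 10.0, 10.0]
    print("latency gap (x):", latency_gap(browser_latencies_ms, native_latencies_ms))
    print("memory ratio (x):", memory_ratio(400_000_000, 4_000_000))
    print("render time increase (%):", relative_increase(20.0, 30.0))
```

Keeping the three ratios as separate functions mirrors the three kinds of figures in the summary: a gap of 16.9 would mean the mean in-browser latency is 16.9 times the mean native latency, and a 67.2% increase would mean the observed rendering time is 1.672 times the baseline.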

Anatomizing Deep Learning Inference in Web Browsers

Close the Gap Between Deep Learning and Mobile Intelligence by Incorporating Training in the Loop

Explore Training of Deep Convolutional Neural Networks on Battery-powered Mobile Devices: Design and Application

Deep Learning on Mobile and Embedded Devices: State-of-the-art, Challenges, and Future Directions

Moving Deep Learning into Web Browser: How Far Can We Go?

Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization

Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimizations

Accurate Deep Learning Inference Latency Prediction over Dynamic Running Mobile Devices

An architecture-level analysis on deep learning models for low-impact computations

Parallelizing DNN Inference in Mobile Web Browsers on Heterogeneous Hardware

Cloud-based or On-device: An Empirical Study of Mobile Deep Inference

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

Toward Efficient Execution of Mainstream Deep Learning Frameworks on Mobile Devices: Architectural Implications

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Benchmarking of DL Libraries and Models on Mobile Devices

Efficient Architecture Paradigm for Deep Learning Inference As a Service

Automated Customization of On-Device Inference for Quality-of-Experience Enhancement

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

Profiling and optimizing deep learning inference on mobile GPUs

Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical

Characterizing the Deep Neural Networks Inference Performance of Mobile Applications