Making accurate object detection at the edge: review and new approach

Zhenhua Huang, Shunzhi Yang, MengChu Zhou, Zheng Gong, Abdullah Abusorrah, Chen Lin, Zheng Huang
DOI: https://doi.org/10.1007/s10462-021-10059-3
IF: 9.588
2021-09-01
Artificial Intelligence Review
Abstract:<p class="a-plus-plus">With the development of the Internet of Things (IoT), data are increasingly appearing at the edge of a network. Processing tasks at the network edge can effectively solve the problems of personal privacy leakage and server overloading. As a result, it has attracted a great deal of attention. A number of efficient convolutional neural network (CNN) models have been proposed to do so. However, since they require substantial computing and memory resources, none of them can be deployed to such typical edge computing devices as Raspberry Pi 3B+ and 4B+ to meet the real-time requirements of user tasks. Considering that a traditional machine learning method can precisely locate an object with a highly acceptable calculation load, this work reviews state-of-the-art literature and then proposes a CNN with reduced input size for an object detection system that can be deployed in edge computing devices. It splits an object detection task into object positioning and classification. In particular, this work proposes a CNN model with 44 × 44-pixel inputs instead of much larger inputs, e.g., the 224 × 224-pixel inputs used in many existing methods, for edge computing devices with slow memory access and limited computing resources. Its overall performance has been verified via a facial expression detection system implemented on Raspberry Pi 3B+ and 4B+. The work makes accurate object detection at the edge possible.</p>
computer science, artificial intelligence
What problem does this paper attempt to address?