A 65-nm 8T SRAM Compute-in-Memory Macro With Column ADCs for Processing Neural Networks

Chengshuo Yu, Taegeun Yoo, Kevin Tshun Chuan Chai, Tony Tae-Hyoung Kim, Bongjin Kim
DOI: https://doi.org/10.1109/jssc.2022.3162602
2022-01-01
Abstract: In this work, we present a novel 8T static random access memory (SRAM)-based compute-in-memory (CIM) macro for processing neural networks with high energy efficiency. The proposed 8T bitcell adds two transistors to the standard 6T bitcell to decouple the read channels, which makes it free from disturb issues. A 128 × 128 8T SRAM array offers massively parallel binary multiply and accumulate (MAC) operations with 64× binary inputs (0/1) and 64 × 128 binary weights (+1/–1). After parallel MAC operations, 128 column-based neurons generate 128× 1–5 bit outputs in parallel. The proposed column-based neuron comprises 64× bitcells for dot-product, 32× bitcells for the analog-to-digital converter (ADC), and 32× bitcells for offset calibration. The column ADC with 32× replica SRAM bitcells converts the analog MAC results (i.e., a differential read bitline (RBL/RBLb) voltage) to the 1–5 bit output code by sweeping their reference levels in 1–31 cycles (i.e., 2^N – 1 cycles for an N-bit ADC). The measured linearity results [differential nonlinearity (DNL) and integral nonlinearity (INL)] are +0.314/–0.256 least significant bit (LSB) and +0.27/–0.116 LSB, respectively, after offset calibration. The simulated image classification results are 96.37% for the Modified National Institute of Standards and Technology database (MNIST) using a multi-layer perceptron (MLP) with two hidden layers and 87.1%/82.66% for CIFAR-10 using VGG-like/ResNet-18 convolutional neural networks (CNNs), demonstrating slight accuracy degradations (0.67%–1.34%) compared with the software baseline. A test chip with a 16K 8T SRAM bitcell array is fabricated using a 65-nm process. The measured energy efficiency is 490–15.8 TOPS/W for 1–5 bit ADC resolution using 0.45-/0.8-V core supply.
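The arithmetic described in the abstract can be illustrated with a small behavioral model. The sketch below is not the authors' circuit or code: it only mimics a column-wise binary MAC (0/1 inputs, +1/–1 weights) followed by a reference sweep that takes 2^N – 1 comparisons for an N-bit code. The function names `binary_mac` and `sweep_adc`, the evenly spaced reference levels, and the full-scale range of ±64 are all assumptions made for illustration; in the macro the references are produced by replica bitcells and compared against a differential bitline voltage.

```python
import random

N_INPUTS = 64    # rows used for the dot-product (from the abstract)
N_COLUMNS = 128  # column-based neurons (from the abstract)


def binary_mac(inputs, weights):
    """Return one signed sum per column.

    inputs  : list of N_INPUTS values in {0, 1}
    weights : N_INPUTS rows, each a list of N_COLUMNS values in {+1, -1}
    """
    n_cols = len(weights[0])
    return [
        sum(x * row[c] for x, row in zip(inputs, weights))
        for c in range(n_cols)
    ]


def sweep_adc(mac_value, n_bits, full_scale=N_INPUTS):
    """Idealized reference-sweep conversion of one column's MAC result.

    ASSUMPTION: the 2**n_bits - 1 reference levels are evenly spaced over
    [-full_scale, +full_scale]; the real macro's input range may differ.
    Returns (output_code, cycles_used).
    """
    n_cycles = 2 ** n_bits - 1
    step = 2 * full_scale / 2 ** n_bits
    code = 0
    for k in range(1, n_cycles + 1):  # one comparison per cycle
        reference = -full_scale + k * step
        if mac_value > reference:
            code += 1
    return code, n_cycles


if __name__ == "__main__":
    rng = random.Random(0)
    inputs = [rng.randint(0, 1) for _ in range(N_INPUTS)]
    weights = [
        [rng.choice((1, -1)) for _ in range(N_COLUMNS)]
        for _ in range(N_INPUTS)
    ]
    sums = binary_mac(inputs, weights)
    for n_bits in (1, 3, 5):
        codes = [sweep_adc(s, n_bits)[0] for s in sums]
        print(n_bits, "bit:", 2 ** n_bits - 1, "cycles, first codes", codes[:4])
```

In this model the cycle count grows as 2^N – 1 with resolution, which is consistent with the abstract's 1–31 cycles for 1–5 bits and gives some intuition for why the reported energy efficiency falls as the ADC resolution increases.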
engineering, electrical & electronic