Abstract: Because computation in a Binary Neural Network (BNN) is dominated by add operations and the network structure is simplified, BNNs are promising for IoT scenarios, which demand ultra-low power consumption and small hardware area overhead [1]. Exploiting in-memory computing methods and the high density of MLC STT-MRAM [2] [3], this work designs an MLC-STT-CIM (Compute-in-Memory) architecture that performs add operations for BNNs, further reducing power consumption and area overhead. An MLC STT-MRAM cell stores two bits, and the add operation between those two bits is carried out within that single cell. Compared to the design in [2], in which the two addends are stored in two different bit-cells of the same column and both WLs, which are connected to the NMOS access transistors, are enabled, this design reduces the required storage capacity and the energy consumption (see Fig. 1: $\mathrm{I}_{SL}$ is lower because it represents the current through one cell, whereas in STT-CIM $\mathrm{I}_{SL}$ represents the sum of the currents through two cells). Meanwhile, this design achieves a lower read failure probability because a read operation enables only one WL. In this design (see Fig. 1), each cell contains a large MTJ and a small MTJ so that it can store two bits. The relative magnetic orientation of the free and reference layers determines the resistance of an MTJ. The resistance of the parallel configuration, $\mathrm{R}_{P}$, is lower than the anti-parallel resistance, $\mathrm{R}_{AP}$. $\mathrm{R}_{P}$ of the small MTJ is lower than that of the large MTJ, and the same holds for $\mathrm{R}_{AP}$. We assume that $\mathrm{R}_{AP}$ represents logic “0” and $\mathrm{R}_{P}$ represents logic “1”; the resultant current $\mathrm{I}_{SL}$ flowing through the bit-cell is then determined by the data stored in the two MTJs.
If the first bit is stored in the large MTJ and the second bit in the small MTJ, $\mathrm{I}_{SL}$ satisfies $\mathrm{I}_{00} < \mathrm{I}_{01} < \mathrm{I}_{10} < \mathrm{I}_{11}$ because $\mathrm{R}_{AP-AP} > \mathrm{R}_{AP-P} > \mathrm{R}_{P-AP} > \mathrm{R}_{P-P}$, where the first subscript gives the state of the large MTJ and the second that of the small MTJ. We can choose proper reference resistances to generate reference currents that satisfy $\mathrm{I}_{00} < \mathrm{I}_{ref3} < \mathrm{I}_{01} < \mathrm{I}_{ref2} < \mathrm{I}_{10} < \mathrm{I}_{ref1} < \mathrm{I}_{11}$. Based on a modified sensing circuit that combines sense amplifiers (SAs), MOS transistors, and logic gates, we can realize both a memory mode and a computing mode. Memory mode: When the WLs are enabled, we write the bits to the cell with a two-step writing scheme. First, a large current is used to change the magnetic orientation of the large MTJ and so write the first bit, bit1. Because the switching current of the small MTJ is smaller than that of the large one, the magnetic orientation of the small MTJ is also changed as we write bit1. Then, depending on the second bit, bit2, a small current is used to switch the small MTJ if necessary. To read the bits, we feed $\mathrm{I}_{SL}$ to the positive inputs of the sense amplifiers SA1, SA2, and SA3. $\mathrm{I}_{ref2}$ is connected to the negative input of SA1 to read the first bit, and the second bit is sensed from the positive output of SA2 or SA3, whose negative inputs are fed with $\mathrm{I}_{ref1}$ and $\mathrm{I}_{ref3}$, respectively. If the first bit is “1” (“0”), the correct second bit is obtained from SA2 (SA3). The positive outputs of SA2 and SA3 are connected to the drain (D) terminals of an NMOS and a PMOS, respectively, both of which are controlled by the first bit, and the source (S) terminals are connected to the output bit2, which yields the second bit. Computing mode: Besides sensing the second bit in memory mode, SA2 can realize logic AND and NAND.
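The memory-mode read described above can be illustrated with a minimal behavioral sketch. It assumes ideal sense amplifiers and uses hypothetical normalized current values; only the ordering of the cell and reference currents is taken from the text, and the names `I_CELL`, `sense_amp`, and `read_cell` are introduced here for illustration only.

```python
# Hypothetical normalized currents. Only their ordering follows the text:
# I_00 < I_ref3 < I_01 < I_ref2 < I_10 < I_ref1 < I_11
I_CELL = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
I_REF3, I_REF2, I_REF1 = 1.5, 2.5, 3.5


def sense_amp(i_pos, i_neg):
    """Ideal sense amplifier (assumption): positive output is 1 if i_pos > i_neg."""
    return 1 if i_pos > i_neg else 0


def read_cell(i_sl):
    """Recover (bit1, bit2) from the cell current I_SL."""
    bit1 = sense_amp(i_sl, I_REF2)  # SA1 reads the first bit
    sa2 = sense_amp(i_sl, I_REF1)   # SA2 is valid when bit1 is 1
    sa3 = sense_amp(i_sl, I_REF3)   # SA3 is valid when bit1 is 0
    # The NMOS/PMOS pair controlled by bit1 selects SA2 or SA3 as bit2.
    bit2 = sa2 if bit1 else sa3
    return bit1, bit2


# Every stored two-bit value is read back unchanged.
for stored_bits, current in I_CELL.items():
    assert read_cell(current) == stored_bits
```

The selection `sa2 if bit1 else sa3` stands in for the pass transistors; a real sensing circuit would also have to account for device variation and sensing margins, which this sketch ignores.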
As mentioned above, only $\mathrm{I}_{11}$ is larger than $\mathrm{I}_{ref1}$. In other words, only the case in which both MTJs are in the P configuration (both store logic “1”) leads to an output of logic “1” (“0”) at the positive (negative) output of SA2, while all other cases lead to logic “0” (“1”). Thus, the positive and negative outputs of SA2 evaluate the logic AND and NAND of the two values stored in the enabled bit-cell. Similarly, because only $\mathrm{I}_{00}$ is smaller than $\mathrm{I}_{ref3}$, an OR (NOR) operation is realized at the positive (negative) output of SA3. An XOR operation is realized by feeding the AND output of SA2 and the NOR output of SA3 to a CMOS NOR gate. Suppose that $\mathrm{A}_{n}$ and $\mathrm{B}_{n}$ (the n-th bits of two words, A and B) are stored in the large MTJ and the small MTJ of a cell within an MLC-STT-CIM array, and that we wish to compute the full-adder logic function (the n-th stage of an adder that adds words A and B). According to $\mathrm{S}_{n} = \mathrm{A}_{n} \text{ XOR } \mathrm{B}_{n} \text{ XOR } \mathrm{C}_{n-1}$ and $\mathrm{C}_{n} = ((\mathrm{A}_{n} \text{ XOR } \mathrm{B}_{n}) \text{ AND } \mathrm{C}_{n-1}) \text{ OR } (\mathrm{A}_{n} \text{ AND } \mathrm{B}_{n})$, the sum $\mathrm{S}_{n}$ and the carry out $\mathrm{C}_{n}$ can be computed from $\mathrm{A}_{n}$ XOR $\mathrm{B}_{n}$ and $\mathrm{A}_{n}$ AND $\mathrm{B}_{n}$, together with $\mathrm{C}_{n-1}$ (the carry input from the previous stage). The ADD operation can therefore be expressed in terms of the outputs of the bitwise AND and XOR operations, and three additional logic gates are required to enable this computation. In this mode, the add operation is performed within one cell, which reduces its energy consumption and allows BNNs to be computed efficiently. Meanwhile, since $\mathrm{I}_{SL}$ is smaller, the full-add operation is more reliable, benefiting from a lower read-disturb probability.
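The computing mode described above can be checked with a short behavioral sketch. It assumes ideal sense amplifiers and hypothetical normalized currents (only their ordering comes from the text); the function names and the ripple-carry helper `add_words` are illustrative and not part of the original design description.

```python
# Hypothetical normalized currents. Only their ordering follows the text:
# I_00 < I_ref3 < I_01 < I_ref2 < I_10 < I_ref1 < I_11
I_CELL = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
I_REF3, I_REF1 = 1.5, 3.5


def cell_logic(a, b):
    """Return (AND, NOR) of the two bits stored in one cell."""
    i_sl = I_CELL[(a, b)]
    and_out = 1 if i_sl > I_REF1 else 0  # positive output of SA2
    nor_out = 0 if i_sl > I_REF3 else 1  # negative output of SA3
    return and_out, nor_out


def full_adder(a, b, c_in):
    """One adder stage built from the in-cell AND and XOR results."""
    and_ab, nor_ab = cell_logic(a, b)
    xor_ab = 1 - (and_ab | nor_ab)       # CMOS NOR gate fed with AND and NOR
    # The three additional gates: XOR, AND, OR.
    s = xor_ab ^ c_in
    c_out = (xor_ab & c_in) | and_ab
    return s, c_out


def add_words(a, b, width=4):
    """Ripple-carry addition over `width` cells (illustrative helper)."""
    result, carry = 0, 0
    for n in range(width):
        s, carry = full_adder((a >> n) & 1, (b >> n) & 1, carry)
        result |= s << n
    return result, carry


# The sketch agrees with ordinary binary addition for all 4-bit operands.
for x in range(16):
    for y in range(16):
        total, carry = add_words(x, y)
        assert total + (carry << 4) == x + y
```

How the carry is propagated between cells is not specified in the text; the ripple-carry loop is one simple assumption used here to exercise the full-adder stage.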
