A MLC STT-MRAM Based Computing In-Memory Architecture for Binary Neural Network
Y. Pan, P. Ouyang, Y. Zhao, W. Kang, S. Yin, Y. Zhang, W. Zhao, S. Wei
DOI: https://doi.org/10.1109/intmag.2018.8508764
2018-01-01
Abstract: Because its computation is dominated by add operations and its network structure is simplified, the Binary Neural Network (BNN) is promising for IoT scenarios, which demand ultra-low power consumption and hardware area overhead [1]. By exploiting in-memory computing methods and the high density of MLC STT-MRAM [2] [3], this work designs an MLC-STT-CIM (Compute-in-Memory) architecture that performs the add operations of a BNN in memory, further reducing power consumption and area overhead. With an MLC STT-MRAM cell, we can store two bits in one cell and perform the add operation between those two bits within that cell. Compared to the design in [2], in which the two addends are stored in two different bit-cells of the same column and both WLs are enabled to connect the cells through NMOS transistors, this design reduces the required capacity and the energy consumption (see Fig. 1: $\mathrm {I}_{SL}$ is lower because it represents the current through one cell, whereas in STT-CIM $\mathrm {I}_{SL}$ represents the sum of the currents through two cells). Meanwhile, this design achieves a lower read failure probability because each read operation enables only one WL. In this design (see Fig. 1), each cell contains a large MTJ and a small MTJ so that it can store two bits. The relative magnetic orientation of the free and reference layers determines the resistance offered by an MTJ. The resistance of the parallel configuration, $\mathrm {R}_{P}$, is lower than the anti-parallel resistance, $\mathrm {R}_{AP}$. $\mathrm {R}_{P}$ of the small MTJ is lower than that of the large MTJ, and the same holds for $\mathrm {R}_{AP}$. We assume that $\mathrm {R}_{AP}$ represents logic “0” and $\mathrm {R}_{P}$ represents logic “1”; the resultant current $\mathrm {I}_{SL}$ flowing through the bit-cell is then determined by the data stored in the two MTJs.
If the first bit is stored in the large MTJ and the second bit in the small MTJ, $\mathrm {I}_{SL}$ satisfies $\mathrm {I}_{00} < \mathrm {I}_{01} < \mathrm {I}_{10} < \mathrm {I}_{11}$ because $\mathrm {R}_{AP-AP} > \mathrm {R}_{AP-P} > \mathrm {R}_{P-AP} > \mathrm {R}_{P-P}$, where the first subscript gives the state of the large MTJ and the second that of the small MTJ. We can choose proper reference resistances to generate reference currents that satisfy $\mathrm {I}_{00} < \mathrm {I}_{ref3} < \mathrm {I}_{01} < \mathrm {I}_{ref2} < \mathrm {I}_{10} < \mathrm {I}_{ref1} < \mathrm {I}_{11}$. Based on a modified sensing circuit that combines sense amplifiers (SAs), MOS transistors, and logic gates, we can realize both a memory mode and a computing mode.
Memory mode: When the WLs are enabled, we can write the bits to the cell with a two-step writing scheme. First, a large current is used to set the magnetic orientation of the large MTJ and so write the first bit, bit1. Because the switching current of the small MTJ is smaller than that of the large one, the magnetic orientation of the small MTJ is also changed while bit1 is written. Then, depending on the second bit, bit2, a small current is used to switch the small MTJ if necessary. To read the bits, we feed $\mathrm {I}_{SL}$ to the positive inputs of the sense amplifiers SA1, SA2, and SA3. $\mathrm {I}_{ref2}$ is connected to the negative input of SA1 to read the first bit, and the second bit is sensed from the positive output of SA2 or SA3, whose negative inputs are fed with $\mathrm {I}_{ref1}$ and $\mathrm {I}_{ref3}$, respectively. If the first bit is “1” (“0”), we get the correct second bit from SA2 (SA3). The positive outputs of SA2 and SA3 are connected to the drain (D) ports of an NMOS and a PMOS transistor, respectively, both controlled by the first bit, and the source (S) ports are connected to the output bit2, which yields the second bit.
Computing mode: Besides sensing the second bit in memory mode, SA2 can also realize logic AND and NAND.
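The three-reference sensing described above, including the use of SA2 as an AND/NAND evaluator, can be sketched behaviourally. This is a minimal sketch, not the circuit itself: the numeric current levels in `I_CELL` and `I_REF` are hypothetical, normalized values chosen only to respect the ordering $\mathrm {I}_{00} < \mathrm {I}_{ref3} < \mathrm {I}_{01} < \mathrm {I}_{ref2} < \mathrm {I}_{10} < \mathrm {I}_{ref1} < \mathrm {I}_{11}$, the sense amplifiers are assumed ideal, and the function names are illustrative.

```python
# Behavioural sketch of the three-SA sensing path.
# All numeric values are hypothetical; they only respect the current
# ordering I_00 < I_ref3 < I_01 < I_ref2 < I_10 < I_ref1 < I_11.
I_CELL = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # (bit1, bit2) -> I_SL
I_REF = {"ref3": 1.5, "ref2": 2.5, "ref1": 3.5}


def sense_amp(i_pos, i_neg):
    """Ideal SA (assumption): return (positive output, negative output)."""
    pos = 1 if i_pos > i_neg else 0
    return pos, 1 - pos


def read_cell(i_sl):
    """Memory mode: recover (bit1, bit2) from the cell current."""
    sa1_pos, _ = sense_amp(i_sl, I_REF["ref2"])  # first bit
    sa2_pos, _ = sense_amp(i_sl, I_REF["ref1"])  # valid second bit if bit1 == 1
    sa3_pos, _ = sense_amp(i_sl, I_REF["ref3"])  # valid second bit if bit1 == 0
    bit1 = sa1_pos
    # Models the NMOS/PMOS pair controlled by bit1 selecting SA2 or SA3.
    bit2 = sa2_pos if bit1 == 1 else sa3_pos
    return bit1, bit2


def sa2_logic(i_sl):
    """Computing mode: SA2 outputs give (AND, NAND) of the two stored bits."""
    return sense_amp(i_sl, I_REF["ref1"])


# Every stored pattern is read back unchanged.
for stored, i_sl in I_CELL.items():
    assert read_cell(i_sl) == stored
```

Under these assumptions, `sa2_logic(I_CELL[(1, 1)])` gives `(1, 0)`, and every other stored pattern gives `(0, 1)`.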
As mentioned above, only $\mathrm {I}_{11}$ is larger than $\mathrm {I}_{ref1}$. In other words, only the case in which both MTJs are in the P configuration (both store logic “1”) leads to an output of logic “1” (“0”) at the positive (negative) output of SA2, while all other cases lead to logic “0” (“1”). Thus, the positive and negative outputs of SA2 evaluate the logic AND and NAND of the values stored in the enabled bit-cell. Similarly, an OR (NOR) operation can be realized at the positive (negative) output of SA3, because only $\mathrm {I}_{00}$ is smaller than $\mathrm {I}_{ref3}$. An XOR operation can be realized by feeding the AND output of SA2 and the NOR output of SA3 to a CMOS NOR gate. Suppose $\mathrm {A}_{n}$ and $\mathrm {B}_{n}$ (the n-th bits of two words, A and B) are stored in the large MTJ and the small MTJ of a cell within an MLC-STT-CIM array, and suppose that we wish to compute the full-adder logic function (the n-th stage of an adder that adds words A and B). According to $\mathrm {S}_{n} = \mathrm {A}_{n}\ \mathrm {XOR}\ \mathrm {B}_{n}\ \mathrm {XOR}\ \mathrm {C}_{n-1}$ and $\mathrm {C}_{n} = ((\mathrm {A}_{n}\ \mathrm {XOR}\ \mathrm {B}_{n})\ \mathrm {AND}\ \mathrm {C}_{n-1})\ \mathrm {OR}\ (\mathrm {A}_{n}\ \mathrm {AND}\ \mathrm {B}_{n})$, the sum $\mathrm {S}_{n}$ and the carry out $\mathrm {C}_{n}$ can be computed from $\mathrm {A}_{n}\ \mathrm {XOR}\ \mathrm {B}_{n}$ and $\mathrm {A}_{n}\ \mathrm {AND}\ \mathrm {B}_{n}$, in addition to $\mathrm {C}_{n-1}$ (the carry input from the previous stage). The ADD operation can therefore be expressed in terms of the outputs of the bitwise operations AND and XOR. Three additional logic gates are required to enable this computation. In this mode, the energy consumed by an add operation is reduced because the operation is performed within one cell, which allows a BNN to be computed efficiently. Meanwhile, since $\mathrm {I}_{SL}$ is smaller, the full-add operation is more reliable, benefiting from a lower sensing disturbance probability.
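The logic composition described above can be checked with a short bit-level sketch. It assumes ideal sense-amplifier outputs (AND/NAND from SA2, OR/NOR from SA3) and models only the Boolean behaviour, not currents or timing. The function names and the ripple-carry chaining of stages in `ripple_add` are illustrative assumptions, not details given in the abstract.

```python
# Bit-level sketch of the computing mode; only Boolean behaviour is modelled.

def cell_logic(a, b):
    """Return (AND, XOR) of the two bits stored in one cell."""
    and_out = 1 if (a, b) == (1, 1) else 0  # SA2 positive output
    nor_out = 1 if (a, b) == (0, 0) else 0  # SA3 negative output
    xor_out = 1 - (and_out | nor_out)       # CMOS NOR of the AND and NOR outputs
    return and_out, xor_out


def full_add(a, b, c_in):
    """One adder stage built from the cell's AND/XOR plus three extra gates."""
    and_ab, xor_ab = cell_logic(a, b)
    s = xor_ab ^ c_in                  # gate 1: XOR for the sum
    c_out = (xor_ab & c_in) | and_ab   # gates 2 and 3: AND, then OR
    return s, c_out


def ripple_add(a_bits, b_bits):
    """Add two equal-length words given LSB first (chaining is an assumption)."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_add(a, b, carry)
        out.append(s)
    return out, carry


# Exhaustive check of one stage against integer addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, c_out = full_add(a, b, c)
            assert 2 * c_out + s == a + b + c
```

For example, `ripple_add([1, 1, 0], [1, 0, 1])` adds 3 and 5 (LSB first) and returns `([0, 0, 0], 1)`, i.e. 8 with the carry out as the fourth bit.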