A Reconfigurable Two-WSe<sub>2</sub>-Transistor Synaptic Cell for Reinforcement Learning
Yue Zhou, Yasai Wang, Fuwei Zhuge, Jianmiao Guo, Sijie Ma, Jingli Wang, Zijian Tang, Yi Li, Xiangshui Miao, Yuhui He, Yang Chai
DOI: https://doi.org/10.1002/adma.202107754
IF: 29.4
2022-02-25
Advanced Materials
Abstract: Reward-modulated spike-timing-dependent plasticity (R-STDP) is a brain-inspired reinforcement learning (RL) rule that shows potential for decision-making tasks and artificial general intelligence. However, the hardware implementation of the reward-modulation process in R-STDP usually requires complicated Si complementary metal-oxide-semiconductor (CMOS) circuit design, which causes high power consumption and a large footprint. Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit based on WSe<sub>2</sub> ferroelectric transistors exhibits reconfigurable polarity behavior, where one channel can be tuned as n-type and the other as p-type due to nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors with multilevel (>6 bit) conductance states, ultralow nonlinearity (0.56/-1.23), and a large G<sub>max</sub>/G<sub>min</sub> ratio of 30 are realized. By applying positive/negative reward to the (anti-)STDP component of the 2T cell, R-STDP learning rules are realized for training the spiking neural network and demonstrated to solve the classical cart-pole problem, showing a way to realize a low-power (32 pJ per forward process) and highly area-efficient (100 µm<sup>2</sup>) hardware chip for reinforcement learning.
materials science, multidisciplinary; chemistry, physical; physics, applied, condensed matter; nanoscience & nanotechnology
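
The reward-modulated update described in the abstract can be illustrated with a minimal software sketch. It is not the authors' device model or training code: the exponential STDP window, the function names `stdp_window` and `r_stdp_update`, and all parameter values (`a_plus`, `a_minus`, `tau`, `lr`, the weight bounds) are assumptions chosen for illustration. The only idea taken from the abstract is that a positive reward applies the STDP branch and a negative reward applies the opposite (anti-STDP) branch, which is modeled here by multiplying the STDP term by the signed reward.

```python
import math


def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Assumed exponential STDP kernel; dt = t_post - t_pre (arbitrary time units)."""
    if dt > 0:
        # Pre-synaptic spike precedes post-synaptic spike: potentiation
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        # Post-synaptic spike precedes pre-synaptic spike: depression
        return -a_minus * math.exp(dt / tau)
    return 0.0


def r_stdp_update(weight, dt, reward, lr=0.1, w_min=0.0, w_max=1.0):
    """Reward-modulated update (illustrative only).

    reward > 0 keeps the sign of the STDP term (STDP branch);
    reward < 0 flips it (anti-STDP branch); reward == 0 leaves the weight unchanged.
    The result is clipped to the assumed range [w_min, w_max].
    """
    dw = lr * reward * stdp_window(dt)
    return min(w_max, max(w_min, weight + dw))


if __name__ == "__main__":
    w = 0.5
    print(r_stdp_update(w, dt=5.0, reward=+1.0))  # causal pair, positive reward
    print(r_stdp_update(w, dt=5.0, reward=-1.0))  # causal pair, negative reward
```

In this sketch a single signed multiplication stands in for what the abstract attributes to the two transistor channels with opposite weight-update behavior; the clipping bounds are a stand-in for a finite conductance range and do not reproduce the reported nonlinearity or G<sub>max</sub>/G<sub>min</sub> ratio.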