WO2018228295A1 - Peripheral circuit and system for supporting RRAM-based neural network training - Google Patents


Info

Publication number
WO2018228295A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
circuit
calculation
neural network
rram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/090541
Other languages
English (en)
French (fr)
Inventor
刘武龙
姚骏
汪玉
成铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to EP18817002.1A (EP3564867A4)
Publication of WO2018228295A1
Priority to US16/545,932 (US11409438B2)
Anticipated expiration
Ceased (current legal status)


Classifications

    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06F3/061 Improving I/O performance
    • G06F13/12 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0673 Single storage device
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/065 Analogue means (physical realisation of neural networks using electronic means)
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G06N3/092 Reinforcement learning

Definitions

  • The present invention relates to the field of neural network training, and more particularly to peripheral circuits for supporting RRAM-based neural network training.
  • Background art
  • For neural networks, especially deep neural networks, training computation has two typical characteristics: it is memory-access intensive and computationally intensive.
  • Training data is massive: for example, ImageNet 2012 contains 14 million images.
  • A deep neural network contains the connection parameters of hundreds of millions of neurons, which must be updated frequently during training.
  • Deep neural networks generate a large number of intermediate results, such as gradient information, during computation.
  • The memory-access overhead of training data, connection weights, intermediate results, and so on places urgent demands on data storage structures and on computational performance optimization.
  • The typical operations of deep neural networks are multidimensional matrix multiplication (complexity $O(N^3)$) and graph optimization; for example, the 22-layer GoogLeNet requires 6 GFLOPs of computation, which places higher demands on computing hardware and performance optimization.
  • Resistive random access memory (RRAM) is considered one of the devices that can improve the energy efficiency of neural networks.
  • RRAM has a very high integration density.
  • RRAM is a non-volatile memory that can merge storage and computation, greatly reducing memory-access overhead.
  • An RRAM memory cell uses its variable resistance to represent multiple values, rather than only the binary values 0 and 1 of traditional memory cells.
  • A cross-array structure built from RRAM cells, as shown in FIG. 1, is well suited to the matrix-vector multiplication inherent in neural networks.
  • Existing work shows that RRAM cross arrays accelerate neural network computation and can improve energy efficiency by a factor of 100 to 1000 compared with a CPU or GPU.
  • However, existing work has not fully exploited the advantages of RRAM.
  • In the prior art, RRAM cross arrays can only accelerate the forward calculation of neural networks and cannot accelerate the computationally demanding neural network training process.
  • Neural network training mainly consists of three steps: forward calculation, back propagation, and weight update.
  • The prior art lacks peripheral circuits that support the corresponding calculations.
  • A peripheral circuit is therefore needed to support RRAM-based neural network training and improve its energy efficiency.
  • The present application provides a peripheral circuit for supporting neural network training based on an RRAM cross array, which aims to use the peripheral circuit to support such training and accelerate neural network computation.
  • The present application provides a peripheral circuit for supporting neural network training based on a resistive memory (RRAM) cross array, characterized by comprising: a data preparation circuit, configured to perform first pre-processing on first data input into the data preparation circuit and to selectively import the pre-processed data obtained by the first pre-processing into the rows or the columns of the RRAM cross array according to a first control signal, where the first data includes sample data for neural network training; a data selection circuit, configured to selectively export second data from the rows or the columns of the RRAM cross array according to a second control signal and to perform second pre-processing on the second data to obtain third data, where the second data is the data obtained by computing on the pre-processed data in the RRAM cross array. There is a correspondence between the first control signal and the second control signal, which indicates that the data preparation circuit performs the first pre-processing on the first data according to the first control signal and then imports the result, and correspondingly the data selection circuit …
  • The peripheral circuit further includes a storage medium, configured to store the sample data and data written into it by at least one of the data preparation circuit, the data selection circuit, the data reading circuit, and the reverse training calculation circuit.
  • The storage medium includes a buffer register unit and a sample data storage unit; the sample data storage unit stores the sample data, and the buffer register unit stores data written by at least one of the data preparation circuit, the data selection circuit, the data reading circuit, and the reverse training calculation circuit.
  • The data preparation circuit, the data selection circuit, the data reading circuit, and the reverse training calculation circuit all read data from and write data to the buffer register unit over a high data bandwidth.
  • The buffer register unit and each circuit of the peripheral circuit exchange data over a high data bandwidth, where "high bandwidth" is understood in the general sense of the technical field, may change as the technology develops, and is not specifically limited herein.
  • The various data generated during neural network training can be temporarily stored in the buffer register unit, which improves the efficiency of reading and writing computed data.
  • The buffer register unit may also be a storage unit independent of the storage medium.
  • The peripheral circuit can support the computation of all three steps of RRAM-cross-array-based neural network training, namely forward calculation, back propagation, and weight update, and thus accelerate neural network training.
  • The data preparation circuit includes a word line driver and decoder (WDD) and two first transmission gates (TGs). The WDD is configured to receive the first data and perform the first pre-processing on it to obtain the pre-processed data. The two first TGs are connected in parallel to an output port of the WDD; they comprise a first row TG and a first column TG, which are never turned on at the same time. The first row TG is configured to turn on, according to the first control signal, the paths between the WDD and the rows of the RRAM cross array, so that the pre-processed data output by the WDD is imported into each row of the RRAM cross array; the first column TG is configured to turn on, according to the first control signal, the paths between the WDD and the columns of the RRAM cross array, so that the pre-processed data output by the WDD is imported into each column of the RRAM cross array.
  • Two transmission gates are arranged between the WDD and the RRAM cross array, and control signals turn different gates on or off according to the computation required, selectively opening either the paths between the WDD and the rows of the RRAM cross array or the paths between the WDD and its columns; the corresponding data exchange takes place once a path is open. With this transmission-gate design, the WDD and the other units of the data preparation circuit are reused to import data into either the rows or the columns of the RRAM cross array without increasing circuit complexity.
  • Optionally, a one-of-two transmission switch, or another circuit that achieves the same purpose, may be provided between the WDD and the RRAM cross array in place of the two transmission gates.
  • The data selection circuit includes a pre-processing circuit, a multiplexer, and two second TGs. The two second TGs are connected in parallel to an input port of the multiplexer; they comprise a second row TG and a second column TG, which are never turned on at the same time. The second row TG is configured to turn on, according to the second control signal, the paths between the multiplexer and the rows of the RRAM cross array; the second column TG is configured to turn on, according to the second control signal, the paths between the multiplexer and the columns of the RRAM cross array. The multiplexer is configured to export the second data from the RRAM cross array through whichever of the two second TGs is conducting, and the pre-processing circuit is configured to perform the second pre-processing on the second data exported by the multiplexer to obtain the third data and store the third data in the buffer register unit.
  • Two transmission gates are arranged between the multiplexer and the RRAM cross array, and control signals turn different gates on or off according to the computation required, selectively opening either the paths between the multiplexer and the rows of the RRAM cross array or the paths between the multiplexer and its columns; the corresponding data exchange takes place once a path is open. With this design, the multiplexer, the pre-processing circuit, and the other elements of the data selection circuit are reused to export the results of the RRAM cross array computation from either its rows or its columns without increasing circuit complexity.
  • Optionally, a one-of-two transmission switch, or another circuit that achieves the same purpose, may be provided between the multiplexer and the RRAM cross array in place of the two transmission gates.
  • The data reading circuit includes a read amplifier circuit and a maximum pooling operation circuit. The read amplifier circuit is configured to read fourth data from the buffer register unit or the data selection circuit; the maximum pooling operation circuit is configured to perform a maximum pooling operation on the fourth data read by the read amplifier circuit to obtain fifth data, which is a forward calculated value, and to store the fifth data in the buffer register unit. The maximum pooling operation circuit includes at least one first register, and the input port of each first register is connected to a first selector; according to a third control signal, the first selector selectively reads either 0 or an operand of the maximum pooling operation from the buffer register unit and inputs it to the corresponding first register.
  • A one-of-two selector is connected to the input port of each register, and controlling these selectors determines what each register receives. When a selector selects 0, the corresponding register does not take part in the maximum pooling operation, which is equivalent to the register with input 0 being absent from the maximum pooling operation circuit.
  • By controlling the selectors, the number of participating registers can be matched to the number of operands of the maximum pooling operation.
  • The maximum pooling operation circuit further includes four second registers, which are not connected to selectors and read the operands of the maximum pooling operation directly from the buffer register unit. In general, a maximum pooling operation involves 4 operands, that is, at least 4 registers are generally used to read the data to be pooled, so connecting these 4 registers without selectors avoids unnecessary selectors and reduces cost.
  • The reverse training calculation circuit includes an error calculation circuit and a derivative calculation circuit. The error calculation circuit is configured to read sixth data from the data reading circuit or the buffer register unit, calculate an error, and store the calculated error in the buffer register unit; the derivative calculation circuit is configured to read the sixth data from the data reading circuit or the buffer register unit, calculate the derivative of a nonlinear function, and store the calculated derivative in the buffer register unit. The sixth data includes a forward calculated value.
  • The reverse training calculation circuit thus supports the two key computations of RRAM-cross-array-based neural network training: derivatives and errors.
  • The nonlinear function includes a ReLU function and a sigmoid function.
  • The derivative calculation circuit includes a second selector, a third selector, a comparator, a first subtractor, and a multiplier. An output port of the second selector is connected to an input port of the first subtractor; an output port of the first subtractor is connected to an input port of the multiplier; an output port of the multiplier is connected to a first input port of the third selector; and an output port of the comparator is connected to a second input port of the third selector. The second selector is configured to read a forward calculated value from the data reading circuit or the buffer register unit; the first subtractor is configured to subtract the forward calculated value input by the second selector from 1 to obtain a first difference; the multiplier is configured to multiply the first difference input by the first subtractor by the forward calculated value to obtain a first product; the comparator is configured to perform a comparison on the forward calculated value input by the data reading circuit to obtain a comparison result; and the third selector is configured to select, according to a fourth control signal, either the comparison result from the comparator or the first product from the multiplier as the output.
  • The typical nonlinear functions used in neural network training mainly include the ReLU (Rectified Linear Unit) function and the sigmoid function.
  • The comparison result produced by the comparator is the derivative of the ReLU function, and the product produced by the multiplier after the subtraction and multiplication is the derivative of the sigmoid function.
  • The error calculation circuit includes a fourth selector, an adder, and a second subtractor. An output port of the fourth selector is connected to an input port of the adder, and an output port of the adder is connected to an input port of the second subtractor. The fourth selector is configured to selectively read 0 or the bias r according to a fifth control signal and input the value read into the adder; the adder is configured to read seventh data from the data reading circuit or the buffer register unit and add the value input by the fourth selector to the seventh data to obtain a first sum value; and the second subtractor is configured to read eighth data from the data reading circuit or the buffer register unit and subtract the eighth data from the first sum value input by the adder to obtain an error.
  • By selecting different data under different control signals, the fourth selector allows the same circuit to compute the error for both supervised learning and deep reinforcement learning neural network training based on the RRAM cross array; optionally, the error calculation circuit can also be applied to error calculation in other neural networks that are not based on RRAM cross arrays.
  • When the neural network training is supervised learning neural network training:
  • the fourth selector is specifically configured to read 0 according to the fifth control signal and input the 0 into the adder;
  • the adder is specifically configured to read the true value y* corresponding to the sample data from the buffer register unit and add y* to the 0 input by the fourth selector to obtain the first sum value;
  • the second subtractor is specifically configured to read the forward calculated value f(x) from the data reading circuit or the buffer register unit and subtract f(x) from the first sum value input by the adder to obtain the error y* - f(x).
  • When the neural network training is deep reinforcement learning neural network training, the fourth selector is specifically configured to read r according to the fifth control signal and input r into the adder; the adder is specifically configured to read the forward calculated value $\max_{a'} \gamma Q(s', a')$ from the data reading circuit or the buffer register unit and add it to r to obtain the first sum value.
  • The data reading circuit further includes a weight update control circuit, configured to determine whether a weight value is positive or negative and to output a first RESET signal or a second RESET signal, respectively, according to the result. The weight value is represented by the difference between a first weight value W+ and a second weight value W-, both of which are positive. The first RESET signal indicates that the weight value is positive and controls the corresponding node of the RRAM cross array storing W- to perform a RESET operation; the second RESET signal indicates that the weight value is negative and controls the corresponding node of the RRAM cross array storing W+ to perform the RESET operation. A RESET operation adjusts a cell from a low resistance to a high resistance.
  • Neural network training is memory-access intensive.
  • When RRAM is used for neural network training, frequent write operations can greatly reduce its reliability, especially SET operations, which switch a cell from high resistance to low resistance.
  • The weight update control circuit determines whether the weight value is positive or negative and uses the result as a control signal to switch on the connection to the RRAM cross array storing W- or W+, which then performs a RESET operation. SET operations on the RRAM are thus avoided, improving the reliability of RRAM used for neural network training.
  • The present application further provides a neural network training system, comprising a control circuit, a resistive memory (RRAM) cross array, and the peripheral circuit described in the first aspect or any of its possible implementations;
  • the control circuit is configured to generate a plurality of control signals, including at least the first, second, third, fourth, and fifth control signals.
  • Compared with a central processing unit (CPU), the neural network training system provided by the present application can accelerate neural network training.
  • FIG. 1 is a schematic diagram of an RRAM cross array structure provided by the present application.
  • FIG. 2 is a schematic structural diagram of a neural network training system according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a data preparation circuit according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a data selection circuit according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a data reading circuit according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a maximum pooling operation circuit according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a maximum pooling operation circuit according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a weight update control circuit according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a derivative calculation circuit according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an error calculation circuit according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the data flow of forward calculation according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of data flowing into and out of the RRAM cross array according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the data flow of back propagation according to an embodiment of the present application.
  • FIG. 14 is a schematic diagram of data flowing into and out of the RRAM cross array according to an embodiment of the present application.
  • FIG. 15 is a schematic diagram of the data flow of weight update according to an embodiment of the present application.
  • FIG. 16 is a schematic diagram of deep reinforcement learning neural network training according to an embodiment of the present application.
  • The peripheral circuit provided by the present application can be applied to neural network training based on a resistive random access memory (RRAM) crossbar array, and can of course also be applied to neural network training based on other devices whose performance is similar to that of RRAM.
  • The peripheral circuit can support the computation of multiple steps of neural network training, and the neural network training is not limited to supervised learning and deep reinforcement learning neural network training; the circuit can also be applied to other existing or emerging neural network training.
  • FIG. 2 is a schematic diagram of a system architecture in a possible application scenario of the present application.
  • The storage medium communicates with the functional circuit (Full function Subarray, FF Subarray), so that the functional circuit can store data in or read data from the storage medium.
  • The storage medium may be any form of non-volatile storage medium, which is not limited herein.
  • The buffer register unit may be part of the storage medium or a separate storage medium. When it is a separate storage medium, it may be any form of non-volatile or volatile storage medium.
  • A volatile storage medium is one that stores data temporarily and loses it when power is turned off.
  • The buffer register unit and the functional circuit of the peripheral circuit can exchange data over a high data bandwidth, where "high bandwidth" is understood in the general sense of the technical field, may change as the technology develops, and is not specifically limited herein. Intermediate data generated during neural network training can therefore be stored temporarily in the buffer register unit, which improves the efficiency of reading and writing computed data.
  • The functional circuit is divided into the circuits below only to make the system architecture in this application scenario easier to describe.
  • The functional circuit mainly includes a data preparation circuit, a data selection circuit, a data reading circuit, and a reverse training calculation circuit. These circuits can communicate with one another to transfer data, and each of them can communicate with the storage medium or the buffer register unit to read or write data.
  • The data preparation circuit communicates with the RRAM cross array to import data into it, and the data selection circuit communicates with the RRAM cross array to export data from it.
  • RRAM has variable resistance characteristics and can represent multiple values. Unlike traditional storage media, which can only express 0 and 1 binary values, RRAM has a cross-array structure, which is very suitable for matrix vector multiplication in neural networks.
  • the present application imposes no limitation on how the RRAM crossbar array is laid out according to the neural network structure, or on the specific operations between the RRAM crossbar array and the neurons.
  • the control circuit (controller) uses control signals to control a series of operations of each circuit in the functional circuit.
  • optionally, the control circuit also controls operations of the RRAM array. For example, the control circuit sends a control signal to the data preparation circuit; on receiving it, the data preparation circuit turns on the switch corresponding to the control signal so that the paths between the data preparation circuit and the rows of the RRAM crossbar array are opened, allowing the data preparation circuit to import data into each row of the RRAM crossbar array.
  • the data preparation circuit, data selection circuit, data reading circuit, and reverse training calculation circuit described in the above embodiments generally take the form of a circuit or hardware structure that can implement the corresponding functions described above.
  • the control circuit may be any controller capable of generating control signals, or a device such as a central processing unit or graphics processor. The data preparation circuit, data selection circuit, data reading circuit, reverse training calculation circuit, and control circuit may be mounted on one substrate or on multiple substrates, and notably may also be fabricated as an integrated circuit.
  • the system architecture described in the foregoing embodiments may as a whole form a device dedicated to neural network training, or it may be assembled into a device such as a computer, server, or terminal; where technically feasible, the entire system architecture may also be highly integrated on a silicon chip. Note that, without affecting their functions, the circuits in the system architecture may also be placed in different devices that communicate wirelessly or by wire; for example, the storage medium and buffer register unit on one side, and the functional circuit and RRAM crossbar array on the other, may be placed in two separate devices. In short, the system architecture is essentially unchanged, and any form of presentation falls within the scope of this application.
  • the system is used to support RRAM-based neural network training, including forward computation, backpropagation, and weight update; compared with performing the neural network training computations on a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), it achieves an improvement of at least 5.7 times.
  • CPU Central Processing Unit
  • GPU Graphics Processing Unit
  • the "data" described in this application is usually transmitted in the form of an electrical signal, such as a current signal or a voltage signal.
  • the electrical signal can be used to indicate sample data related to neural network training, pre-processing results, calculation results of each link, and the like.
  • the buffer referred to in the drawings of the present application generally corresponds to the buffer register unit in the embodiment corresponding to FIG. 2.
  • One embodiment of the present application describes a peripheral circuit structure for supporting RRAM-based neural network training; the peripheral circuit includes a data preparation circuit, a data selection circuit, a data reading circuit, and a reverse training calculation circuit.
  • optionally, the peripheral circuit further includes a memory, from which a buffer register unit can be partitioned.
  • the data preparation circuit is configured to preprocess the data input into it and then, according to the control signal, selectively import the result into either the rows or the columns of the RRAM crossbar array.
  • the foregoing control signal may be generated by the control unit according to the current computation process; forward computation and backpropagation correspond to different control signals, and the data preparation circuit performs the corresponding operation for each control signal it receives. For example, if the current computation is neural network forward computation, the control unit sends a control signal that causes the data preparation circuit to open the paths connecting to the rows of the RRAM crossbar array and import its preprocessed result into those rows.
  • if the current computation is neural network backpropagation, the control unit sends a control signal that causes the data preparation circuit to open the paths connecting to the columns of the RRAM crossbar array and import its preprocessed result into those columns. Note that at any given moment only one set of paths, either to the rows or to the columns of the RRAM crossbar array, can be open.
  • the data preparation circuit is divided into two parts: two transfer gates (Transfer Gate, TG) and a word line driver and decoder (Word Line Driver and Decoder, WDD); in FIG. 3, the circuitry other than the two TGs constitutes the WDD.
  • the WDD preprocesses the data input into the data preparation circuit, and the preprocessed result is transmitted to the RRAM crossbar array through one of the two TGs. A control signal determines whether each of the two TGs is on or off: turning on one TG causes the data preparation circuit to open the paths connecting to the rows of the RRAM crossbar array, which corresponds to the current computation being neural network forward computation.
  • turning on the other TG causes the data preparation circuit to open the paths connecting to the columns of the RRAM crossbar array, which corresponds to the current computation being neural network backpropagation.
  • the two TGs are not turned on at the same time, specifically turning on or off according to the control signal generated by the current calculation process.
  • optionally, the two TGs are replaced by a single 2-to-1 switch device.
  • by using the two TGs to control the paths between the data preparation circuit and the RRAM crossbar array, a single WDD circuit is reused and data is selectively imported into either the rows or the columns of the RRAM crossbar array according to the current computation; this supports both forward computation and backpropagation of the neural network with a simpler, lower-cost circuit structure.
  • the data selection circuit is configured to selectively export data from either the rows or the columns of the RRAM crossbar array according to the control signal, perform the corresponding nonlinear computation and subtraction on the exported data, and output the processed data to the next circuit unit, such as the data reading circuit. As with the data preparation circuit, the control signal received by the data selection circuit is generated by the control unit according to the current computation process, and details are not repeated here.
  • the data selection circuit can also be divided into two parts: the two TGs together with a multiplexer (Column MUX), and a preprocessing circuit.
  • the multiplexer exports data from the RRAM crossbar array through whichever of the two TGs is turned on.
  • the pre-processing circuit is used for preprocessing such as nonlinear function calculation and subtraction operation.
  • when one of its two TGs is turned on, the data selection circuit exports data from the rows or the columns of the RRAM crossbar array. For how the TGs are controlled, see the corresponding description of the data preparation circuit; details are not repeated here.
  • the data reading circuit includes a read amplifying circuit, a maximum pooling operation circuit, and a weight update control circuit.
  • the data reading circuit performs weight update control and performs max pooling on the data input by the data selection circuit, and stores the data obtained after max pooling in the buffer register unit.
  • in FIG. 5, the circuitry other than the max pooling operation circuit and the weight update control circuit constitutes the read amplification circuit, which reads data from the buffer register unit or the data selection circuit, processes it, and transmits it to the max pooling operation circuit.
  • Figure 6 shows the maximum pooling operation circuit in the data reading circuit.
  • max pooling in neural network forward computation is generally performed over four operands, whereas neural network training or computation represented by deep reinforcement learning also involves max operations over more than four operands.
  • the max pooling operation circuit in FIG. 6 supports max pooling over four operands as well as over more than four operands, and can be adjusted flexibly as required.
  • the max pooling operation circuit includes m registers (Register, Reg), namely Reg 1, Reg 2, ..., Reg m, where each of Reg 5 to Reg m (every register other than Reg 1 to Reg 4) is additionally connected to a 2-to-1 multiplexer.
  • for the typical four-operand case, the multiplexer of each of Reg 5 to Reg m selects 0, while Reg 1 to Reg 4 read the four operands from the buffer register unit. For the final max operation in deep reinforcement learning, according to the number of operands n, the 2-to-1 selector of each of Reg 5 to Reg n is controlled to read the operands to be max-pooled from the buffer register unit, and the 2-to-1 multiplexer of each of Reg (n+1) to Reg m reads 0 according to the control signal.
  • optionally, each of Reg 1 to Reg m in the max pooling operation circuit shown in FIG. 6 is connected to a 2-to-1 multiplexer, which allows the number of registers that read data from the buffer register unit to be adjusted more flexibly, so that max pooling over fewer than four operands is also supported.
  • adding registers connected to 2-to-1 multiplexers to the max pooling operation circuit allows max operations over larger numbers of operands to be supported.
  • each storage node of the RRAM cross array cannot store negative values.
  • each weight w is decomposed into two parts, w+ and w-, stored in corresponding nodes of two RRAM crossbar arrays, where w+ and w- are both positive and w = w+ - w-.
  • the inventive idea of the present application is:
  • when the weight adjustment obtained during backward training of the neural network is negative, the weight needs to be decreased. The weight corresponds to the reciprocal of the RRAM resistance, i.e., the conductance, so the effective conductance w+ - w- of the corresponding RRAM storage nodes must decrease.
  • to reduce SET operations, only a RESET operation is performed on the corresponding node of the RRAM crossbar array storing w+; a RESET operation adjusts a low resistance to a high resistance, so that node's conductance goes from high to low.
  • when the weight adjustment obtained during backward training of the neural network is positive, the weight needs to be increased, i.e., the effective conductance of the corresponding RRAM storage nodes must increase; only a RESET operation is performed on the corresponding node of the RRAM crossbar array storing w-, which lowers that node's conductance and thereby raises w = w+ - w-.
  • the data reading circuit further includes a weight update control circuit, as shown in FIG. 8.
  • the weight update control circuit is configured to control the update of the weight value or parameter during the network training process.
  • the weight update control circuit includes an exclusive-OR discriminating sub-circuit that determines whether the weight value input into the weight update control circuit is positive or negative; according to the result, the control unit generates a first control signal or a second control signal.
  • the first control signal indicates that the weight value is positive.
  • the second control signal indicates that the weight value is negative.
  • the first control signal controls the corresponding node of the RRAM crossbar array storing W- to perform a RESET operation.
  • the second control signal controls the corresponding node of the RRAM crossbar array storing W+ to perform a RESET operation.
  • the weight update control circuit optimizes the way weights are updated, reduces the number of SET operations, and improves the reliability of RRAM writes.
  • the reverse training calculation circuit includes a derivation calculation circuit and an error calculation circuit.
  • Figure 9 shows a derivative calculation circuit as described in one embodiment of the present application.
  • the derivative calculation circuit includes two 2-to-1 selectors, a comparator, a subtractor, and a multiplier, and can compute the derivatives of typical nonlinear functions used in neural network training, such as the ReLU (Rectified Linear Units) function and the Sigmoid function.
  • ReLU Rectified Linear Units
  • a specific structure of the derivative calculation circuit is shown in FIG. 9: the output port of one 2-to-1 selector (in the upper-right area of FIG. 9) is connected to an input port of the subtractor; the output port of the subtractor is connected to an input port of the multiplier; the output port of the multiplier is connected to one input port of the other 2-to-1 selector (in the lower area of FIG. 9); and the output port of the comparator is connected to the other input port of that selector.
  • for the ReLU derivative, the derivative calculation circuit feeds the forward-computed value output by the data reading circuit into its comparator (Comparer, Comp), and the comparator output is the derivative of the ReLU function.
  • for the Sigmoid derivative, the derivative calculation circuit reads the forward-computed value f(x) from the data reading circuit or the buffer register unit, inputs f(x) into its subtractor to compute the difference 1 - f(x), and inputs that difference into its multiplier, which multiplies it by f(x); the multiplier output, f(x)(1 - f(x)), is the derivative of the Sigmoid function.
  • the derivative calculation circuit uses the 2-to-1 selector connected to the outputs of the comparator and the multiplier to select, according to a control signal, either the comparator result or the multiplier result as its output.
  • the control circuit generates different control signals depending on which function's derivative is currently being computed. For the ReLU derivative, the control signal makes the 2-to-1 selector connected to the comparator select the comparator output as the output of the derivative calculation circuit.
  • for the Sigmoid derivative, the control signal makes the 2-to-1 selector connected to the multiplier select the multiplier output as the output of the derivative calculation circuit. The derivative calculation circuit then stores the output in the buffer register unit or storage medium.
  • the derivative calculation circuit can support the derivative calculation in the RRAM-based neural network training, and improve the efficiency of the neural network training derivation calculation.
  • Figure 10 shows an error calculation circuit as described in one embodiment of the present application.
  • the error calculation circuit includes a 2-to-1 selector, an adder, and a subtractor, and can be used for the error calculations involved in neural network training, typically supervised learning training and deep reinforcement learning training.
  • a specific structure of the error calculation circuit is shown in FIG. 10: the output port of the 2-to-1 selector is connected to an input port of the adder, and the output port of the adder is connected to an input port of the subtractor. According to the received control signal, the 2-to-1 selector reads either 0 or r, where r is a bias stored in the memory that can be refreshed dynamically according to the number of training iterations. According to the control signal, the adder reads either y* from the buffer register unit or the value $\max_{a'} Q(s', a')$ output by the data reading circuit, and adds it to the selector output. According to the control signal, the subtractor reads either the forward-computed value f(x) or $Q(s, a)$ from the buffer register unit, and subtracts the adder output from it.
  • the control circuit generates the corresponding control signal according to the type of neural network training, to control which data the 2-to-1 selector, adder, and subtractor of the error calculation circuit read.
  • the error calculation circuit is described in detail below, taking supervised learning training and deep reinforcement learning training as examples:
  • for supervised learning training, the control signal makes the 2-to-1 selector of the error calculation circuit select 0 and feed it to the adder; the error calculation circuit reads the true value y* of the training sample data from the buffer register unit and feeds it to the adder, which adds y* to 0 and passes the sum to the subtractor.
  • the error calculation circuit reads the forward-computed value f(x) of the neural network from the buffer register unit and feeds it to the subtractor, which computes f(x) - y*, giving the error for supervised learning neural network training. The error is then transmitted to the data preparation circuit for reverse training computation and error propagation.
  • for deep reinforcement learning training, the control signal makes the 2-to-1 selector of the error calculation circuit select r and feed it to the adder; the error calculation circuit reads the forward-computed value $\max_{a'} Q(s', a')$ from the buffer register unit or the data reading circuit and feeds it to the adder.
  • the adder adds $\max_{a'} Q(s', a')$ to r and passes the sum to the subtractor; the error calculation circuit reads $Q(s, a)$ from the buffer register unit and feeds it to the subtractor, which subtracts the adder's sum from $Q(s, a)$. The error is then transmitted to the data preparation circuit for reverse training computation and error propagation.
  • the error calculation circuit can support the error calculation in RRAM-based neural network training, and improve the efficiency of error calculation in neural network training.
  • Taking supervised learning as an example, an embodiment of the present application describes in detail how the peripheral circuit described in the embodiments of the present application supports neural network training.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Neurology (AREA)
  • Memory System (AREA)
  • Logic Circuits (AREA)
  • Semiconductor Memories (AREA)

Abstract

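Embodiments of the present invention provide a peripheral circuit, including: a data preparation circuit, configured to selectively import, according to a first control signal, first data input into the data preparation circuit into the rows or columns of the RRAM crossbar array after first preprocessing, the first data including sample data used for the neural network training; a data selection circuit, configured to selectively export second data from the rows or columns of the RRAM crossbar array according to a second control signal, and perform second preprocessing on the second data to obtain third data; a data reading circuit, configured to perform a weight update control operation and to perform a max pooling operation on fourth data input into the data reading circuit to obtain fifth data; and a reverse training calculation circuit, configured to perform error calculation and derivative calculation on sixth data input into the reverse training calculation circuit. The technical solution of the present invention can support RRAM-based neural network training, accelerate neural network training, and improve the reliability of RRAM writes.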

Description

Peripheral Circuit and System for Supporting RRAM-Based Neural Network Training
Technical Field
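The present invention relates to the field of neural network training, and in particular to a peripheral circuit for supporting RRAM-based neural network training.
Background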
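In recent years, neural networks, especially deep neural networks, have been widely applied in fields such as computer vision, speech recognition, and intelligent control. Deep neural network training has two typical characteristics: it is memory-access intensive and compute intensive. Regarding memory-access intensity: first, neural network training often relies on massive training data; for example, ImageNet 2012 contains 14 million images. Second, deep neural networks contain hundreds of millions of neuron connection parameters, which must be updated frequently, especially during training. Third, deep neural networks generate a large number of intermediate results during computation, such as gradient information. The memory-access overhead of large volumes of data such as training data, connection weights, and intermediate results places urgent demands on data storage structures and computational performance optimization. Regarding compute intensity: the typical operations of deep neural networks are multidimensional matrix multiplication (with computational complexity O(N^3)) and graph optimization; for example, the 22-layer GoogLeNet network requires 6 GFLOPs of computation, which places high demands on computing hardware and performance optimization.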
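Resistive random access memory (RRAM) is regarded as one of the devices that can improve the energy efficiency of neural network computation. First, RRAM has a very high integration density. Second, RRAM is a non-volatile memory that can merge storage and computation, greatly reducing memory-access overhead. Third, by exploiting its variable resistance, an RRAM storage cell can represent multiple values rather than only the binary values 0 and 1 of conventional storage cells. Based on these characteristics, RRAM can be built into a crossbar array structure, as shown in FIG. 1, which is very well suited to the matrix-vector multiplication inherent in neural networks. Existing work shows that using crossbar arrays built from RRAM storage cells to accelerate neural network computation can improve energy efficiency by 100 to 1000 times compared with a CPU or GPU.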
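As an illustration of why a crossbar suits matrix-vector multiplication, the following minimal Python sketch treats each cell as an ideal linear conductance (current = conductance × voltage, with no wire resistance or sneak-path effects) and sums the cell currents along each wire. The function names are hypothetical, and the model is not part of the circuit described in this application.

```python
from typing import List

Matrix = List[List[float]]  # G[i][j]: conductance of the cell at row i, column j


def drive_rows(G: Matrix, v_rows: List[float]) -> List[float]:
    """Voltages applied to the rows; each column wire collects a current.

    Column j carries I_j = sum_i V_i * G[i][j] (Ohm's law per cell plus
    Kirchhoff's current law on the column wire), i.e. the product G^T v.
    """
    if len(v_rows) != len(G):
        raise ValueError("one input voltage per row is required")
    return [sum(v_rows[i] * G[i][j] for i in range(len(G))) for j in range(len(G[0]))]


def drive_columns(G: Matrix, v_cols: List[float]) -> List[float]:
    """Voltages applied to the columns; each row wire collects a current (G v)."""
    if len(v_cols) != len(G[0]):
        raise ValueError("one input voltage per column is required")
    return [sum(G[i][j] * v_cols[j] for j in range(len(G[0]))) for i in range(len(G))]


G = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
print(drive_rows(G, [1.0, 0.0, 1.0]))  # [6.0, 8.0]
print(drive_columns(G, [1.0, 1.0]))    # [3.0, 7.0, 11.0]
```

Because the same stored conductances can be read in either direction, one array can serve forward computation (rows driven) and backpropagation (columns driven) without being reprogrammed.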
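However, existing work has not fully exploited the advantages of RRAM. RRAM crossbar arrays in the prior art can currently only be used to accelerate the forward computation of neural networks and cannot be used to accelerate the more computationally demanding neural network training process. Neural network training mainly includes three steps: forward computation, backpropagation, and weight update, and the prior art lacks peripheral circuits to support the corresponding computations.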
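Therefore, a peripheral circuit is needed to support RRAM-based neural network training and improve the energy efficiency of neural network training.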
Summary
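To address the problems in the prior art, the present application provides a peripheral circuit for supporting neural network training based on an RRAM crossbar array, aiming to use the peripheral circuit to support neural network training based on an RRAM crossbar array and accelerate neural network computation.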
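According to a first aspect, the present application provides a peripheral circuit for supporting neural network training based on a resistive random access memory (RRAM) crossbar array, characterized by including: a data preparation circuit, configured to perform first preprocessing on first data input into the data preparation circuit and, according to a first control signal, selectively import the preprocessed data obtained from the first preprocessing into the rows or columns of the RRAM crossbar array, the first data including sample data used for the neural network training; a data selection circuit, configured to selectively export second data from the rows or columns of the RRAM crossbar array according to a second control signal, and perform second preprocessing on the second data to obtain third data, where the second data is the data obtained after the preprocessed data is computed in the RRAM crossbar array, and the first control signal and the second control signal have a correspondence indicating that the data preparation circuit importing the preprocessed first data into the rows of the RRAM crossbar array according to the first control signal corresponds to the data selection circuit exporting the second data from the columns of the RRAM crossbar array according to the second control signal, and that the data preparation circuit importing the preprocessed first data into the columns of the RRAM crossbar array according to the first control signal corresponds to the data selection circuit exporting the second data from the rows of the RRAM crossbar array according to the second control signal; a data reading circuit, configured to perform a weight update control operation and to perform a max pooling operation on fourth data input into the data reading circuit to obtain fifth data, the fourth data including the third data; and a reverse training calculation circuit, configured to perform error calculation and derivative calculation on sixth data input into the reverse training calculation circuit, the sixth data including the fifth data.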
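Further, the peripheral circuit also includes a storage medium, configured to store the sample data and data stored by at least one of the data preparation circuit, the data selection circuit, the data reading circuit, and the reverse training calculation circuit. Optionally, the storage medium includes a buffer register unit and a sample data storage unit; the sample data storage unit is configured to store the sample data, and the buffer register unit is configured to store data stored by at least one of the data preparation circuit, the data selection circuit, the data reading circuit, and the reverse training calculation circuit. The data preparation circuit, the data selection circuit, the data reading circuit, and the reverse training calculation circuit all read data from and write data to the buffer register unit over a high data bandwidth. The buffer register unit exchanges data with each circuit of the peripheral circuit over a high data bandwidth (high bandwidth); "high bandwidth" carries its ordinary meaning in the technical field and may change as technology develops, so it is not specifically limited herein. Various data generated during neural network training can therefore be temporarily stored in the buffer register unit, which improves the read/write efficiency of computation data. Optionally, the buffer register unit may be a storage unit independent of the storage circuit.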
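The peripheral circuit can support the computation of all three steps of neural network training based on an RRAM crossbar array, namely forward computation, backpropagation, and weight update, thereby accelerating neural network training.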
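In a possible implementation of the first aspect, the data preparation circuit includes a word line driver and decoder (WDD) and two first transfer gates (TGs). The WDD is configured to receive the first data and perform the first preprocessing on it to obtain the preprocessed data. The two first TGs are connected in parallel and connected to the output port of the WDD; they comprise a first row TG and a first column TG, which are not turned on at the same time. The first row TG is configured to turn on, according to the first control signal, the paths connecting the WDD to each row of the RRAM crossbar array, and import the preprocessed data output by the WDD into each row of the RRAM crossbar array. The first column TG is configured to turn on, according to the first control signal, the paths connecting the WDD to each column of the RRAM crossbar array, and import the preprocessed data output by the WDD into each column of the RRAM crossbar array.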
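Two transfer gates are placed between the WDD and the RRAM crossbar array, and control signals turn each gate on or off according to the specific computation, selectively opening either the paths between the WDD and the rows of the RRAM crossbar array or the paths between the WDD and its columns; once a path is open, the corresponding data exchange takes place. With this transfer-gate design, the WDD and the other units of the data preparation circuit can be reused to import data into either the rows or the columns of the RRAM crossbar array without increasing circuit complexity. Optionally, a 2-to-1 switch circuit, or another circuit that achieves the same purpose, may be placed between the WDD and the RRAM crossbar array instead of the two transfer gates.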
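In another possible implementation of the first aspect, the data selection circuit includes a preprocessing circuit, a multiplexer, and two second TGs. The two second TGs are connected in parallel and connected to the input port of the multiplexer; they comprise a second row TG and a second column TG, which are not turned on at the same time. The second row TG is configured to turn on, according to the second control signal, the paths connecting the multiplexer to each row of the RRAM crossbar array; the second column TG is configured to turn on, according to the second control signal, the paths connecting the multiplexer to each column of the RRAM crossbar array. The multiplexer is configured to export the second data from the RRAM crossbar array through whichever of the two second TGs is turned on. The preprocessing circuit is configured to perform the second preprocessing on the second data exported by the multiplexer to obtain the third data, and store the third data in the buffer register unit.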
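Two transfer gates are placed between the multiplexer and the RRAM crossbar array, and control signals turn each gate on or off according to the specific computation, selectively opening either the paths between the multiplexer and the rows of the RRAM crossbar array or the paths between the multiplexer and its columns; once a path is open, the corresponding data exchange takes place. With this transfer-gate design, the multiplexer, the preprocessing circuit, and the other units of the data reading circuit can be reused to export the results computed by the RRAM crossbar array from either its rows or its columns without increasing circuit complexity. Optionally, a 2-to-1 switch circuit, or another circuit that achieves the same purpose, may be placed between the multiplexer and the RRAM crossbar array instead of the two transfer gates.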
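In another possible implementation of the first aspect, the data reading circuit includes a read amplification circuit and a max pooling operation circuit. The read amplification circuit is configured to read the fourth data from the buffer register unit or the data selection circuit. The max pooling operation circuit is configured to perform a max pooling operation on the fourth data read by the read amplification circuit to obtain the fifth data, and store the fifth data in the buffer register unit, the fifth data being a forward-computed value. The max pooling operation circuit includes at least one first register; the input port of each first register is connected to a first selector, which is configured to, according to a third control signal, selectively read 0 or read an operand to be max-pooled from the buffer register unit, and input it into the corresponding first register.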
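A 2-to-1 selector is connected to the input port of each register, and controlling it selects which data is input into the register. When a 2-to-1 selector reads 0, the corresponding register does not take part in the max pooling operation, which is equivalent to that register (whose input is 0) being absent from the circuit. Controlling the number of active registers through the 2-to-1 selectors therefore accommodates max pooling over different numbers of operands. Optionally, the max pooling operation circuit further includes four second registers, i.e., registers not connected to a selector, which read the operands to be max-pooled from the buffer register unit. Generally, max pooling is computed over four operands, so at least four registers usually read the data to be max-pooled; giving those four registers no selector avoids unnecessary selectors and lowers cost.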
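In another possible implementation of the first aspect, the reverse training calculation circuit includes an error calculation circuit and a derivative calculation circuit. The error calculation circuit is configured to calculate an error from the sixth data read from the data reading circuit or the buffer register unit, and store the calculated error in the buffer register unit. The derivative calculation circuit is configured to calculate the derivative of a nonlinear function from the sixth data read from the data reading circuit or the buffer register unit, and store the calculated derivative in the buffer register unit. The sixth data includes forward-computed values. The reverse training calculation circuit enables neural network training based on an RRAM crossbar array to perform two important computations: derivatives and errors. Optionally, the nonlinear function includes the ReLU function and the sigmoid function.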
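In another possible implementation of the first aspect, the derivative calculation circuit includes a second selector, a third selector, a comparator, a first subtractor, and a multiplier. The output port of the second selector is connected to the input port of the first subtractor; the output port of the first subtractor is connected to the input port of the multiplier; the output port of the multiplier is connected to the first input port of the third selector; and the output port of the comparator is connected to the second input port of the third selector. The second selector is configured to read a forward-computed value from the data reading circuit or the buffer register unit; the first subtractor is configured to subtract the forward-computed value input by the second selector from 1 to obtain a first difference; the multiplier is configured to multiply the input first difference by the forward-computed value to obtain a first product; the comparator is configured to perform a comparison operation on the forward-computed value input by the data reading circuit to obtain a comparison result; and the third selector is configured to, according to a fourth control signal, select either the comparison result from the comparator or the first product from the multiplier as the derivative and store it in the buffer register unit.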
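The typical nonlinear functions used in neural network training are mainly the ReLU (Rectified Linear Units) function and the Sigmoid function. The result output by the comparator after the comparison operation is the derivative of the ReLU function, and the result output by the multiplier after the corresponding subtractor and multiplier operations is the derivative of the Sigmoid function. By controlling the third selector to output either the ReLU derivative or the Sigmoid derivative, the circuit covers the two main derivative computations in neural network training based on an RRAM crossbar array. Optionally, the derivative calculation circuit may also be applied to derivative computation in other neural networks that are not based on an RRAM crossbar array.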
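The selector-based derivative datapath described above can be mirrored in a short Python model. It assumes that the comparator tests the forward value against 0 (the comparison threshold is not stated here) and that the multiplier's second operand is f(x), which is what the Sigmoid identity σ'(x) = σ(x)(1 − σ(x)) requires. The function names are hypothetical.

```python
import math


def derivative_circuit(fx: float, select_relu: bool) -> float:
    """Model of the derivative datapath; fx is the forward-computed value f(x).

    Both paths are evaluated, as they would be in hardware, and the output
    selector (driven by the fourth control signal) passes one of them on.
    """
    # Comparator path: ReLU'(x) is 1 where f(x) > 0, else 0 (value at 0 assumed to be 0).
    comp_out = 1.0 if fx > 0.0 else 0.0
    # Subtractor then multiplier path: sigmoid'(x) = f(x) * (1 - f(x)).
    mul_out = fx * (1.0 - fx)
    return comp_out if select_relu else mul_out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


print(derivative_circuit(max(0.0, -1.3), select_relu=True))  # 0.0
print(derivative_circuit(sigmoid(0.0), select_relu=False))   # 0.25
```

Working from f(x) rather than x means the Sigmoid path needs no exponential unit; only values already produced by the forward pass are reused.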
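In another possible implementation of the first aspect, the error calculation circuit includes a fourth selector, an adder, and a second subtractor. The output port of the fourth selector is connected to the input port of the adder, and the output port of the adder is connected to the input port of the second subtractor. The fourth selector is configured to, according to a fifth control signal, selectively read 0 or read a bias r, and input the read 0 or r into the adder; the adder is configured to read seventh data from the data reading circuit or the buffer register unit, and add the data input by the fourth selector to the read seventh data to obtain a first sum; and the second subtractor is configured to read eighth data from the data reading circuit or the buffer register unit, and subtract the first sum input by the adder from the eighth data to obtain an error.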
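By having the fourth selector select different data according to different control signals, the circuit covers the error computations of two kinds of neural network training based on an RRAM crossbar array: supervised learning and reinforcement learning. Optionally, the error calculation circuit may also be applied to error computation in other neural networks that are not based on an RRAM crossbar array.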
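Optionally, the neural network training is supervised learning neural network training. The fourth selector is specifically configured to read 0 according to the fifth control signal and input the read 0 into the adder; the adder is specifically configured to read the true value y* corresponding to the sample data from the buffer register unit, add y* to the 0 input by the fourth selector to obtain the first sum, and input the first sum into the second subtractor; and the second subtractor is specifically configured to read the forward-computed value f(x) from the data reading circuit or the buffer register unit, and subtract the first sum input by the adder from f(x) to obtain the error.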
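Optionally, the neural network training is deep reinforcement learning neural network training. The fourth selector is specifically configured to read r according to the fifth control signal and input r into the adder; the adder is specifically configured to read the forward-computed value $\max_{a'} Q(s', a')$ from the data reading circuit or the buffer register unit, add it to r to obtain the first sum, and input the first sum into the second subtractor; and the second subtractor is specifically configured to read $Q(s, a)$ from the data reading circuit or the buffer register unit, and subtract the first sum input by the adder from $Q(s, a)$ to obtain the error.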
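The error datapath (selector, adder, subtractor) can be modelled as follows. This is a sketch, not the circuit itself: the subtraction order is taken to mirror f(x) − y* in both modes, no discount factor is modelled because none appears in the described datapath, and r, which this application calls a bias, occupies the position of the reward in the usual Q-learning target r + max_{a'} Q(s', a'). All function names are hypothetical.

```python
def error_circuit(adder_operand: float, minuend: float, use_bias: bool, r: float = 0.0) -> float:
    """Selector -> adder -> subtractor.

    The selector feeds 0 (supervised learning) or the bias r (deep
    reinforcement learning) into the adder; the subtractor then removes
    the adder's sum from `minuend`.
    """
    selected = r if use_bias else 0.0
    first_sum = selected + adder_operand
    return minuend - first_sum


def supervised_error(fx: float, y_true: float) -> float:
    # f(x) - (0 + y*)
    return error_circuit(adder_operand=y_true, minuend=fx, use_bias=False)


def drl_error(q_sa: float, max_q_next: float, r: float) -> float:
    # Q(s, a) - (r + max_a' Q(s', a')); sign chosen to mirror f(x) - y*.
    return error_circuit(adder_operand=max_q_next, minuend=q_sa, use_bias=True, r=r)


print(supervised_error(fx=0.75, y_true=1.0))       # -0.25
print(drl_error(q_sa=3.0, max_q_next=1.5, r=0.5))  # 1.0
```

Sharing one adder and one subtractor between the two training modes is what keeps the datapath small; only the selector input changes.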
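In another possible implementation of the first aspect, the data reading circuit further includes a weight update control circuit. The weight update control circuit is configured to determine whether a weight value is positive or negative and, according to the result, output a first RESET signal or a second RESET signal respectively. The weight value is represented as the difference between a first weight value W+ and a second weight value W-, both of which are positive. The first RESET signal indicates that the weight value is positive, and the second RESET signal indicates that the weight value is negative. The first RESET signal is used to control the corresponding node of the RRAM crossbar array storing the second weight value W- to perform a RESET operation, and the second RESET signal is used to control the corresponding node of the RRAM crossbar array storing the first weight value W+ to perform the RESET operation, where a RESET operation adjusts a low resistance value toward a high resistance value. Neural network training is access intensive, and when RRAM is used for neural network training, frequent write operations greatly reduce RRAM reliability, especially SET operations, which adjust a high resistance value to a low resistance value. The weight update control circuit determines whether the weight value is positive or negative and uses the result as a control signal for the switches connecting the RRAM crossbar arrays storing W- or W+, so that only RESET operations are performed on the RRAM crossbar array; this avoids SET operations and improves the reliability of RRAM used for neural network training.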
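Compared with the prior art, the present application provides a peripheral circuit for supporting neural network training based on an RRAM crossbar array, aiming to use the peripheral circuit to support such training and accelerate neural network computation.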
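According to a second aspect, the present application provides a neural network training system, including a control circuit, a resistive random access memory (RRAM) crossbar array, and the peripheral circuit described in any of the possible implementations of the first aspect. The control circuit is configured to generate multiple control signals, including at least the first, second, third, fourth, and fifth control signals.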
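One possible way to picture the five control signals is as a single control word computed for each step. The field names and the `make_control_word` helper are hypothetical. The only relations taken from this application are that importing into rows pairs with exporting from columns (and vice versa), that the third signal sets which max pooling registers read operands rather than 0, that the fourth chooses between the comparator and multiplier outputs, and that the fifth chooses between 0 and r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlWord:
    prep_into_rows: bool    # first control signal: data preparation TG (rows vs columns)
    select_from_rows: bool  # second control signal: data selection TG
    pool_operands: int      # third control signal: registers that load operands instead of 0
    deriv_relu: bool        # fourth control signal: comparator (ReLU) vs multiplier (Sigmoid)
    error_use_bias: bool    # fifth control signal: selector reads r (DRL) or 0 (supervised)


def make_control_word(phase: str, activation: str = "relu",
                      training: str = "supervised", pool_operands: int = 4) -> ControlWord:
    if phase not in ("forward", "backward"):
        raise ValueError("phase must be 'forward' or 'backward'")
    if activation not in ("relu", "sigmoid") or training not in ("supervised", "drl"):
        raise ValueError("unsupported activation or training type")
    into_rows = phase == "forward"
    return ControlWord(
        prep_into_rows=into_rows,
        select_from_rows=not into_rows,  # export from the side opposite to the import
        pool_operands=pool_operands,
        deriv_relu=activation == "relu",
        error_use_bias=training == "drl",
    )


print(make_control_word("backward", activation="sigmoid", training="drl"))
```

Deriving the second signal from the first, rather than setting it independently, makes the required row/column correspondence impossible to violate in this model.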
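The neural network system provided in the present application can accelerate neural network training; compared with performing the neural network training computations on a central processing unit (CPU) or a graphics processing unit (GPU), it achieves an improvement of at least 5.7 times.
Brief Description of Drawings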
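FIG. 1 is a schematic diagram of an RRAM crossbar array structure provided in the present application;
FIG. 2 is a schematic structural diagram of a neural network training system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data preparation circuit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data selection circuit according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a data reading circuit according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a max pooling operation circuit according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a max pooling operation circuit according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a weight update control circuit according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a derivative calculation circuit according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an error calculation circuit according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the data flow of forward computation according to an embodiment of the present application;
FIG. 12 is a schematic diagram of data flowing into and out of an RRAM crossbar array according to an embodiment of the present application;
FIG. 13 is a schematic diagram of the data flow of backpropagation according to an embodiment of the present application;
FIG. 14 is a schematic diagram of data flowing into and out of an RRAM crossbar array according to an embodiment of the present application;
FIG. 15 is a schematic diagram of the data flow of weight update according to an embodiment of the present application;
FIG. 16 is a schematic diagram of deep reinforcement learning neural network training according to an embodiment of the present application.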
Detailed Description of Embodiments
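The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.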
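The peripheral circuit provided in the present application can be applied to neural network training based on a resistive random access memory (RRAM) crossbar array (Crossbar), and also to neural network training based on other devices with performance similar to RRAM, to support the computation of the three main steps of neural network training: forward computation, backpropagation, and weight update. It may also be used for the computation of only one or more of these steps. The neural network training is not limited to supervised learning and deep reinforcement learning neural network training; the circuit may also be applied to other or newly emerging forms of neural network training. FIG. 2 is a schematic diagram of the system architecture in a possible application scenario of the present application. The storage medium (Memory Subarray, Mem Subarray) communicates with the functional circuit (Full function Subarray, FF Subarray), so that the functional circuit can store data in or read data from the storage medium. The storage medium may be any form of non-volatile storage medium, which is not limited herein. The buffer register unit may be part of the storage medium or a separate storage medium; when it is a separate storage medium, it may be any form of non-volatile or volatile storage medium, a volatile storage medium being one that can temporarily store data and loses the data when powered off. The buffer register unit and the functional circuit of the peripheral circuit can exchange data over a high data bandwidth (high bandwidth); "high bandwidth" carries its ordinary meaning in the technical field and may change as technology develops, so it is not specifically limited herein. Some intermediate data generated during neural network training can therefore be temporarily stored in the buffer register unit, which improves the read/write efficiency of computation data. The functional circuit is divided this way only to make the system architecture in this application scenario easier to describe; it mainly includes a data preparation circuit, a data selection circuit, a data reading circuit, and a reverse training calculation circuit. The circuits in the functional circuit can communicate with one another for data transmission, and each can communicate with the storage medium and the buffer register unit to read or write data. The data preparation circuit communicates with the RRAM crossbar array to import data into it, and the data selection circuit communicates with the RRAM crossbar array to export data from it. RRAM has a variable resistance and can represent multiple values, unlike conventional storage media that can express only the binary values 0 and 1, and a crossbar array structure built from RRAM is very well suited to the matrix-vector multiplication in neural networks. In particular, the present application imposes no limitation on how the RRAM crossbar array is laid out according to the neural network structure, or on the specific operations between the RRAM crossbar array and the neurons. The control circuit (controller) uses control signals to control a series of operations of each circuit in the functional circuit; optionally, the control circuit also controls operations of the RRAM array. For example, the control circuit sends a control signal to the data preparation circuit; upon receiving it, the data preparation circuit turns on the switch corresponding to the control signal so that the paths between the data preparation circuit and each row of the RRAM crossbar array are opened, allowing the data preparation circuit to import data into each row of the RRAM crossbar array. The data preparation circuit, data selection circuit, data reading circuit, and reverse training calculation circuit described in the foregoing embodiments generally take the form of a circuit or hardware structure capable of implementing the corresponding functions described above; the control circuit may be any controller capable of generating control signals, or a device such as a central processing unit or graphics processor. The data preparation circuit, data selection circuit, data reading circuit, reverse training calculation circuit, and control circuit may be mounted on one substrate or on multiple substrates, and notably may also be fabricated as an integrated circuit. The system architecture described above may as a whole form a device dedicated to neural network training, or it may be assembled into a device such as a computer, server, or terminal; where technically feasible, the entire system architecture may also be highly integrated on a silicon chip. Note that, without affecting their functions, the circuits in the system architecture may also be placed in different devices that communicate wirelessly or by wire; for example, the storage medium and buffer register unit on one side, and the functional circuit and RRAM crossbar array on the other, may be placed in two separate devices. In short, the system architecture does not change in essence, and any form of presentation falls within the protection scope of the present application. The system is used to support RRAM-based neural network training, including forward computation, backpropagation, and weight update; compared with performing the neural network training computations on a central processing unit (CPU) or a graphics processing unit (GPU), it achieves an improvement of at least 5.7 times.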
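Some general concepts or definitions that may be involved in the embodiments of the present application are explained below. Note that some English abbreviations herein may change as neural network technology evolves; for their evolution, refer to the corresponding standards or authoritative academic descriptions. A concept or definition may also have different names; for example, resistive random access memory may be abbreviated as RRAM or ReRAM.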
The English abbreviations that may appear in the text and drawings of the present application, with their full English names, are as follows:
RRAM: Resistive Random Access Memory
Vol: Voltage Source
ADC: Analog to Digital Converter
DAC: Digital to Analog Converter
WDD: Wordline Driver and Decoder
TG: Transfer Gate
AMP: Amplifier
WD: Write Driver
SW: Switch
Reg: Register
Comp: Comparer (comparator)
MUL: Multiplier
ADD: Adder
SUB: Subtracter
SA: Sensing Amplifier
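The English labels in the drawings are all understandable to those skilled in the art; not all of them are explained here, which does not affect the understanding of the technical solutions described in the present application.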
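The "data" described in the present application is generally transmitted in the form of electrical signals, such as current signals or voltage signals. The electrical signals can be used to represent sample data related to neural network training, preprocessing results, computation results of each stage, and the like.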
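The "buffer" in the drawings of the present application generally corresponds to the buffer register unit in the embodiment corresponding to FIG. 2.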
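The technical solutions provided in the embodiments of the present application are described in more detail below with reference to the accompanying drawings.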
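An embodiment of the present application describes a peripheral circuit structure for supporting RRAM-based neural network training. The peripheral circuit includes a data preparation circuit, a data selection circuit, a data reading circuit, and a reverse training calculation circuit. Optionally, the peripheral circuit further includes a memory, from which a buffer register unit can be partitioned.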
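An embodiment of the present application describes a data preparation circuit, as shown in FIG. 3. The data preparation circuit is configured to preprocess the data input into it and then, according to a control signal, selectively import the result into either the rows or the columns of the RRAM crossbar array. Optionally, the control signal may be generated by the control unit according to the current computation process; specifically, forward computation and backpropagation correspond to different control signals, and the data preparation circuit performs the corresponding operation for each control signal it receives. For example, if the current computation is neural network forward computation, the control signal sent by the control unit to the data preparation circuit causes it to open the paths connecting to each row of the RRAM crossbar array and import its preprocessed result into each row; if the current computation is neural network backpropagation, the control signal causes the data preparation circuit to open the paths connecting to each column of the RRAM crossbar array and import its preprocessed result into each column. Note that at any given moment only one set of paths, either to the rows or to the columns of the RRAM crossbar array, can be open.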
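Further, the data preparation circuit is divided into two parts: two transfer gates (Transfer Gate, TG) and a word line driver and decoder (Wordline Driver and Decoder, WDD). In FIG. 3, the circuitry other than the two TGs constitutes the WDD. The WDD preprocesses the data input into the data preparation circuit, and the preprocessed result is transmitted to the RRAM crossbar array through one of the two TGs. Control signals determine whether each of the two TGs is on or off: turning on one TG causes the data preparation circuit to open the paths connecting to each row of the RRAM crossbar array, corresponding to the case where the current computation is neural network forward computation; turning on the other TG causes it to open the paths connecting to each column, corresponding to the case where the current computation is neural network backpropagation. The two TGs are never on at the same time; each is turned on or off according to the control signal generated for the current computation. Optionally, a 2-to-1 switch device replaces the two TGs.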
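By using the two transfer gates (TGs) to control the paths between the data preparation circuit and the RRAM crossbar array, a single WDD circuit is reused and data is selectively imported into either the rows or the columns of the RRAM crossbar array according to the current computation; this supports both forward computation and backpropagation of the neural network with a simpler, lower-cost circuit structure.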
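The mutually exclusive row/column paths of the data preparation circuit can be sketched in Python as below. The `Phase` enum, the `tg_control` decoder, and the `prepare` function are hypothetical names, and the WDD preprocessing is replaced by an identity function because its internals are not described here.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Phase(Enum):
    FORWARD = "forward"    # preprocessed data is imported into the rows
    BACKWARD = "backward"  # preprocessed data is imported into the columns


def tg_control(phase: Phase) -> Tuple[bool, bool]:
    """Hypothetical decoding of the first control signal into (row_tg_on, col_tg_on)."""
    return phase is Phase.FORWARD, phase is Phase.BACKWARD


def prepare(data: List[float], row_tg_on: bool, col_tg_on: bool,
            wdd: Callable[[float], float] = lambda x: x) -> Dict[str, Optional[List[float]]]:
    """Route the output of the single shared WDD through whichever TG conducts."""
    if row_tg_on and col_tg_on:
        raise ValueError("row and column paths must never be open at the same time")
    out = [wdd(x) for x in data]  # stand-in for the WDD preprocessing
    return {"rows": out if row_tg_on else None,
            "cols": out if col_tg_on else None}


print(prepare([0.2, 0.5], *tg_control(Phase.FORWARD)))   # {'rows': [0.2, 0.5], 'cols': None}
print(prepare([0.2, 0.5], *tg_control(Phase.BACKWARD)))  # {'rows': None, 'cols': [0.2, 0.5]}
```

Rejecting the both-on state in one place mirrors the requirement that the two TGs are never on together; the 2-to-1 switch mentioned as an alternative enforces the same constraint structurally.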
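An embodiment of the present application describes a data selection circuit, as shown in FIG. 4. The data selection circuit is configured to selectively export data from either the rows or the columns of the RRAM crossbar array according to a control signal, perform the corresponding nonlinear computation and subtraction on the exported data, and output the processed data to the next circuit unit, such as the data reading circuit. As with the data preparation circuit, the control signal received by the data selection circuit is generated by the control unit according to the current computation process, and details are not repeated here.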
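Further, the data selection circuit can also be divided into two parts: two TGs plus a multiplexer (Column MUX), and a preprocessing circuit. In Fig. 4, all circuitry other than the two TGs and the multiplexer constitutes the preprocessing circuit. The multiplexer exports data from the RRAM crossbar array through one of the two TGs, and the preprocessing circuit performs preprocessing such as nonlinear function computation and subtraction. The two TGs work like those in the data preparation circuit, except that here, when one of them is on, data is exported from (rather than imported into) the rows or the columns of the RRAM crossbar array. For how the TGs are controlled, see the corresponding description of the data preparation circuit.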
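Because the two TGs control the paths between the data selection circuit and the RRAM crossbar array, a single multiplexer and preprocessing circuit are shared, and data is selectively exported from the rows or the columns of the array according to the current computation. This supports both forward computation and backpropagation of the neural network with a simpler, lower-cost circuit structure.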
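The text does not say which quantities the data selection circuit subtracts. One plausible reading, and it is only an assumption here, is that it takes the difference between the outputs of the w+ and w- arrays described later in this section, so that the result matches a single array holding signed weights. A minimal sketch under that assumption, using the hypothetical function `select_and_preprocess`:

```python
import numpy as np


def select_and_preprocess(out_plus, out_minus, activation="relu"):
    """Hypothetical model of the data selection circuit's preprocessing.

    ASSUMPTION: the subtraction takes the difference between the outputs
    of the array holding w+ and the array holding w-, which equals the
    output of one array holding the signed weights w = w+ - w-.
    The nonlinear function is then applied to that difference.
    """
    diff = np.asarray(out_plus, dtype=float) - np.asarray(out_minus, dtype=float)
    if activation == "relu":
        return np.maximum(diff, 0.0)
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-diff))
    if activation == "none":  # pass the signed difference through unchanged
        return diff
    raise ValueError("activation must be 'relu', 'sigmoid' or 'none'")


print(select_and_preprocess([3.0, 1.0], [1.0, 2.0]))  # [2. 0.]
```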
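An embodiment of this application describes a data reading circuit, as shown in Fig. 5. The data reading circuit includes a read amplifier circuit, a max pooling operation circuit, and a weight update control circuit. It performs weight update control, performs max pooling on the data input from the data selection circuit, and stores the max-pooled data in the buffer register unit. In Fig. 5, all circuitry other than the max pooling operation circuit and the weight update control circuit constitutes the read amplifier circuit, which reads data from the buffer register unit or the data selection circuit, processes it, and passes it to the max pooling operation circuit.

Fig. 6 shows the max pooling operation circuit in the data reading circuit. Max pooling in neural network forward computation is usually taken over four operands, whereas neural network training or computation typified by deep reinforcement learning also involves maximum operations over more than four operands. The max pooling operation circuit in Fig. 6 supports both the four-operand case and cases with more than four operands, and can be adjusted flexibly to specific needs. The circuit includes m registers (Reg), namely Reg 1, Reg 2, ..., Reg m; apart from Reg 1 to Reg 4, each of the registers Reg 5 to Reg m is also connected to a 2-to-1 multiplexer. As further illustrated in Fig. 7, for the four-operand max pooling typical of neural network computation, each of Reg 5 to Reg m reads 0 through its 2-to-1 multiplexer; that is, the multiplexer connected to each of these registers selects 0 according to the control signal and passes it to the register, while the first four registers still read the four operands to be max-pooled from the buffer register unit. For the maximum operation in the last layer of deep reinforcement learning, with n operands, the 2-to-1 multiplexers for Reg 5 to Reg n are controlled to read operands from the buffer register unit, so that Reg 1 to Reg n together hold the n operands to be max-pooled, while the 2-to-1 multiplexers for Reg (n+1) to Reg m read 0 according to the control signal.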
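Optionally, as shown in Fig. 6, each of Reg 1 to Reg m in the max pooling operation circuit can be connected to a 2-to-1 multiplexer. This allows more flexible adjustment of the number of registers that read data from the buffer register unit and enables max pooling over fewer than four operands.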
Adding registers connected to 2-to-1 multiplexers to the max pooling operation circuit allows it to support maximum operations over many operands.
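The register and multiplexer scheme of Figs. 6 and 7 can be sketched behaviorally as zero padding followed by a maximum. The function `register_max`, its parameters `m` (number of registers) and `fixed` (registers without a multiplexer), and the default m = 8 are illustrative assumptions. Zero padding returns the true maximum only when every operand is non-negative, which holds for ReLU outputs; the text does not say how negative operands, such as negative Q values, are handled, so the sketch rejects them.

```python
def register_max(operands, m=8, fixed=4):
    """Behavioral sketch of the m-register max pooling circuit.

    Reg 1..Reg `fixed` always load operands from the buffer. Each of the
    remaining registers sits behind a 2-to-1 multiplexer that loads either
    the next operand or the constant 0. ASSUMPTION: operands are
    non-negative (e.g. ReLU outputs); otherwise a padded 0 could win.
    """
    n = len(operands)
    if not fixed <= n <= m:
        raise ValueError(f"need between {fixed} and {m} operands, got {n}")
    if any(x < 0 for x in operands):
        raise ValueError("zero padding assumes non-negative operands")
    # Multiplexer settings: registers fixed+1..n read operands, n+1..m read 0.
    registers = list(operands) + [0] * (m - n)
    # The comparison hardware is not detailed in the text; max() stands in for it.
    return max(registers)


print(register_max([3, 7, 1, 5]))                    # 7 (four-operand pooling)
print(register_max([0.2, 0.9, 0.4, 0.1, 0.6, 0.3]))  # 0.9 (n = 6 outputs)
```

Passing fixed=0 mimics the optional variant in which every register has its own multiplexer, so fewer than four operands can be pooled.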
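Owing to limitations of the fabrication process and materials, performing a SET operation on an RRAM cell (adjusting it from high resistance to low resistance) easily causes abrupt changes in the stored value and reduces RRAM reliability, yet neural network training requires frequent writes. Moreover, a node of an RRAM crossbar array cannot store a negative value. To represent weights with RRAM, each weight w is therefore split into two parts, w+ and w-, which are stored at corresponding nodes of two RRAM crossbar arrays; both w+ and w- are positive. The actual value and sign of each weight are obtained as the difference of the two parts, that is, $w = w^{+} - w^{-}$. The inventive idea of this application is as follows. When the weight adjustment obtained during backward training of the neural network is negative, the weight needs to decrease. Because a weight corresponds to the reciprocal of the RRAM resistance, that is, the conductance, the net conductance represented by the corresponding RRAM nodes must be reduced. To reduce SET operations, only the corresponding node of the RRAM crossbar array storing w+ is RESET; a RESET operation adjusts a cell from low resistance to high resistance, so its conductance goes from high to low, which lowers w+ and therefore w. Conversely, when the weight adjustment obtained during backward training is positive, the weight needs to increase. In this case only the corresponding node of the RRAM crossbar array storing w- is RESET: its conductance goes from high to low, which lowers w- and therefore increases w.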
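The RESET-only rule can be checked with a short derivation. It assumes, for illustration, that a RESET can lower a stored value by exactly the required amount and that the stored value is large enough to absorb the change; real devices change in discrete, nonlinear steps.

```latex
\begin{aligned}
w &= w^{+} - w^{-}, \qquad w^{+}, w^{-} \ge 0,\\
\Delta w < 0:&\quad w^{+} \leftarrow w^{+} - |\Delta w|
  \;\Rightarrow\; w \leftarrow \bigl(w^{+} - |\Delta w|\bigr) - w^{-} = w + \Delta w,\\
\Delta w > 0:&\quad w^{-} \leftarrow w^{-} - \Delta w
  \;\Rightarrow\; w \leftarrow w^{+} - \bigl(w^{-} - \Delta w\bigr) = w + \Delta w.
\end{aligned}
```

In both cases a stored value only decreases, so no SET is required.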
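To implement the weight update control described above, the data reading circuit optionally further includes a weight update control circuit, as shown in Fig. 8. The weight update control circuit controls the update of weight values or parameters during neural network training.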
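Further, the weight update control circuit includes an XOR discrimination subcircuit, which determines whether the weight input to the weight update control circuit is positive or negative. Based on the result, the control unit generates a first control signal or a second control signal. The first control signal indicates that the weight value is positive and causes the corresponding node of the RRAM crossbar array storing w- to perform a RESET operation; the second control signal indicates that the weight value is negative and causes the corresponding node of the RRAM crossbar array storing w+ to perform a RESET operation. More specifically, if the XOR circuit outputs 1, the value is positive, and the switch connected to the RRAM crossbar array storing w- is turned on to complete the corresponding weight update. Conversely, if the XOR circuit outputs 0, the value is negative, and the switch connected to the RRAM crossbar array storing w+ is turned on to complete the corresponding weight update.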
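A minimal NumPy sketch of the same policy at array level follows. The function `reset_only_update`, the clipping floor `g_min`, and the direct mapping from conductance to weight are assumptions; the sign test is written as a plain comparison because the text does not specify the inputs of the XOR discrimination subcircuit.

```python
import numpy as np


def reset_only_update(g_plus, g_minus, delta_w, g_min=0.0):
    """Apply delta_w to w = g_plus - g_minus using only RESET operations.

    RESET lowers a conductance, so a negative entry of delta_w is applied
    by lowering g_plus and a positive entry by lowering g_minus; no entry
    ever needs a SET. g_min is an ASSUMED lowest reachable conductance:
    if the required drop exceeds the remaining headroom, the update is
    only partly applied. Returns new arrays; the inputs are not modified.
    """
    g_plus = np.array(g_plus, dtype=float)
    g_minus = np.array(g_minus, dtype=float)
    delta_w = np.asarray(delta_w, dtype=float)
    positive = delta_w > 0  # stands in for the sign decision of Fig. 8
    negative = delta_w < 0
    g_minus[positive] = np.maximum(g_minus[positive] - delta_w[positive], g_min)
    g_plus[negative] = np.maximum(g_plus[negative] + delta_w[negative], g_min)
    return g_plus, g_minus


gp, gm = reset_only_update([[0.5, 0.5]], [[0.5, 0.5]], [[0.2, -0.1]])
print(gp - gm)  # net weights, approximately [[0.2, -0.1]]
```

Because only RESET writes are issued, both conductances can only decrease over many updates; the text does not discuss how headroom is restored, and the sketch simply clips at g_min.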
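Optimizing the weight update in this way through the weight update control circuit reduces SET operations and thus improves the reliability of RRAM writes.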
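An embodiment of this application describes a backward training computation circuit, which performs error computation and derivative computation on the data input to it. The backward training computation circuit includes a derivative computation circuit and an error computation circuit.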
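Fig. 9 shows the derivative computation circuit described in an embodiment of this application. It includes two 2-input selectors, a comparator, a subtractor, and a multiplier, and can compute the derivatives of typical nonlinear functions used in neural network training, such as the ReLU (Rectified Linear Unit) function and the Sigmoid function. The derivatives of these two functions are as follows: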
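ReLU derivative: $f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$
Sigmoid derivative: $f'(x) = f(x)\,\bigl(1 - f(x)\bigr)$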
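A specific structure of the derivative computation circuit is shown in Fig. 9: the output port of one 2-input selector (in the upper-right area of Fig. 9) is connected to the input port of the subtractor; the output port of the subtractor is connected to the input port of the multiplier; the output port of the multiplier is connected to one input port of the other 2-input selector (in the lower area of Fig. 9); and the output port of the comparator is connected to the other input port of that selector.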
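For the ReLU derivative, the derivative computation circuit feeds the data output by the data reading circuit into its comparator (Comp); the result of the comparison is the ReLU derivative. For the Sigmoid derivative, the derivative computation circuit reads the forward computation value f(x) from the data reading circuit or the buffer register unit, feeds f(x) into its subtractor to take the difference with 1, and then feeds the difference into its multiplier; the multiplier output is the Sigmoid derivative. According to a control signal, the 2-input selector connected to the comparator output and the multiplier output selects the result from either the comparator or the multiplier as the circuit's output. The control circuit generates different control signals depending on the function being differentiated. For example, for the ReLU derivative, the control signal makes this selector take the comparator output as the output of the derivative computation circuit; for the Sigmoid derivative, it makes the selector take the multiplier output. Further, the derivative computation circuit stores its output in the buffer register unit or storage medium.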
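A behavioral sketch of the Fig. 9 datapath follows, with both paths computed and the output selector choosing one. The function name `derivative_circuit` is hypothetical, and so is the comparator threshold: the text does not state what the comparator compares against, so the sketch assumes f(x) > 0, which for ReLU outputs is equivalent to x > 0.

```python
import numpy as np


def derivative_circuit(fx, mode):
    """Behavioral sketch of the derivative computation circuit (Fig. 9).

    fx holds stored forward-computation values f(x).
    - Comparator path (ReLU): ASSUMES the comparator tests f(x) > 0. For
      ReLU, f(x) > 0 exactly when x > 0, so this yields 1 or 0.
    - Subtractor + multiplier path (Sigmoid): f(x) * (1 - f(x)).
    `mode` plays the role of the control signal on the output selector.
    """
    fx = np.asarray(fx, dtype=float)
    comparator_out = (fx > 0).astype(float)
    multiplier_out = fx * (1.0 - fx)
    if mode == "relu":
        return comparator_out
    if mode == "sigmoid":
        return multiplier_out
    raise ValueError("mode must be 'relu' or 'sigmoid'")


relu_out = np.maximum([-1.0, 0.0, 2.5], 0.0)     # stored ReLU outputs
print(derivative_circuit(relu_out, "relu"))      # [0. 0. 1.]
sig_out = 1.0 / (1.0 + np.exp(-np.array([0.0])))  # sigmoid(0) = 0.5
print(derivative_circuit(sig_out, "sigmoid"))    # [0.25]
```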
The derivative computation circuit supports derivative computation in RRAM-based neural network training and improves its efficiency.
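Fig. 10 shows the error computation circuit described in an embodiment of this application. It includes a 2-input selector, an adder, and a subtractor, and can perform the error computation involved in neural network training, typically supervised learning training and deep reinforcement learning training.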
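A specific structure of the error computation circuit is shown in Fig. 10: the output port of the 2-input selector is connected to the input port of the adder, and the output port of the adder is connected to the input port of the subtractor. According to the received control signal, the 2-input selector reads either 0 or r, where r is a bias value that is stored in memory and can be refreshed dynamically according to the number of training iterations. According to the control signal, the adder reads either y* from the buffer register unit or the value $\max_{a'} Q'(s', a')$ output by the data reading circuit, and adds its inputs. According to the control signal, the subtractor reads either the forward computation value f(x) or $Q(s, a)$ from the buffer register unit, and subtracts its inputs.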
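The control signals are generated according to the type of neural network training and determine which data the 2-input selector, adder, and subtractor of the error computation circuit read. The error computation circuit is described in detail below, taking supervised learning training and deep reinforcement learning training as examples: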
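If the training is supervised-learning neural network training, the control signal makes the 2-input selector select 0 and feed it into the adder. The error computation circuit reads the ground truth y* in the training sample data from the buffer register unit and feeds it into the adder, which adds y* and 0 and feeds the sum into the subtractor. The error computation circuit also reads the forward computation value f(x) from the buffer register unit and feeds it into the subtractor, which computes f(x) - y*; this is the error for supervised-learning training. Further, the resulting error is passed to the data preparation circuit for backward training computation and error propagation.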
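If the training is deep reinforcement learning neural network training, the control signal makes the 2-input selector select r and feed it into the adder. The error computation circuit reads the forward computation value $\max_{a'} Q'(s', a')$ from the buffer register unit or the data reading circuit and feeds it into the adder, which adds it to r and feeds the sum into the subtractor. The error computation circuit reads $Q(s, a)$ from the buffer register unit and feeds it into the subtractor, which subtracts $Q(s, a)$ from the sum provided by the adder. Further, the resulting error is passed to the data preparation circuit for backward training computation and error propagation.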
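The two configurations of the Fig. 10 circuit can be sketched as one function whose `mode` argument stands in for the control signals; the names are hypothetical. The subtractor's operand order follows the two descriptions above, which differ: f(x) - y* for supervised learning and (r + max Q') - Q(s, a) for deep reinforcement learning. Here r occupies the position that the reward takes in the usual Q-learning target; no discount factor is modeled.

```python
import numpy as np


def error_circuit(mode, adder_in, subtractor_in, r=0.0):
    """Behavioral sketch of the error computation circuit (Fig. 10).

    mode="supervised": the selector passes 0, adder_in is the ground truth
        y*, subtractor_in is f(x); error = f(x) - (0 + y*).
    mode="drl": the selector passes r, adder_in is max_a' Q'(s', a'),
        subtractor_in is Q(s, a); error = (r + max Q') - Q(s, a).
    """
    adder_in = np.asarray(adder_in, dtype=float)
    subtractor_in = np.asarray(subtractor_in, dtype=float)
    if mode == "supervised":
        first_sum = 0.0 + adder_in         # selector outputs 0
        return subtractor_in - first_sum   # f(x) - y*
    if mode == "drl":
        first_sum = r + adder_in           # selector outputs r
        return first_sum - subtractor_in  # (r + max Q') - Q(s, a)
    raise ValueError("mode must be 'supervised' or 'drl'")


print(error_circuit("supervised", [1.0, 0.0], [0.75, 0.25]))  # [-0.25  0.25]
print(error_circuit("drl", 2.0, 1.5, r=1.0))                   # 1.5
```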
The error computation circuit supports error computation in RRAM-based neural network training and improves its efficiency.
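The following embodiment takes supervised learning as an example and describes in detail how the peripheral circuit described in the embodiments of this application supports RRAM-based supervised-learning neural network training. The process mainly includes forward computation, backpropagation, and weight update.

1) Forward computation

Sample data (x, y) for supervised neural network training is read from memory and stored in the buffer register unit (Buffer array), where x is the input of the network's forward computation and y is the ground truth corresponding to the forward-computation output; the data flow is shown by the arrow numbered 1 in Fig. 11. The sample data x is simultaneously sent to the data preparation circuits of multiple RRAM crossbar arrays for preprocessing, as shown by the arrow numbered 2 in Fig. 11. Because this is forward computation, after the data is preprocessed by the data preparation circuit shown in Fig. 3, the TG in the data preparation circuit that is connected to the row paths of the RRAM crossbar array receives a control signal that turns it on, so the data preparation circuit opens the paths to the rows of the RRAM crossbar array, and the sample data x flows through these paths into the RRAM crossbar array for matrix-vector multiplication.

Then, as shown in Fig. 12, the sample data x entering each row of the RRAM crossbar array completes an analog computation with each node of the array, and the results are output from each column. Next, the TG in the data selection circuit of Fig. 4 that is connected to the column paths of the RRAM crossbar array receives a control signal that turns it on, so the data selection circuit opens the paths to the columns, and the selector Column Mux in the data selection circuit reads out the crossbar computation results column by column.

After that, in the max pooling circuit of the data reading circuit shown in Fig. 6, the 2-to-1 multiplexer associated with each of Reg 5 to Reg m receives a control signal and selects 0 as its input. This completes the four-operand max pooling used in forward computation, and the result is finally stored in the buffer register unit, as shown by the arrow numbered 3 in Fig. 11.

2) Backpropagation

Error computation: the forward-computation output and the ground truth are read from the buffer register unit and sent together to the error computation circuit shown in Fig. 10. By controlling this circuit, the error Δy between the current network's forward-computation value and the ground truth is output and stored in the buffer register unit. For the specific error computation, see the description of the embodiment corresponding to Fig. 10.

Derivative computation: as shown by the arrow numbered 1 in Fig. 13, the error Δy computed by the error computation circuit and stored in the buffer register unit is sent to the backward training computation circuit shown in Fig. 9 and, together with the derivative computed by the derivative computation circuit, passed to the corresponding unit to compute the gradient error. Specifically, the gradient error is obtained by element-wise multiplication of the error Δy and the derivative; this application does not limit which unit computes the gradient error. The gradient error is multiplied by the weight values to obtain the error of the previous layer, which can then be propagated backward layer by layer. The derivative computation is carried out by controlling the derivative computation circuit shown in Fig. 9 according to whether the function to be differentiated is ReLU or Sigmoid; for details, see the description of the embodiment corresponding to Fig. 9.

Backpropagation: the derivative (that is, the gradient) obtained by the derivative computation circuit is sent along one path into the RRAM crossbar array of the neural network for backpropagation, in the direction of the arrow numbered 2 in Fig. 13. Specifically, after the derivative is preprocessed by the data preparation circuit shown in Fig. 3, the TG in the data preparation circuit that is connected to the column paths of the RRAM crossbar array receives a control signal that turns it on, so the data preparation circuit opens the paths to the columns, and the derivative flows through these paths into the RRAM crossbar array.

Then, as shown in Fig. 14, the derivative flowing into each column of the RRAM crossbar array undergoes a reverse operation (the opposite of the forward computation), and the results are output from each row. Next, the TG in the data selection circuit of Fig. 4 that is connected to the row paths receives a control signal that turns it on, so the data selection circuit opens the paths to the rows, and the selector Column Mux reads out the crossbar computation results row by row. These results are fed into the data reading circuit for processing and output, and the propagated derivative (gradient) is finally stored in the buffer register unit.

In addition, along the other path, the derivative computed by the derivative computation circuit flows into the data preparation circuit, where it is used with the input sample data x to compute the weight error for the weight update; the data flow is shown by the arrow numbered 3 in Fig. 13.

3) Weight update

After the forward computation shown in Fig. 11 and the backpropagation shown in Fig. 13 are completed, a vector-matrix multiplication of the input sample data x and the gradient error yields the corresponding weight change ΔW; the data flow is shown by the arrows in Fig. 15. Control signals then drive the weight update control circuit shown in Fig. 8 to update the weights stored in the corresponding RRAM crossbar arrays; for the specific weight update control, see the description of the embodiment corresponding to Fig. 8.

The peripheral circuit described in this embodiment makes full use of the RRAM crossbar structure and, with only a limited amount of additional circuitry, accelerates supervised-learning neural network training, including forward computation, backpropagation, and weight update, while improving the reliability of RRAM writes during training.

The following embodiment takes deep reinforcement learning as an example and describes in detail how the peripheral circuit described in the embodiments of this application supports RRAM-based deep reinforcement learning neural network training. Deep reinforcement learning consists of three stages: forward computation, backpropagation, and weight update. Unlike supervised learning, as shown in Fig. 16, deep reinforcement learning often uses two neural networks to improve training convergence. Briefly, the computation proceeds as follows: two identically configured neural networks, A and B, both perform forward computation; backpropagation and weight update driven by the gradient error are applied to network A, while the weights of network B are kept unchanged. After m training iterations, the weights of network A are copied to network B, replacing B's original weights. This iteration continues until training converges.

The error backpropagation and weight update of deep reinforcement learning are similar to those of the supervised learning described in this application and are not repeated here; the description focuses instead on the maximum operation in the forward computation. In the above computation, the forward computations of both network A and network B involve a maximum operation: network A needs to find the action with the largest Q value, and network B likewise outputs the largest Q' value. Unlike the max pooling in supervised learning, the maximum operation in deep reinforcement learning usually has more than four operands. If the output dimension of the network used for deep reinforcement learning is n, the maximum of n numbers must be found. The peripheral circuit described in the embodiments of the present invention provides multiple register units (say m) in the data reading circuit to store the operands. The 2-to-1 multiplexers for the 5th to nth register units are controlled to read the operands whose maximum is to be found, while the multiplexers for the remaining m - n register units read 0. For the specific max pooling process, see the description of the embodiment corresponding to Fig. 7.

The peripheral circuit described in this embodiment makes full use of the RRAM crossbar structure and, with only a limited amount of additional circuitry, supports deep reinforcement learning neural network training, including forward computation, backpropagation, and weight update, accelerating training and improving the reliability of RRAM writes during training.

Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims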
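1. A peripheral circuit for supporting neural network training based on a resistive random access memory (RRAM) crossbar array, characterized by comprising:
a data preparation circuit, configured to perform first preprocessing on first data input to the data preparation circuit, and to selectively import, according to a first control signal, the preprocessed data obtained from the first preprocessing into rows or columns of the RRAM crossbar array, wherein the first data includes sample data used for the neural network training;
a data selection circuit, configured to selectively export second data from the rows or columns of the RRAM crossbar array according to a second control signal, and to perform second preprocessing on the second data to obtain third data; wherein the second data is the data obtained after the preprocessed data is computed in the RRAM crossbar array; the first control signal and the second control signal have a correspondence, and the correspondence indicates that: the data preparation circuit importing the first data, after the first preprocessing, into the rows of the RRAM crossbar array according to the first control signal corresponds to the data selection circuit exporting the second data from the columns of the RRAM crossbar array according to the second control signal; and the data preparation circuit importing the first data, after the first preprocessing, into the columns of the RRAM crossbar array according to the first control signal corresponds to the data selection circuit exporting the second data from the rows of the RRAM crossbar array according to the second control signal;
a data reading circuit, configured to perform a weight update control operation and to perform a max pooling operation on fourth data input to the data reading circuit to obtain fifth data, wherein the fourth data includes the third data; and
a backward training computation circuit, configured to perform error computation and derivative computation on sixth data input to the backward training computation circuit, wherein the sixth data includes the fifth data.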
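2. The peripheral circuit according to claim 1, further comprising a storage medium;
the storage medium is configured to store the sample data and the data stored by at least one of the data preparation circuit, the data selection circuit, the data reading circuit, and the backward training computation circuit.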
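3. The peripheral circuit according to claim 2, wherein the storage medium comprises a buffer register unit and a sample data storage unit;
the sample data storage unit is configured to store the sample data;
the buffer register unit is configured to store the data stored by at least one of the data preparation circuit, the data selection circuit, the data reading circuit, and the backward training computation circuit;
wherein the data preparation circuit, the data selection circuit, the data reading circuit, and the backward training computation circuit all read data from and write data to the buffer register unit over a high data bandwidth.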
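4. The peripheral circuit according to any one of claims 1 to 3, wherein the data preparation circuit comprises a wordline driver and decoder (WDD) and two first transfer gates (TGs);
the WDD is configured to receive the first data and perform the first preprocessing on the first data to obtain the preprocessed data; the two first TGs are connected in parallel and are connected to an output port of the WDD;
wherein the two first TGs include a first row TG and a first column TG, and the first row TG and the first column TG are not turned on at the same time; the first row TG is configured to turn on, according to the first control signal, the paths connecting the WDD to the rows of the RRAM crossbar array, and to import the preprocessed data output by the WDD into the rows of the RRAM crossbar array; the first column TG is configured to turn on, according to the first control signal, the paths connecting the WDD to the columns of the RRAM crossbar array, and to import the preprocessed data output by the WDD into the columns of the RRAM crossbar array.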
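5. The peripheral circuit according to claim 3, wherein the data selection circuit comprises a preprocessing circuit, a multiplexer, and two second TGs;
the two second TGs are connected in parallel and are connected to an input port of the multiplexer; wherein the second TGs include a second row TG and a second column TG, and the second row TG and the second column TG are not turned on at the same time; the second row TG is configured to turn on, according to the second control signal, the paths connecting the multiplexer to the rows of the RRAM crossbar array; the second column TG is configured to turn on, according to the second control signal, the paths connecting the multiplexer to the columns of the RRAM crossbar array;
the multiplexer is configured to export the second data from the RRAM crossbar array through the path turned on by one of the two second TGs;
the preprocessing circuit is configured to perform the second preprocessing on the second data exported by the multiplexer to obtain the third data, and to store the third data in the buffer register unit.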
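6. The peripheral circuit according to claim 3, wherein the data reading circuit comprises a read amplifier circuit and a max pooling operation circuit;
the read amplifier circuit is configured to read the fourth data from the buffer register unit or the data selection circuit;
the max pooling operation circuit is configured to perform a max pooling operation on the fourth data read by the read amplifier circuit to obtain the fifth data, and to store the fifth data in the buffer register unit, wherein the fifth data is a forward computation value;
wherein the max pooling operation circuit comprises at least one first register; an input port of each of the at least one first register is connected to a first selector, and the first selector is configured to selectively read, according to a third control signal, either 0 or an operand to be max-pooled from the buffer register unit, and to input it into the corresponding first register.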
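7. The peripheral circuit according to claim 6, wherein the max pooling operation circuit further comprises four second registers;
the second registers are configured to read, from the buffer register unit, operands to be max-pooled.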
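8. The peripheral circuit according to claim 3, wherein the backward training computation circuit comprises an error computation circuit and a derivative computation circuit;
the error computation circuit is configured to compute an error according to the sixth data read from the data reading circuit or the buffer register unit, and to store the computed error in the buffer register unit;
the derivative computation circuit is configured to compute a derivative of a nonlinear function according to the sixth data read from the data reading circuit or the buffer register unit, and to store the computed derivative in the buffer register unit;
wherein the sixth data includes forward computation values.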
9. The peripheral circuit according to claim 8, wherein the nonlinear function includes a ReLU function and a sigmoid function.
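10. The peripheral circuit according to claim 8, wherein the derivative computation circuit comprises a second selector, a third selector, a comparator, a first subtractor, and a multiplier;
an output port of the second selector is connected to an input port of the first subtractor; an output port of the first subtractor is connected to an input port of the multiplier;
an output port of the multiplier is connected to a first input port of the third selector;
an output port of the comparator is connected to a second input port of the third selector;
wherein the second selector is configured to read a forward computation value from the data reading circuit or the buffer register unit; the first subtractor is configured to compute the difference between the forward computation value input by the second selector and 1 to obtain a first difference; the multiplier is configured to multiply the input first difference by the forward computation value to obtain a first product; the comparator is configured to perform a comparison operation on the forward computation value input by the data reading circuit to obtain a comparison result; and the third selector is configured to select, according to a fourth control signal, either the comparison result from the comparator or the first product from the multiplier as the derivative, and to store it in the buffer register unit.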
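11. The peripheral circuit according to any one of claims 8 to 10, wherein the error computation circuit comprises a fourth selector, an adder, and a second subtractor;
an output port of the fourth selector is connected to an input port of the adder;
an output port of the adder is connected to an input port of the second subtractor;
wherein the fourth selector is configured to selectively read 0 or a bias value r according to a fifth control signal, and to input the read 0 or r into the adder; the adder is configured to read seventh data from the data reading circuit or the buffer register unit, and to add the data input by the fourth selector to the read seventh data to obtain a first sum; the second subtractor is configured to read eighth data from the data reading circuit or the buffer register unit, and to compute the difference between the first sum input by the adder and the eighth data to obtain the error.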
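12. The peripheral circuit according to claim 11, wherein the neural network training is supervised-learning neural network training;
the fourth selector is specifically configured to read 0 according to the fifth control signal and input the read 0 into the adder;
the adder is specifically configured to read, from the buffer register unit, the ground truth y* corresponding to the sample data, to add y* to the 0 input by the fourth selector to obtain the first sum, and to input the first sum into the second subtractor; the second subtractor is specifically configured to read the forward computation value f(x) from the data reading circuit or the buffer register unit, and to subtract the first sum input by the adder from the forward computation value f(x) to obtain the error.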
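13. The peripheral circuit according to claim 11, wherein the neural network training is deep reinforcement learning neural network training;
the fourth selector is specifically configured to read r according to the fifth control signal and input r into the adder; the adder is specifically configured to read the forward computation value $\max_{a'} Q'(s', a')$ from the data reading circuit or the buffer register unit, to add this forward computation value to r to obtain the first sum, and to input the first sum into the second subtractor;
the second subtractor is specifically configured to read $Q(s, a)$ from the data reading circuit or the buffer register unit, and to subtract $Q(s, a)$ from the first sum input by the adder to obtain the error.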
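14. The peripheral circuit according to claim 6, wherein the data reading circuit further comprises a weight update control circuit;
the weight update control circuit is configured to determine whether a weight value is positive or negative, and to output a first RESET signal or a second RESET signal, respectively, according to the determination result;
wherein the weight value is represented by the difference between a first weight value W+ and a second weight value W-, both the first weight value W+ and the second weight value W- are positive values, the first RESET signal indicates that the weight value is positive, the second RESET signal indicates that the weight value is negative, the first RESET signal is used to control the corresponding node of the RRAM crossbar array storing the second weight value W- to perform a RESET operation, the second RESET signal is used to control the corresponding node of the RRAM crossbar array storing the first weight value W+ to perform the RESET operation, and the RESET operation means adjusting from a low resistance to a high resistance.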
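15. A neural network training system, comprising a control circuit, a resistive random access memory (RRAM) crossbar array, and the peripheral circuit according to any one of claims 1 to 11;
the control circuit is configured to generate multiple control signals, including the first control signal, the second control signal, the third control signal, the fourth control signal, and the fifth control signal.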
PCT/CN2018/090541 2017-06-16 2018-06-11 Peripheral circuit and system supporting RRAM-based neural network training Ceased WO2018228295A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18817002.1A EP3564867A4 (en) 2017-06-16 2018-06-11 PERIPHERAL CIRCUIT AND SUPPORT SYSTEM FOR NEURONAL RESISTIVE RAM-BASED LEARNING
US16/545,932 US11409438B2 (en) 2017-06-16 2019-08-20 Peripheral circuit and system supporting RRAM-based neural network training

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710459633.2A CN109146070B (zh) 2017-06-16 2017-06-16 Peripheral circuit and system supporting RRAM-based neural network training
CN201710459633.2 2017-06-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/545,932 Continuation US11409438B2 (en) 2017-06-16 2019-08-20 Peripheral circuit and system supporting RRAM-based neural network training

Publications (1)

Publication Number Publication Date
WO2018228295A1 true WO2018228295A1 (zh) 2018-12-20

Family

ID=64659544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090541 Ceased WO2018228295A1 (zh) 2017-06-16 2018-06-11 Peripheral circuit and system supporting RRAM-based neural network training

Country Status (4)

Country Link
US (1) US11409438B2 (zh)
EP (1) EP3564867A4 (zh)
CN (1) CN109146070B (zh)
WO (1) WO2018228295A1 (zh)

Also Published As

Publication number Publication date
US11409438B2 (en) 2022-08-09
EP3564867A4 (en) 2020-08-26
EP3564867A1 (en) 2019-11-06
CN109146070A (zh) 2019-01-04
US20190369873A1 (en) 2019-12-05
CN109146070B (zh) 2021-10-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18817002

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018817002

Country of ref document: EP

Effective date: 20190729

NENP Non-entry into the national phase

Ref country code: DE