EP3394799A1 - Electronic circuit, in particular suited to implementing neural networks with several levels of precision - Google Patents
- Publication number
- EP3394799A1 (application EP16809743.4A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- subwords
- processors
- electronic circuit
- columns
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
Definitions
- the present invention relates to an electronic circuit, particularly adapted to the implementation of neural networks on silicon for the processing of various signals, including multidimensional signals such as images.
- Neural networks are used in many applications, including devices, systems or methods using approaches or learning mechanisms to define the function to be performed.
- the hardware architectures of neural systems generally comprise elementary modules able to implement a set of neurons.
- a neuron of order i in a neural system realizes a function of the type:
- $$R_i = f\Big(\sum_j w_{ij} E_j\Big) \qquad (1)$$
- $w_{ij}$ and $E_j$ being respectively the synaptic weights associated with the neuron and its inputs, $f$ being a function called the activation function.
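The neuron function described above (a weighted sum of the inputs passed through an activation function) can be sketched as follows. This is only an illustration: the names `neuron_output` and `step`, the hard-threshold activation and the sample values are assumptions, since the text does not fix a particular activation function.

```python
from typing import Callable, Sequence


def neuron_output(weights: Sequence[float],
                  inputs: Sequence[float],
                  activation: Callable[[float], float]) -> float:
    """Weighted sum of the inputs followed by an activation function."""
    if len(weights) != len(inputs):
        raise ValueError("one synaptic weight is needed per input")
    weighted_sum = sum(w * e for w, e in zip(weights, inputs))
    return activation(weighted_sum)


def step(x: float) -> float:
    """Hypothetical activation: hard threshold at zero."""
    return 1.0 if x >= 0 else 0.0


# Three inputs with three synaptic weights: 0.5 - 1.0 + 2.0 = 1.5, then threshold.
result = neuron_output([0.5, -1.0, 2.0], [1.0, 1.0, 1.0], step)
```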
- Integrated circuits for implementing neural networks were initially mostly of the ASIC (Application-Specific Integrated Circuit) type. Architectures on FPGA (Field-Programmable Gate Array) followed. As a first approximation, neural architectures can be classified along two axes. The first axis concerns their implementation, which can be digital, analog or hybrid. The second axis concerns their degree of specialization with respect to the neural networks that can be implemented: an architecture can be specialized for a few well-defined types of neural network, for example a Radial-Basis Function (RBF) network or a Kohonen map, or it can be programmable so as to implement a wider variety of networks.
- the architectures targeted by the present invention are related to circuits with digital implementation, these circuits being generic or specialized.
- A technical problem to be solved is in particular to enable the efficient realization on silicon of a complete signal processing chain, in the generic sense, using the neural network approach, with several levels of precision in the coding of the data manipulated during processing. This problem breaks down into several sub-problems.
- Signal processing chains of this type generally include more conventional signal processing, for example convolutions on a signal or an image, pre or post-processing.
- Conventional systems use specialized processors to perform this type of processing in addition to the actual neural architecture, resulting in a more complex and bulky system, consuming more energy.
- The neural network used depends very strongly on the application, or even on the input dataset.
- circuits used to implement neural networks are specialized for some types of networks. There would therefore be a definite advantage in being able to effectively implement several types of network on the same circuit in order to broaden its application domain.
- another implantation parameter that can vary is the size of the network, in particular as regards the number of inputs and the number of neurons.
- Some circuits used for neural processing are not extensible, which does not allow the implementation of neural networks whose size exceeds their material capacity.
- The invention combines mechanisms to achieve this scalability, both in cascade (unidirectional extension) and in broadcast (multidirectional extension). This extensibility is further increased by mechanisms for virtualizing weights or coefficients.
- the dynamics required for the coding of the weights and inputs of a neural network is highly variable.
- the dynamics necessary for the coding of the weights can vary according to the phase in which one is in the case of an online learning. In typical cases, a 16-bit coding of the weights during the learning phase is necessary, while in the processing phase an 8-bit coding is sufficient.
- Conventional architectures are sized for the worst case, at the level of the operators and interconnections as well as at the level of the memory. To avoid this worst-case sizing, it is necessary to be able to operate with flexible dynamics, adapted to the operating phase (allowing more or fewer values to be coded depending on the required accuracy).
- an object of the invention is notably to overcome these drawbacks, more particularly by allowing an efficient use of the silicon surface used to produce the calculation units on which the neural networks are implanted, while allowing calculations with several levels of precision.
- the subject of the invention is an electronic circuit capable of implementing neural networks, said circuit comprising at least:
- a transformation block connected to said calculation blocks by a communication means and able to be connected at the input of said circuit to an external data bus, said transformation block transforming the format of the input data and transmitting said data to all or part of said calculation blocks by means of K independent communication channels, an input data word being divided into subwords so that said subwords are transmitted over several successive communication cycles, one subword being transmitted per communication cycle on a communication channel dedicated to said word, so that said K channels are able to transmit K words in parallel over several communication cycles.
- each calculation block comprises at least one calculation module incorporating:
- elementary processors in parallel capable of implementing each of the operations of a formal neuron;
- a memory storing said data intended for said elementary processors, organized in columns each having a width of N bits, N being greater than or equal to 1;
- a transformation module able to split or join the subwords transmitted by said transformation block into other subwords adapted to the width of said columns; a set of subwords at the output of said transformation module forming a word, the subwords of said set being distributed over one or more of said columns as a function of the coupling mode of said processors for which they are intended.
- the width of said channels is for example equal to the width of said columns, each channel having a width of N bits.
- the granularity of said elementary processors is for example equal to the width of said columns, the granularity being the maximum number of bits in parallel on any input of said elementary processors.
- When a processor is temporally coupled to itself, at least two subwords intended for it are for example stored in the same column, to be routed to said processor over several successive communication cycles.
- When at least two processors are spatially coupled, the subwords intended for them are for example stored in several columns at the same address, said subwords being routed to said processors in one or more successive communication cycles.
- the subwords constituting the same word are for example stored at the same time on several addresses and on several columns of said memory.
- Said electronic circuit comprises for example a routing module connected between said memory and said processors, said routing module having a number of inputs at least equal to the number of columns, each input being connected to one and only one column, said routing module being able to route the subwords to said processors.
- Said routing module is for example able to broadcast data from one column to several processors.
- Said electronic circuit comprises for example a virtualization block of the memory connected to the memories of all the blocks and to an external memory via a DMA type circuit.
- the invention also relates to a signal processing system capable of implementing neural networks, wherein said system comprises a plurality of electronic circuits such as that described above.
- FIG. 1 is an example of a hardware architecture according to the invention.
- FIG. 2 is a block diagram of a data transformation block located at the input/output of a circuit according to the invention.
- FIG. 3 is a presentation of the elementary blocks making up a calculation block used in a circuit according to the invention.
- FIG. 1 presents a hardware architecture according to the invention for the implementation of a neural network but also for other signal processing applications. The invention will be described later for an application of neural networks.
- This architecture is therefore described by a circuit 10 able to implement a neural network.
- This circuit can itself be connected to other identical circuits, in cascade and / or juxtaposed, to form a complete chain of neuronal processing, including pre and post-processing of images and / or signals in general.
- This architecture is composed of one or more blocks 6 for transforming input and output data 7, a general control block 5, and more local control blocks 3, each controlling a series of calculation blocks 1 that each comprise elementary processors. Each calculation block is able to implement a set of neurons. Other types of signal processing can of course be implemented in these blocks.
- the calculation blocks 1 are distributed by branches.
- a branch then comprises several calculation blocks 1, a control block 3 and a communication bus 2 shared by these blocks 1, 3.
- The branches, more specifically the communication buses 2, are connected to the general control block 5 and to a transformation block 6 via an interconnection bus 4.
- the communication between the blocks is controlled by the general control block. This communication is for example asynchronous.
- the buses 2, 4 can be replaced by any other means of communication.
- The function of a data transformation block 6 is in particular to cut an input word 7, coming for example from a system memory, into several subwords of a smaller number of bits, transmitted sequentially on the interconnection bus 4.
- The input word can for example be coded on 32 bits and the subwords on 8 bits. More generally, it is considered by way of example that an input word of $2^P$ bits is divided into $2^Q$ subwords of $2^P/2^Q$ bits, Q being strictly less than P, with a transmission on $2^P/2^Q$ bits inside the circuit 10, as described below with reference to Figure 2. It will be seen later that the input words can be encoded on a number of bits that is not a power of 2. An input word can thus be formed of 10 or 12 bits, for example.
- the transformation block has the resources to do the inverse transformation from the output to the input.
- The transformation block 6 advantageously makes it possible to convert the input data of the circuit to the internal precision that is most efficient in terms of area or energy consumed, or even in terms of data transfer capacity, with regard to the various components of the interconnection, including the buses 2, 4.
- The aspects related to the conversion of the input and output words, combined with the internal mechanisms, thus make it possible to optimize the manufacture of the circuit according to the constraints related to the data, the application or the characteristics of the circuit in terms of surface area and consumption.
- Figure 2 is an internal block diagram of the transformation block 6 illustrating the splitting of the input words. Operation is described for 32-bit input words.
- In this example, the internal buses of the architecture have a total width of 32 bits for the data, and 4 independent channels have been chosen. Thus, the granularity of each channel is 8 bits.
- the subwords are thus encoded on 8 bits in the transformation block. Partial parallel coding is thus used throughout the internal structure of the circuit, in particular in the various interconnections 2, 4.
- a first word M1 composed of the subwords SM11, SM12, SM13, SM14;
- a second word M2 composed of the subwords SM21, SM22, SM23, SM24;
- a third word M3 composed of the subwords SM31, SM32, SM33, SM34;
- a fourth word M4 composed of the subwords SM41, SM42, SM43, SM44;
- in a first cycle, the subwords SM11, SM21, SM31 and SM41 are respectively transmitted on the first channel 21, on the second channel 22, on the third channel 23 and on the fourth channel 24;
- in a second cycle, the subwords SM12, SM22, SM32 and SM42 are respectively transmitted on the first channel 21, on the second channel 22, on the third channel 23 and on the fourth channel 24;
- in a third cycle, the subwords SM13, SM23, SM33 and SM43 are respectively transmitted on the first channel 21, on the second channel 22, on the third channel 23 and on the fourth channel 24;
- in a fourth cycle, the subwords SM14, SM24, SM34 and SM44 are respectively transmitted on the first channel 21, on the second channel 22, on the third channel 23 and on the fourth channel 24.
- Thus, the first word M1 is transmitted on the first channel 21, the second word M2 on the second channel 22, the third word M3 on the third channel 23 and the fourth word M4 on the fourth channel 24, all over four cycles.
- To transmit a 32-bit word four communication cycles are required.
- More generally, a $2^P$-bit input word is split into $2^Q$ subwords of $2^P/2^Q$ bits and is then transmitted in $2^Q$ cycles on one of the communication channels 21, 22, 23, 24.
- With $2^Q$ channels, $2^Q$ words can thus be transmitted in parallel in $2^Q$ cycles.
- In the example of Figure 2, P is equal to 5 and Q is equal to 2.
- The granularity of the different channels of the interconnections of the circuit may also not be a power of 2 (i.e. the subwords may be coded on a number of bits that is not a power of 2).
- the input words of the transformation block are not necessarily coded on a number of bits which is a power of 2.
- an input word is broken down into sub-words so that these subwords are transmitted in successive communication cycles, a sub-word being transmitted per cycle on a communication channel dedicated to this word. If the block 6 has K independent communication channels at its output, it can thus transmit K words in parallel over several successive communication cycles.
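The splitting of words into subwords and their transmission over K channels can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `split_word` and `channel_schedule` are hypothetical, and sending the most significant subword first is an assumed ordering that the text does not specify.

```python
from typing import List


def split_word(word: int, word_bits: int, subword_bits: int) -> List[int]:
    """Split a word into subwords, most significant subword first (assumed order)."""
    if word_bits % subword_bits != 0:
        raise ValueError("word width must be a multiple of the subword width")
    if not 0 <= word < (1 << word_bits):
        raise ValueError("word does not fit in word_bits")
    count = word_bits // subword_bits
    mask = (1 << subword_bits) - 1
    return [(word >> (subword_bits * (count - 1 - i))) & mask for i in range(count)]


def channel_schedule(words: List[int], word_bits: int, subword_bits: int) -> List[List[int]]:
    """For each communication cycle, list the subword carried by each channel.

    Channel k is dedicated to words[k]; one subword per channel per cycle.
    """
    per_channel = [split_word(w, word_bits, subword_bits) for w in words]
    cycles = word_bits // subword_bits
    return [[channel[c] for channel in per_channel] for c in range(cycles)]


# Four 32-bit words on four 8-bit channels; byte 0xij plays the role of subword SMij.
words = [0x11121314, 0x21222324, 0x31323334, 0x41424344]
schedule = channel_schedule(words, word_bits=32, subword_bits=8)
```

With these values, `schedule` has four entries (one per cycle) and its first entry holds the first subword of each of the four words. The same helper accepts widths that are not powers of 2, for example a 12-bit word cut into 4-bit subwords.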
- the calculation blocks can be configured to couple their operators so as to process data of greater precision than the basic granularity of their operators.
- the characteristics mentioned above define a compromise between speed, low latency, and the desired accuracy.
- The coupling possibilities make it possible, on an architecture optimized by means of word splitting, to execute applications requiring data coded with precisions different from the nominal precision of the operators. This characteristic therefore makes it possible, after the circuit has been manufactured, to implement applications from various domains.
- Regarding these compromises, for the same hardware configuration and for data of the same size to be processed, it may be preferable either to have a low processing latency for each datum, or to be able to process more data at once over a longer time. Similarly, it may be preferable to reduce the precision of the processed data so as to process more of them in parallel, or vice versa. These compromise choices are specified later.
- FIG. 3 schematically illustrates the modules making up a calculation block 1. Such a block is able to implement a set of neurons.
- a block 1 comprises a local transformation module 31, a memory 32 (for example RAM type for Random Access Memory), a routing module 33, a control module 34 and a calculation module composed of several elementary processors PE.
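- Each elementary processor can realize at least one function of the type of relation (1), namely a weighted sum of the inputs $E_j$ by the synaptic weights $w_{ij}$, passed through the activation function $f$:
- $$R_i = f\Big(\sum_j w_{ij} E_j\Big)$$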
- The memory 32 stores, among other things, synaptic coefficients and intermediate calculation results, all these data being intended for the elementary processors PE. This memory is therefore general-purpose rather than dedicated to one kind of data.
- This memory 32 is organized in columns 40 each having a width of N bits, N being greater than or equal to 1.
- the subwords constituting each stored word are distributed over one or more columns as a function of the coupling mode of the elementary processors PE. The storage and the coupling modes will be described more precisely with reference to FIG. 4.
- the data processed by block 1 are the subwords recovered on the communication bus 2 by the local transformation module 31, these subwords coming from the transformation block 6.
- This allows the data to take the least possible space in memory 32, or to be processed directly by the PE processors on the granularity best suited for the application in progress.
- This transformation module 31 advantageously makes it possible to convert the input data of the calculation block 1 to the internal precision that is most efficient in terms of area or energy consumed, memory blocks having, for example, very different characteristics depending on their form factor (width, height) for the same total capacity.
- the transformation carried out by the transformation module 31 consists of cutting or joining the subwords transmitted by the communication bus 2 into other subwords adapted to the width of the columns of the memory 32. In certain cases, this adaptation is not necessary because the subwords from the transformation block 6 may have the same width as the columns of the memory 32. This is for example the case when each channel 21, 22, 23, 24 has the same width as the columns, carrying N bits in parallel. After adaptation or not by the transformation module 31, these data are written in the memory 32.
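The cutting or joining of subwords to match the column width can be sketched as below. This is an illustrative model only: the function name `regroup` is hypothetical, and the most-significant-first ordering of subwords is an assumption not stated in the text.

```python
from typing import List


def regroup(subwords: List[int], in_bits: int, out_bits: int) -> List[int]:
    """Re-cut in_bits-wide subwords (most significant first) into out_bits-wide ones."""
    total_bits = in_bits * len(subwords)
    if total_bits % out_bits != 0:
        raise ValueError("total width must be a multiple of the column width")
    # Join the incoming subwords into one integer.
    word = 0
    for s in subwords:
        if not 0 <= s < (1 << in_bits):
            raise ValueError("subword does not fit in in_bits")
        word = (word << in_bits) | s
    # Cut that integer again at the column width.
    count = total_bits // out_bits
    mask = (1 << out_bits) - 1
    return [(word >> (out_bits * (count - 1 - i))) & mask for i in range(count)]


cut = regroup([0xAB, 0xCD], in_bits=8, out_bits=4)      # splitting for 4-bit columns
joined = regroup([0xA, 0xB, 0xC, 0xD], in_bits=4, out_bits=8)  # joining for 8-bit columns
same = regroup([0x12, 0x34], in_bits=8, out_bits=8)      # widths match: nothing to adapt
```

When the channel width equals the column width, as in the last call, the subwords pass through unchanged, which corresponds to the case where no adaptation is needed.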
- The association of the control module 34 and the routing module 33 makes it possible to couple the elementary processors PE so as to make them cooperate if necessary, either temporally (a processor processes the same data item over several calculation cycles), spatially (several processors process a data item in one calculation cycle, possibly over several clock cycles), or both temporally and spatially at once.
- The control module 34 controls the interconnection of the elementary processors with each other.
- FIG. 4 illustrates these different couplings. More particularly, FIG. 4 illustrates the various possible modes of execution, by presenting the different word storage modes in the memory 32 and the interactions between this memory and the calculation module 35 via the routing module 33.
- the basic granularity chosen inside the circuit defines the width of columns 40 inside the memory 32.
- The basic granularity is in particular the maximum number of bits in parallel at the input of the operators of the elementary processor PE (for example, an 8-bit adder processes 8-bit input data). More generally, the granularity is the maximum number of bits in parallel on any input of an elementary processor.
- In the example of FIG. 4, the memory comprises eight 8-bit columns, the memory thus having a width of 64 bits. It would also be possible to provide, for example, a 32-bit memory consisting of four columns, or any other combination.
- The calculation module 35 comprises the same number of elementary processors PE as there are columns 40 in the memory, the words or subwords of each column being directed to the processors PE by the routing module 33.
- The routing module makes it possible to associate any column with each elementary processor PE, or even to broadcast the data of a column to all (full broadcast) or some (partial broadcast) of the elementary processors PE. In addition, using this principle, the data of several columns can be partially broadcast to different elementary processors PE.
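The association of columns with processors, including full and partial broadcast, can be illustrated by a simple lookup. In this sketch the function name `route` and the representation of the routing configuration (a list giving, for each processor, the index of the column it reads) are assumptions made for illustration.

```python
from typing import List


def route(column_data: List[int], processor_to_column: List[int]) -> List[int]:
    """Return the data seen by each processor for a given routing configuration.

    processor_to_column[p] is the index of the column routed to processor p;
    several processors may point at the same column (broadcast).
    """
    for c in processor_to_column:
        if not 0 <= c < len(column_data):
            raise ValueError("routing refers to a column that does not exist")
    return [column_data[c] for c in processor_to_column]


columns = [10, 20, 30, 40]                              # one subword per memory column
one_to_one = route(columns, [0, 1, 2, 3])               # each processor reads its own column
full_broadcast = route(columns, [0, 0, 0, 0])           # column 0 sent to every processor
partial_broadcast = route(columns, [0, 0, 2, 3])        # column 0 sent to two processors only
```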
- the granularity of the PE processors (the number of bits that they process in parallel in a single cycle) is the same as that of the columns 40.
- the first mechanism performs a temporal coupling on each elementary processor PE.
- Each of the PE processors of the calculation block has the possibility of coupling with itself temporally so as to process in several cycles data of larger size than its original granularity. In the example of Figure 4, this granularity is equal to 8 bits.
- an elementary processor PE could process 16-bit data over two cycles, in particular for an addition. More cycles would be needed for multiplication.
- each PE processor communicates directly with a single column of the memory 32 and does not interact with its neighbors.
- the storage of a word requires several subwords 421, 422 on a single column, thus several memory addresses.
- The processor 420 performs the operations in several cycles. The lower the granularity of the operators chosen at design time, the more cycles are required for data encoded on a large number of bits.
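Temporal coupling for an addition can be sketched as a single narrow adder reused over successive cycles, with the carry kept between cycles. The function name `add_temporal`, the least-significant-first ordering of the subwords and the explicit carry variable are assumptions of this illustration, not details given in the text.

```python
from typing import List, Tuple


def add_temporal(a_subwords: List[int], b_subwords: List[int],
                 granularity: int = 8) -> Tuple[List[int], int]:
    """Add two operands one subword per cycle on a single narrow adder.

    Subwords are given least significant first (assumed order).
    Returns the result subwords and the final carry.
    """
    if len(a_subwords) != len(b_subwords):
        raise ValueError("both operands need the same number of subwords")
    mask = (1 << granularity) - 1
    carry = 0
    result = []
    for a, b in zip(a_subwords, b_subwords):  # one loop iteration models one cycle
        total = a + b + carry
        result.append(total & mask)           # low bits produced by the adder
        carry = total >> granularity          # carry kept for the next cycle
    return result, carry


# 0x12FF + 0x0001 on an 8-bit adder, in two cycles.
total_subwords, carry_out = add_temporal([0xFF, 0x12], [0x01, 0x00])
```

Here `total_subwords` is `[0x00, 0x13]`, i.e. 0x1300, obtained in two iterations of the 8-bit addition.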
- the second mechanism performs spatial coupling between the PE processors. By this mechanism, they can be coupled together so as to increase the size of the processed data without increasing the number of cycles necessary for their treatment, especially for addition. For other operations, this may not be the case.
- This spatial coupling is illustrated on the third and fourth columns of the memory 32.
- Two elementary processors PE, for example neighbors, are coupled in order to process words twice as wide as their granularity.
- a processed word is then composed of two subwords 43, 44 stored at the same memory address, on two columns.
- the two processors 430, 440 process 16-bit data together. It is of course possible to provide a treatment of larger words, for example 24 or 32 bits. In these cases, the word processed will be composed of three or four subwords stored at the same memory address.
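The two storage layouts described for temporal and spatial coupling can be contrasted with a small placement helper. The name `place_word`, the zero-based address and column indices and the order in which subwords are laid out are assumptions made for this sketch.

```python
from typing import List, Tuple


def place_word(subwords: List[int], mode: str,
               address: int = 0, column: int = 0) -> List[Tuple[int, int, int]]:
    """Return (address, column, subword) triples for one stored word.

    "temporal": one column, successive addresses (one processor, several cycles).
    "spatial":  one address, successive columns (several processors, same cycle).
    """
    if mode == "temporal":
        return [(address + i, column, s) for i, s in enumerate(subwords)]
    if mode == "spatial":
        return [(address, column + i, s) for i, s in enumerate(subwords)]
    raise ValueError("mode must be 'temporal' or 'spatial'")


# A 16-bit word 0x1234 seen as two 8-bit subwords.
temporal_layout = place_word([0x12, 0x34], "temporal", address=0, column=0)
spatial_layout = place_word([0x12, 0x34], "spatial", address=0, column=2)
```

In the temporal layout both subwords share column 0 and occupy addresses 0 and 1; in the spatial layout they share address 0 and occupy two adjacent columns, so a wider word (24 or 32 bits) simply uses three or four columns at that address.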
- This case combines both mechanisms. In other words, it combines temporal coupling and spatial coupling. More precisely, in this case the spatial coupling is realized with a finer control of the routing module in order to maximize or optimize the use of the memory and its space. In this case, one seeks by way of example to optimize the space for the processing of four words of 24 bits. In this example four words 46, 47, 48, 49 of 24 bits are stored on only three memory addresses, for example three successive addresses. All of these words are stored on a 32-bit width.
- In the first cycle, the first word 46 is sent directly to the processors 450, 460, 470 corresponding to the three columns on which the word 46 is stored, at the same memory address. Meanwhile, the subword of the second word 47 stored at the same memory address, on the last column, is held for the second cycle, for example in a temporary register. During this second cycle, the two other subwords of the second word 47, stored at the following address, are routed so as to be processed at the same time as this first subword, the two subwords of the third word 48 stored at the same address being held for the third cycle.
- In the third cycle, the last subword of the third word 48 is routed to the processors to be processed at the same time as the two subwords stored at the previous address.
- In the fourth cycle, no memory access takes place, the fourth word 49 being directly available.
- With only three memory reads, four words have thus been processed. This makes it possible both to make the best use of the memory space and to reduce the energy consumption linked to reading the memory 32, of the RAM or SRAM (Static Random Access Memory) type, for example.
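The read sequence for four 24-bit words packed on three addresses can be simulated as follows. This is a simplified model under stated assumptions: the function name `read_packed_words` is hypothetical, the subword labels are placeholders, and every subword is passed through one holding buffer, whereas the text sends subwords that are ready directly to the processors and only holds the others in a temporary register.

```python
from typing import List, Tuple


def read_packed_words(rows: List[List[str]],
                      subwords_per_word: int) -> List[Tuple[int, bool, List[str]]]:
    """Return (cycle, memory_was_read, word_subwords) for words packed across rows.

    A row is the content of all columns at one memory address. A row is read
    only when the held subwords cannot form a complete word.
    """
    if any(len(row) < subwords_per_word for row in rows):
        raise ValueError("each row must hold at least one full word")
    pending = list(rows)
    held: List[str] = []          # models the temporary register
    schedule = []
    cycle = 0
    while pending or len(held) >= subwords_per_word:
        cycle += 1
        read = False
        if len(held) < subwords_per_word:
            held.extend(pending.pop(0))   # one memory access
            read = True
        word, held = held[:subwords_per_word], held[subwords_per_word:]
        schedule.append((cycle, read, word))
    return schedule


# Four 24-bit words (three 8-bit subwords each) on three addresses of four columns.
memory_rows = [["46a", "46b", "46c", "47a"],
               ["47b", "47c", "48a", "48b"],
               ["48c", "49a", "49b", "49c"]]
schedule = read_packed_words(memory_rows, subwords_per_word=3)
memory_reads = sum(1 for _, read, _ in schedule if read)
```

In this run the schedule has four cycles and `memory_reads` is 3: the fourth word is assembled entirely from held subwords, without a memory access.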
- FIG. 4 illustrates another significant advantage of the invention, which makes multi-precision processing possible, for example the processing of 16-bit data by 8-bit operators, either by extending the processing over time or by physically coupling the operators, through the interconnection of two neighboring operators.
- The general control block 5 has in particular the task of configuring these couplings, temporal or spatial, by sending control signals over time to the various calculation blocks 1 so as to apply the sequences of operations necessary for the current or upcoming processing, the control signals being transmitted to the internal control modules 34.
- the preceding lines have described a circuit, in particular adapted to implement a neural network.
- This circuit can be advantageously used for the execution of neural networks.
- The structure of a circuit according to the invention, through its various hierarchical broadcast and routing mechanisms, makes it possible to produce, at a lower cost in silicon, neural networks ranging from sparsely connected to fully connected.
- This structure allows extension by cascading (routing) or by broadcast between several circuits 10 of the same type, while keeping the coding partially parallel, which guarantees generality in the dynamics of the calculations, in other words adaptation to all implementable dynamics. Interconnection and extensibility are facilitated by the asynchronous communication mechanism between the calculation blocks 1.
- an intelligent Direct Memory Access (DMA) mechanism connects all the calculation blocks. It is thus possible to perform a data virtualization, making it possible to produce neural networks or processes on images exceeding the size of the internal memory of the circuit.
- the circuit 10 includes a memory virtualization block, not shown, connected to the memories 32 of all the blocks and to an external memory via a direct memory access circuit (DMA).
- The interconnection and virtualization mechanisms also allow effective weight sharing, which is very useful in the implementation of new types of neural networks. Indeed, it is possible to increase the total memory available to the architecture, either to store large amounts of input data in signal processing mode, or to store large amounts of synaptic weights for complex networks in recognition mode.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Logic Circuits (AREA)
- Image Processing (AREA)
- Multi Processors (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FR1562912A FR3045893B1 (fr) | 2015-12-21 | 2015-12-21 | Circuit electronique, notamment apte a l'implementation de reseaux de neurones a plusieurs niveaux de precision. |
| PCT/EP2016/079998 WO2017108398A1 (fr) | 2015-12-21 | 2016-12-07 | Circuit electronique, notamment apte a l'implementation de reseaux de neurones a plusieurs niveaux de precision |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP3394799A1 (fr) | 2018-10-31 |
Family
ID=56068978
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP16809743.4A Pending EP3394799A1 (fr) | 2015-12-21 | 2016-12-07 | Circuit electronique, notamment apte a l'implementation de reseaux de neurones a plusieurs niveaux de precision |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11308388B2 (fr) |
| EP (1) | EP3394799A1 (fr) |
| FR (1) | FR3045893B1 (fr) |
| WO (1) | WO2017108398A1 (fr) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108475347A (zh) * | 2017-11-30 | 2018-08-31 | 深圳市大疆创新科技有限公司 | 神经网络处理的方法、装置、加速器、系统和可移动设备 |
| CN110596668B (zh) * | 2019-09-20 | 2021-06-04 | 中国人民解放军国防科技大学 | 基于互逆深度神经网络的目标外辐射源被动定位方法 |
| FR3114422B1 (fr) | 2020-09-22 | 2023-11-10 | Commissariat Energie Atomique | Calculateur électronique de mise en œuvre d’un réseau de neurones artificiels, avec blocs de calcul de plusieurs types |
| CN117290812B (zh) * | 2023-09-14 | 2025-09-19 | 华中师范大学 | 基于行为序列编码的在线资源学习行为分析与预测方法 |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015049183A1 (fr) * | 2013-10-04 | 2015-04-09 | Commissariat A L'energie Atomique Et Aux Energies Alternatives | Circuit electronique, notamment apte a l'implementation d'un reseau de neurones, et systeme neuronal |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9348783B2 (en) * | 2012-04-19 | 2016-05-24 | Lockheed Martin Corporation | Apparatus and method emulating a parallel interface to effect parallel data transfer from serial flash memory |
| FR3015068B1 (fr) * | 2013-12-18 | 2016-01-01 | Commissariat Energie Atomique | Module de traitement du signal, notamment pour reseau de neurones et circuit neuronal |
| US9805303B2 (en) * | 2015-05-21 | 2017-10-31 | Google Inc. | Rotating data for neural network computations |
| US10061537B2 (en) * | 2015-08-13 | 2018-08-28 | Microsoft Technology Licensing, Llc | Data reordering using buffers and memory |
-
2015
- 2015-12-21 FR FR1562912A patent/FR3045893B1/fr active Active
-
2016
- 2016-12-07 WO PCT/EP2016/079998 patent/WO2017108398A1/fr not_active Ceased
- 2016-12-07 US US15/781,680 patent/US11308388B2/en active Active
- 2016-12-07 EP EP16809743.4A patent/EP3394799A1/fr active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| FR3045893A1 (fr) | 2017-06-23 |
| FR3045893B1 (fr) | 2017-12-29 |
| US11308388B2 (en) | 2022-04-19 |
| US20190005378A1 (en) | 2019-01-03 |
| WO2017108398A1 (fr) | 2017-06-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3084588B1 (fr) | | Signal processing module, in particular for a neural network, and neural circuit |
| EP3053108B1 (fr) | | Electronic circuit, in particular capable of implementing a neural network, and neural system |
| EP0597028B1 (fr) | | Processor-array system architecture with parallel structure |
| EP3844679B1 (fr) | | Computing architecture for a convolution layer in a convolutional neural network |
| EP0558125B1 (fr) | | Neural processor with distributed synaptic cells |
| EP0154340A1 (fr) | | Processor for computing an inverse discrete cosine transform |
| EP3660849A1 (fr) | | Memory circuit suitable for performing computing operations |
| US10831691B1 (en) | | Method for implementing processing elements in a chip card |
| EP3394799A1 (fr) | | Electronic circuit, in particular capable of implementing neural networks with several levels of precision |
| EP0154341B1 (fr) | | Processor for computing a discrete cosine transform |
| EP0262032B1 (fr) | | Binary adder having a fixed operand, and parallel-serial binary multiplier comprising such an adder |
| EP0319421B1 (fr) | | Binary comparator and sorting operator for binary numbers |
| EP0259231B1 (fr) | | Device for determining the digital transform of a signal |
| EP3803574A1 (fr) | | Twiddle factor generating circuit for an NTT processor |
| FR2568036A1 (fr) | | Computing circuit |
| FR2667176A1 (fr) | | Method and circuit for encoding a digital signal to determine the scalar product of two vectors, and corresponding DCT processing |
| FR3133936A1 (fr) | | Processing method in a convolutional neural network accelerator, and associated accelerator |
| EP2553655B1 (fr) | | Data stream processing architecture enabling extension of a neighborhood mask |
| FR2716321A1 (fr) | | Method and device for vector quantization of a digital signal, in particular applied to the compression of digital images |
| EP0329572B1 (fr) | | Multiplier of binary numbers having a very large number of bits |
| EP1052575A1 (fr) | | Transmission system, receiver and interconnection network |
| FR3056320A1 (fr) | | Method for computing, by at least one computer, at least one linear algebra operation on at least one matrix |
| FR2655444A1 (fr) | | Neural network with electronic neural circuits with coefficient learning, and learning method |
| EP0606458A1 (fr) | | Electronic device for image analysis and artificial vision |
| FR2741466A1 (fr) | | Device for filtering incoming messages in a computer network node controller |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20180625 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | AX | Request for extension of the European patent | Extension state: BA ME |
| | DAV | Request for validation of the European patent (deleted) | |
| | DAX | Request for extension of the European patent (deleted) | |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| | 17Q | First examination report despatched | Effective date: 20210701 |
| | RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: GRANT OF PATENT IS INTENDED |
| | RIC1 | Information provided on IPC code assigned before grant | Ipc: G06N 3/063 20060101AFI20250926BHEP; Ipc: G06N 3/0464 20230101ALI20250926BHEP; Ipc: G06N 3/0495 20230101ALI20250926BHEP |
| | INTG | Intention to grant announced | Effective date: 20251103 |