WO2021049005A1 - Information processing device and electronic device equipped with the same - Google Patents
Information processing device and electronic device equipped with the same
- Publication number: WO2021049005A1 (application PCT/JP2019/036101)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- layer
- signal
- amplitude
- fourier transform
- information processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
All codes below fall under G—PHYSICS and G06—COMPUTING OR CALCULATING; COUNTING. The G06N3 codes fall under G06N3/00 (computing arrangements based on biological models) and G06N3/02 (neural networks); the G06V codes fall under G06V10/00 or G06V40/00 (image or video recognition or understanding).

- G06F17/141—Discrete Fourier transforms (under G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve)
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/0475—Generative networks
- G06N3/048—Activation functions
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06N3/09—Supervised learning
- G06N3/092—Reinforcement learning
- G06N3/094—Adversarial learning
- G06N3/096—Transfer learning
- G06V10/431—Frequency domain transformation; Autocorrelation (global feature extraction)
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
- G06V10/82—Image or video recognition using neural networks
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Definitions
- the present application relates to an information processing device and an electronic device equipped with the information processing device.
- Neural networks used for image recognition and similar tasks are trained on images by deep learning and extract features from those images.
- A CNN (Convolutional Neural Network) convolves a weight function, learned by the error backpropagation method, with the input image, updates the weight function by backpropagation, and thereby associates the input image with the output result.
- A method is known (for example, Patent Document 1) in which the input image is expressed as a set of points in a high-dimensional space, the input image is perturbed using a matrix obtained by smoothing the weighting function, the perturbed input image is subjected to a discrete Fourier transform or a discrete cosine transform and projected onto a subspace, and the transformed image is inversely transformed to obtain the image after perturbation.
- the recognition accuracy can be improved by forming the convolutional layer into a multi-layer structure.
- However, for a large image such as one of 2048 × 2048 pixels, the number of matrix operations in the convolution layer increases in proportion to the total number of pixels, so the amount of calculation becomes enormous. The required calculation cost is therefore high, and it is difficult to simplify the information processing apparatus.
- Moreover, the conventional calculation of each layer in a CNN imitates the structure of the brain as a mathematical model, so it is difficult to grasp its physical meaning.
- The present application discloses a technique for solving the above-mentioned problems. An object of the present invention is to provide an information processing apparatus that reduces the amount of calculation in the neural network, obtains high recognition accuracy with a low-cost and simple device configuration, and enables detailed analysis of the neural network. A further object is to provide an electronic device that performs high-speed, high-precision control operations based on learning and inference using this information processing device.
- The information processing apparatus disclosed in the present application processes an input signal with a neural network and includes: a Fourier transform layer that Fourier transforms the input signal and outputs a first amplitude signal and a first phase signal; an amplitude coupling layer that applies a first weight matrix, whose values are updated by training, to the first amplitude signal to output a second amplitude signal; a phase coupling layer that applies a second weight matrix, whose values are updated by training, to the first phase signal to output a second phase signal; a complex activation layer that uses a complex activation function f, which is an activation function in the spatial frequency domain, to update at least the second amplitude signal among the second amplitude signal and the second phase signal; and an inverse Fourier transform layer that combines the second amplitude signal updated by the complex activation layer with the second phase signal and performs an inverse Fourier transform.
- the electronic device disclosed in the present application performs a control operation using the above information processing device.
- According to the information processing device disclosed in the present application, the amount of calculation in the neural network can be reduced, high recognition accuracy can be obtained with a low-cost and simple apparatus configuration, and detailed analysis of the neural network becomes possible. Further, according to the electronic device disclosed in the present application, high-speed and high-precision control operation becomes possible.
- FIG. 1 is a diagram showing a configuration example of the hardware according to Embodiment 1.
- FIG. 2 is a diagram showing a configuration example of a CNN as a first comparative example.
- FIG. 3 is a diagram showing a configuration example of the neural network according to Embodiment 1.
- FIG. 4 is a partial detailed view of FIG. 3.
- A diagram explaining the processing of the input image in the Fourier transform according to Embodiment 1.
- A diagram explaining the operation of updating the amplitude value using the complex activation function f according to Embodiment 1.
- A diagram explaining an example of the complex activation function f according to Embodiment 1, and diagrams explaining three further examples of the complex activation function f according to Embodiment 1.
- A diagram showing a configuration example of the neural network according to another example of Embodiment 1.
- A diagram showing the accuracy of the neural network according to Embodiment 1.
- A diagram explaining the speed-up effect of the neural network according to Embodiment 1.
- A diagram explaining the operation using the complex activation function f according to Embodiment 3.
- A diagram showing a partial configuration example of the neural network according to Embodiment 5.
- Diagrams showing partial configuration examples of the neural networks according to Embodiments 7 and 8, and a diagram showing a configuration example of the neural network according to Embodiment 9.
- Diagrams showing a configuration example of the neural network according to Embodiment 10, a configuration example of the air conditioner according to Embodiment 11, and a configuration example of an electronic device according to another example of Embodiment 11.
- FIG. 1 is a diagram showing an overall configuration of hardware 100 as an information processing device that functions as a neural network (hereinafter referred to as NN) according to the first embodiment of the present application.
- The hardware 100 may be a stand-alone computer, or a server or client of a server-client system using a cloud or the like. The hardware 100 may also be a smartphone or a microcomputer. In a factory setting, a computing environment on a closed in-factory network, so-called edge computing, may also be used.
- the hardware 100 has a built-in CPU (Central Processing Unit) 30, and an input / output interface 35 is connected to the CPU 30 via a bus wiring 34.
- The CPU 30 executes a program stored in the ROM (Read Only Memory) 31. Alternatively, the CPU 30 loads a program stored in the hard disk (HDD) 33 or an SSD (Solid State Drive, not shown) into the RAM (Random Access Memory) 32, and reads, writes, and executes it as necessary.
- the CPU 30 performs various processes to make the hardware 100 function as a device having a predetermined function.
- The CPU 30 outputs the results of various processes from the output device serving as the output unit 36, transmits them from the communication device serving as the communication unit 38, and records them on the hard disk 33 via the input/output interface 35 as necessary. The CPU 30 also receives various information from the communication unit 38 via the input/output interface 35 as needed, and calls up information from the hard disk 33 for use.
- the input unit 37 is composed of a keyboard, a mouse, a microphone, a camera, and the like.
- the output unit 36 is composed of an LCD (Liquid Crystal Display), a speaker, or the like.
- the program executed by the CPU 30 can be pre-recorded on the hard disk 33 or the ROM 31 as a recording medium built in the hardware 100. Alternatively, the program can be stored (recorded) in the removable recording medium 40 connected via the drive 39.
- Such a removable recording medium 40 can be provided as so-called package software.
- Examples of the removable recording medium 40 include a flexible disc, a CD-ROM (Compact Disc Read-Only Memory), an MO (Magneto-Optical) disc, a DVD (Digital Versatile Disc), a magnetic disc, and a semiconductor memory.
- The program can also be transmitted and received through a communication port over a system such as the WWW (World Wide Web) that connects a plurality of hardware units via wired connections, wireless connections, or both.
- Alternatively, the training described later can be performed elsewhere, and only the weighting function obtained by the training can be transmitted and received by the above method.
- the CPU 30 causes the hardware 100 to function as an information processing device that processes each layer constituting the NN and generates the NN.
- the hardware 100 functions as an NN and also functions as an NN generator.
- Each layer of the NN is composed of general-purpose hardware such as a CPU or a GPU (Graphics Processing Unit) specialized for parallel computing, an FPGA (Field-Programmable Gate Array), or an FFT (Fast Fourier Transform) computing architecture.
- the matrix operations such as the Fourier transform and the inverse Fourier transform, which will be described later, may be performed by dedicated hardware.
- Hardware for the fast Fourier transform and the fast inverse Fourier transform, capable of performing the Fourier transform and inverse Fourier transform at high speed, is commercially available; some of this dedicated hardware is built into CPUs or GPUs.
- The hardware 100 may be provided with a sensor for converting physical phenomena such as sound waves, electromagnetic waves including visible light, heat, or vibration into numerical data, or with a mechanism for outputting an image or a calculation result, such as CAD data, designed in the hardware 100. The hardware 100 may also include a mechanism for fusing the sensor information with the calculation results of the hardware 100.
- The hardware 100 includes a mechanism for being driven by a power line or an internal battery.
- the hardware 100 may be configured by a plurality of units via a communication port, and the training and inference described later may be carried out by the hardware 100 having a different configuration. Further, the hardware 100 may receive sensor signals connected to different hardware 100 via the communication port, or may receive a plurality of sensor signals via the communication port. Further, a plurality of virtual hardware environments may be prepared in one hardware 100, and each virtual hardware may be treated as individual hardware.
- <Data used for learning>
- Learning in the NN uses supervised learning, unsupervised learning, or reinforcement learning. The NN is sometimes called deep learning, and is a form of perceptron: a perceptron with one hidden layer (described later) is called a single-layer perceptron, and one with two or more hidden layers is called a multi-layer perceptron. This multi-layer perceptron is the NN.
- Supervised learning is a method of learning from training data with correct-answer labels associated with it, whereby the input signal and the output signal are associated with each other.
- Unsupervised learning is a method of learning without attaching correct-answer labels to the training data. Known examples include the Stacked Autoencoder (SAE), in which autoencoders are stacked in multiple layers, and the Deep Boltzmann Machine (DBM), in which restricted Boltzmann machines are stacked in multiple layers.
- Reinforcement learning may use a DQN (Deep Q-Network), which maximizes the expected value of future rewards, for data that changes from moment to moment instead of being given correct answers.
- When using unsupervised learning, multivariate analysis such as data clustering or principal component analysis is performed. Since an image of two or more dimensions is generally used as the input image, reinforcement learning can also be used. In addition, semi-supervised learning, in which some training data is not given correct-answer labels, and transfer learning, in which a model trained on one set of training data is applied to another, are also applicable to this embodiment as long as they are applicable to a conventional CNN.
- Learning includes batch learning, in which the data to be learned is processed in a batch, and online learning, in which data is added and learned each time new data to be learned is input.
- FIG. 2 is a diagram showing a configuration example of CNN as a first comparative example.
- The CNN is configured to include, in order from the input layer 1: a convolutional layer 2, an activation function 3, a convolutional layer 2, an activation function 3, a pooling layer 4, a convolutional layer 2, an activation function 3, a fully connected layer 5, an activation function 3, a fully connected layer 5, an output layer 6, and an output 7.
- The CNN performs learning using the convolution layers 2 and activation functions 3: for the input layer 1, whose input signal is an image serving as training data, combinations of the convolution layer 2, activation function 3, pooling layer 4, and fully connected layer 5 are applied a plurality of times. The output of this combination then passes through the output layer 6, which includes an error function such as the softmax function, to obtain the desired output signal (output 7).
- the layer between the input layer 1 and the output layer 6 is called a hidden layer.
- the NN has two or more hidden layers, and the NN is formed by appropriately combining a plurality of layers including an input layer and an output layer.
- the layer on the input side is also referred to as a lower layer and the layer on the output side is also referred to as an upper layer when viewed from a certain layer.
- a CNN having more than 150 hidden layers is also known, and in recent years, there is a tendency to increase the number of layers in order to improve the accuracy.
- the error back propagation method is one method for efficiently calculating the derivative for expressing the propagation of the error from the output layer 6 to the input layer 1.
- the convolution layer 2 applies a kernel (or a filter) to the map and performs a convolution operation.
- The kernel is composed of, for example, a 7-pixel × 7-pixel image, that is, a 7 × 7 matrix with 49 elements.
- This kernel is applied to the data of the input layer 1 or the map of the lower layer to be combined to perform the convolution operation, and the signal is output to the upper layer.
- The convolution operation has the function of extracting, by means of the kernel, characteristic edges from the image indicated by the input signal. An edge is a boundary between objects, or between an object and the background image, and the like. The process of preparing a plurality of these kernels and updating the matrices constituting the kernels by the above error backpropagation method is called learning.
- Let xij be the value at position (i, j) in the input image, where i and j are integers of 0 or more; in this case each pixel value is also an integer of 0 or more. Let the kernel be an image smaller than the input image, of size H × H, where H is an integer of 2 or more, and let each kernel value hpq take an arbitrary real value. Under these conditions, the convolution operation is expressed by the following equation (1), where uij is the value in the output image obtained by the convolution operation:

  uij = Σp=0..H−1 Σq=0..H−1 x(i−p),(j−q) hpq … (1)

  However, the convolution operation in deep learning is generally defined by the following equation (2):

  uij = Σp=0..H−1 Σq=0..H−1 x(i+p),(j+q) hpq … (2)

  Outside deep learning, this operation is called cross-correlation, or simply correlation. Since the correlation gives the same result as the convolution operation when the kernel is inverted vertically and horizontally, the operation represented by equation (2) is referred to here as a convolution operation.
- In the Fourier transform, the difference between the convolution operation and the correlation corresponds to taking the complex conjugate of one of xij and hpq. Since the kernel consists of values obtained by learning, the complex conjugate is unnecessary when the initial values of the kernel are random, but when carrying over initial weights it is desirable to apply the complex conjugate to the kernel.
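The relationship between equations (1) and (2) can be checked numerically. The following Python sketch is our own illustration (the function names `correlate2d` and `convolve2d` are not from the patent): flipping the kernel vertically and horizontally turns correlation into true convolution.

```python
import numpy as np

def correlate2d(x, h):
    # "Convolution" as used in deep learning (equation (2)): slide the kernel
    # over the image and take the sum of element-wise products.
    H = h.shape[0]
    out = np.zeros((x.shape[0] - H + 1, x.shape[1] - H + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + H, j:j + H] * h)
    return out

def convolve2d(x, h):
    # True convolution (equation (1)): correlation with the kernel inverted
    # vertically and horizontally.
    return correlate2d(x, h[::-1, ::-1])

x = np.arange(25.0).reshape(5, 5)
h = np.arange(9.0).reshape(3, 3)
assert np.allclose(convolve2d(x, h[::-1, ::-1]), correlate2d(x, h))
```

The final assertion confirms the statement in the text: correlating with a kernel equals convolving with that kernel flipped in both axes.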
- In the convolution operation, the calculation is performed while sliding the kernel over the input image and taking the sum, and a stride is sometimes used. With a stride width s, an integer of 2 or more, the operation is expressed by the following equation (3):

  uij = Σp Σq x(si+p),(sj+q) hpq … (3)
- When the size of the lower-layer (input) image is W × H and the size of the kernel is U × V, the size of the output image is (W − 2[U/2]) × (H − 2[V/2]), where [] is an operator that rounds down to the nearest whole number. Since [U/2] and [V/2] are natural numbers, the image after the convolution operation becomes smaller than the input image before the operation. Therefore, a border of width [U/2] and [V/2] is added outside the image after the convolution operation to enlarge the output image.
- This method is called padding.
- the image to be added to the outside may be any image, but it is often set to 0, and is particularly called zero padding.
- In the fast Fourier transform described later, the size of the matrix needs to be a power of 2, so it is desirable to make the sizes of the input matrix and the output matrix equal by padding.
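The size bookkeeping above can be sketched with a small helper (an illustrative sketch; the function name and the `stride`/`pad` parameters are our assumptions, not the patent's notation):

```python
def conv_output_size(W, U, stride=1, pad=0):
    # Output width along one dimension of a strided, padded convolution:
    # floor((W + 2*pad - U) / stride) + 1.
    return (W + 2 * pad - U) // stride + 1

# Without padding, a 7-wide kernel shrinks a 10-wide input to 10 - 2*[7/2] = 4.
assert conv_output_size(10, 7) == 10 - 2 * (7 // 2)

# Zero padding of width [U/2] = 3 keeps a 2048-pixel-wide input at 2048.
assert conv_output_size(2048, 7, stride=1, pad=3) == 2048
```

The two assertions mirror the text: the shrinkage formula (W − 2[U/2]) for an unpadded convolution, and the restoration of the original size by padding with [U/2] on each side.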
- the activation function 3 which is a non-linear function, is applied to the output signal after the convolution operation in the convolution layer 2. It is important for the NN (CNN in this case) that the activation function 3 is non-linear, and the inclusion of the non-linear function makes it possible to express arbitrary data that could not be expressed by the linear function alone.
- For example, suppose a linear function h(x) = cx with constant c is used as the activation function. Then three stacked layers compute y = h(h(h(x))) = c·c·c·x = c³x, which is equivalent to a single linear layer; stacking adds no expressive power. Since an NN requires two or more hidden layers using a non-linear function as the activation function, the identity function will not be described here.
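The collapse of stacked linear activations can be verified directly (a toy check of the c³ argument; the variable names are ours):

```python
# Three applications of the linear "activation" h(x) = c*x collapse to a
# single linear map y = (c**3) * x, so depth alone adds no expressive power.
c = 2.0
h = lambda x: c * x

x = 3.0
y = h(h(h(x)))
assert y == (c ** 3) * x  # 8 * 3.0
```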
- The weight and bias values of the CNN composed of the convolutional layers 2 and activation functions 3 are updated by the error backpropagation method. This update is called the learning process, or simply learning. In the early stages of learning, pseudo-random numbers are used for the weights and biases; alternatively, the Xavier initial values or the He initial values, described later, may be used. When the learning process is complete, the weights and biases with updated values are output.
- The process of obtaining an output from an input using these learned weights and biases is called the inference process, or simply inference.
- the learning process is done on a GPU or a server client system with dedicated hardware.
- Inference does not require the learning process of reducing the error from the correct label; it only performs the calculation from the input layer 1 to the output layer 6 using the above weights and biases and outputs the result, so the amount of calculation is comparatively small. Even so, the amount of calculation in the convolution layers 2 is large, and calculation time is required.
- The inference process often requires a response within a few seconds, but in a general CNN in which the number of convolution layers 2 is increased to improve learning accuracy, it is often difficult for simple hardware such as a microcomputer to respond within the desired time.
- Edge computers with relatively low computing power, such as smartphones, may be equipped with dedicated hardware for the CNN, but then the hardware mounting area increases, the processing system becomes more complicated, or the amount of power used increases.
- In this embodiment, a configuration example of the hardware 100 (information processing apparatus) is shown that processes the output signal of a sensor converting physical phenomena such as sound waves, electromagnetic waves including visible light, heat, or vibration into numerical data, a signal designed by calculation in the hardware 100, or a signal including both.
- the calculation in the hardware 100 is a calculation process using NN.
- the input signal may be any one as long as it is one-dimensional or more, but in this embodiment, a two-dimensional image will be described.
- In this embodiment, the operation corresponding to the convolution operation is performed in the spatial frequency domain, and a complex activation function, which is an activation function in the spatial frequency domain, is applied to the signal after that operation. That is, an input signal in the spatial domain, such as the sensor output signal described above, is Fourier transformed into a signal in the spatial frequency domain; in the spatial frequency domain, the operation corresponding to the convolution operation and the operation using the complex activation function are performed; and the signal in the spatial frequency domain is then inverse Fourier transformed back into the spatial domain to obtain the desired output signal.
- The convolution operation in the CNN of the first comparative example requires a plurality of matrix operations for a single convolution, but one convolution operation in the spatial domain is replaced by a single matrix operation in the spatial frequency domain.
- the operation corresponding to the convolution operation in the spatial frequency domain will be referred to as a complex convolution operation for convenience.
- A complex activation function, which is a non-linear function, is applied to the signal after the complex convolution operation. In this way, the complex convolution operation and the operation using the complex activation function are performed as continuous processing in the spatial frequency domain, so the reduction in the number of operations described above is reliably realized.
- Moreover, because the complex activation function is applied after the complex convolution operation, if the Fourier transform and the inverse Fourier transform in the hidden layers are each performed only once, the processing between them can remain continuously in the spatial frequency domain, and the complex convolution operation and the operation using the complex activation function can be performed as many times as necessary. As a result, a significant reduction in calculation cost can be achieved.
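The saving rests on the convolution theorem: circular convolution in the spatial domain equals a single element-wise product in the spatial frequency domain. A minimal NumPy check (our own example, not code from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # input "image"
h = rng.standard_normal((8, 8))   # kernel, zero-padded to the image size

# Circular convolution computed directly in the spatial domain.
direct = np.zeros_like(x)
for i in range(8):
    for j in range(8):
        for p in range(8):
            for q in range(8):
                direct[i, j] += x[(i - p) % 8, (j - q) % 8] * h[p, q]

# The same result via one element-wise product in the spatial frequency domain.
via_fft = np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(h)).real

assert np.allclose(direct, via_fft)
```

For an N × N image the direct loop costs O(N⁴) multiplications per full-size kernel, while the FFT route costs O(N² log N), which is the source of the speed-up claimed above.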
- FIG. 3 is a diagram showing a configuration example of the simplest NN according to this embodiment.
- The NN is composed of, in order from the input layer 11: a Fourier transform layer 12; a coupling layer 13 that performs the complex convolution operation, consisting of an amplitude coupling layer 13A and a phase coupling layer 13B; a complex activation layer 14 that performs operations using the complex activation function f; an inverse Fourier transform layer 15; an output layer 16; and an output 17.
- FIG. 4 is a partial detailed view of FIG. 3, and also shows signals generated in each layer.
- This NN uses a two-dimensional image as an input signal 20, and the input layer 11 inputs the input signal 20 to the NN.
- the Fourier transform layer 12 Fourier transforms the spatial-domain input signal 20 and outputs the first amplitude signal 21r and the first phase signal 21θ, which are signals in the spatial frequency domain. The fast Fourier transform is used here.
- the amplitude coupling layer 13A has a first weight matrix W1 whose values are updated by training, and outputs the second amplitude signal 22r by applying the first weight matrix W1 to the first amplitude signal 21r.
- the phase coupling layer 13B has a second weight matrix W2 whose values are updated by training, and outputs the second phase signal 22θ by applying the second weight matrix W2 to the first phase signal 21θ.
- the complex activation layer 14 uses the complex activation function f in the spatial frequency domain to update at least the second amplitude signal 22r of the second amplitude signal 22r and the second phase signal 22θ. In this embodiment, only the second amplitude signal 22r is updated, using the second phase signal 22θ, and is output as the third amplitude signal 23r; the second phase signal 22θ is output without being updated.
- the inverse Fourier transform layer 15 combines the third amplitude signal 23r and the second phase signal 22θ and performs an inverse Fourier transform to generate the signal 25 in the spatial domain.
- the fast inverse Fourier transform is used for this.
- the output layer 16 converts the signal 25 from the inverse Fourier transform layer 15 into the desired shape to obtain the output 17, which is output from the NN.
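As one concrete reading of FIGS. 3 and 4, the forward pass can be sketched as below. The use of plain matrix products for the coupling layers and the equation (5)-style amplitude update are assumptions for illustration, not the patent's verbatim implementation:

```python
import numpy as np

def complex_relu(r, theta):
    # Complex activation layer 14: where the real-axis component
    # (proportional to cos(theta)) is negative, replace the amplitude
    # with |r*sin(theta)|; otherwise keep it (equation (5)-style).
    return np.where(np.cos(theta) >= 0, r, np.abs(r * np.sin(theta)))

def forward(x, W1, W2):
    # Fourier transform layer 12: spatial signal -> amplitude and phase.
    F = np.fft.fft2(x)
    r1, th1 = np.abs(F), np.angle(F)      # first amplitude/phase signals
    # Coupling layers 13A/13B: apply the trainable weight matrices
    # (interpreted here as matrix products; W1 kept positive, as the
    # text requires for the amplitude path).
    r2 = W1 @ r1
    th2 = np.mod(W2 @ th1 + np.pi, 2 * np.pi) - np.pi  # modulo into [-pi, pi)
    # Complex activation layer 14: only the amplitude is updated.
    r3 = complex_relu(r2, th2)
    # Inverse Fourier transform layer 15: recombine and return to space.
    return np.real(np.fft.ifft2(r3 * np.exp(1j * th2)))

x = np.random.default_rng(1).standard_normal((32, 32))
y = forward(x, np.eye(32), np.eye(32))    # identity weights as a smoke test
```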
- a coupling layer 13 that performs a complex convolution operation and a complex activation layer 14 that performs an operation using the complex activation function f are provided.
- the number of rows of the first weight matrix W1 is the same as the number of rows of the amplitude matrix that constitutes the first amplitude signal 21r,
- and the number of rows of the second weight matrix W2 is the same as the number of columns of the phase matrix that constitutes the first phase signal 21θ. There is no limitation on the number of columns of the first weight matrix W1 or of the second weight matrix W2.
- for the fast Fourier transform, the amplitude matrix and the phase matrix need to have power-of-two dimensions. Therefore, in the amplitude coupling layer 13A and the phase coupling layer 13B, a first weight matrix W1 and a second weight matrix W2 are used that output matrices of the same size as the amplitude matrix and the phase matrix, respectively.
- as the input data, a sensor signal that receives sound waves or electromagnetic waves including visible light, a sensor signal that acquires heat or vibration, a signal calculated and output within the hardware 100, or a signal in which such a sensor signal and a calculation result are fused is used.
- for sound waves, the signal received by a microphone or an ultrasonic sensor is used.
- Sensors that collect electromagnetic waves include cameras that collect visible light, cameras that collect infrared or ultraviolet light, light intensity sensors, near-field antennas, far-field antennas, magnetic sensors, electric/magnetic field sensors, current sensors, voltage sensors, and radiation sensors.
- a temperature sensor, a humidity sensor, a gas sensor, a distance sensor, a pressure sensor, an acceleration sensor, or a vibration sensor such as a gyro may also be used.
- the sensor signals do not necessarily have to be acquired all at the same time; data acquired at different times may be processed afterwards and treated as one piece of input data. Further, the sensor may sense by contact or without contact. A signal combining, for example, the signal of a visible-light camera with that of an infrared camera may also be used as input data.
- an active phased array antenna, or a device that measures distant wind conditions using a laser beam, can scan electromagnetic waves or winds in space at high speed to obtain input data as a two-dimensional or higher-dimensional image.
- the input data need not be of a single type; two or more kinds of data may be used in combination. In that case, desired learning can be performed by combining the NN using the complex activation function shown in this embodiment with a conventional perceptron.
- the hardware 100 processes, for example, an input signal relating to an object obtained from a camera that captures images with visible light, and outputs an output signal from the output layer.
- the output signal is, for example, a classification result for the input signal, an estimate obtained by regression analysis, a clustering result, or a multivariate analysis result.
- the type of output signal is determined by the data (teacher data) with the correct label associated with the input signal.
- data in which input images and correct labels are associated one-to-one is learned as teacher data at learning time.
- the parameters are the elements of the weight matrices constituting the layers of the NN, obtained by learning.
- at inference time, the signal captured by the camera is processed with the learned parameters to obtain the output signal for the classification.
- as a regression example, the safety of an image scene is scored as an analog value from 0 to 100 points.
- one real number from 0 to 100 points is given as the correct label for one image.
- a plurality of combinations of images and correct labels is prepared and used as learning data.
- by learning with these, the parameters of the layers in the NN are determined.
- at inference time, a regression value is obtained as the output signal.
- in the above example, a camera shooting in the visible-light frequency band is used as the sensor, but the same processing can be performed with a sensor such as an antenna that receives infrared rays, ultraviolet rays, or electromagnetic waves of lower frequency than infrared, as long as the sensor signal is given a correct label for classification, regression, or the like. Further, the above explanation concerned supervised learning, but the same applies to unsupervised learning without correct labels.
- the autoencoder (self-encoder) is an information processing device configured to perform unsupervised learning with an NN.
- the purpose of the autoencoder is to use the input itself as the data at learning time and to extract features that represent the data, that is, the elements of the matrices in the NN layers. The output layer of the NN therefore aims to output the same signal as the input layer, while the matrices of the hidden layers between the input layer and the output layer are made small. If the input and the output can be made equal even through such small matrices, the input signal can be represented with a smaller file size; this is called dimensional compression. In this autoencoder too, the image captured by the camera is used as the input signal in the same manner as described above. No correct label is needed; learning is performed simply by feeding the camera output to the autoencoder.
- the signal calculated and output in the hardware 100 may come, for example, from a computer-aided design (CAD) tool
- or from a simulator of electromagnetic fields, heat, or stress, or of another environment such as a computer game.
- as an input signal in which the sensor signal and a calculation result are fused, a signal obtained by inputting the sensor signal into the simulator or the like is used.
- a signal obtained by appropriately changing the type or position information of the sensor based on the output of the simulator may be used as an input signal.
- as the correct label, a physical signal obtained by manufacturing, or by simulation, based on the CAD data is used, or an artificially modified correct label is used.
- Physical signals calculated based on CAD data include, for example, electromagnetic field distribution due to electromagnetic waves, current, voltage, heat, and the like.
- for example, the input of an electromagnetic field simulation is CAD data (a 2-D or 3-D image),
- and the correct label is the simulation result: S-parameters (scattering parameters), the amplitude of the electric or magnetic field at a certain position in space, a voltage or current, or a Poynting vector.
- the CAD and simulator input signals mentioned above are input images of one or more dimensions. Even one-dimensional data such as a time series can be treated as two-dimensional data by using a spectrogram, in which the horizontal axis is time, the vertical axis is frequency, and the value at each time/frequency point is the amplitude of the Fourier transform output. The Fourier transform used to create such a spectrogram is a short-time Fourier transform, which applies the Fourier transform over successive short time windows. In this embodiment, two-dimensional data is used for simplicity.
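As a sketch of the spectrogram construction described above (plain NumPy; the window length and hop size are chosen arbitrarily for illustration):

```python
import numpy as np

def spectrogram(signal, frame_len=64, hop=32):
    # Short-time Fourier transform: window successive frames of the
    # signal, Fourier-transform each, and keep the amplitudes.
    # Rows are frequency bins, columns are time frames.
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

t = np.linspace(0, 1, 1024, endpoint=False)     # 1 s sampled at 1024 Hz
spec = spectrogram(np.sin(2 * np.pi * 50 * t))  # 50 Hz tone
# spec is now a 2-D amplitude image (frequency x time) that can be
# fed to the network like any other two-dimensional input.
```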
- a grayscale image is assumed here. In the case of a color image, for RGB the input image is separated into Red, Green, and Blue, making the input one dimension higher than the original image. This dimension is called a channel; RGB gives 3 channels. In the case of CMYK, there are four channels: Cyan, Magenta, Yellow, and Key plate.
- when a plurality of channels is input, the channels are generally converted into one channel by a convolution operation using a kernel.
- possible methods include providing one convolution layer in front of the Fourier transform layer 12, performing a Fourier transform on each channel and converting the results into one channel with a fully connected layer, or simply pre-weighting each channel, thereby converting the input signal 20 fed to the input layer 11 into one channel.
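The simplest of the listed options, pre-weighting each channel into one channel, might look as follows (the luma-style weights are one common choice, not prescribed by the text):

```python
import numpy as np

def channels_to_one(img, weights=(0.299, 0.587, 0.114)):
    # Collapse an (H, W, C) multi-channel image into a single channel
    # by a fixed per-channel weighting before the input layer.
    return np.tensordot(img, np.asarray(weights), axes=([-1], [0]))

rgb = np.random.default_rng(2).random((32, 32, 3))   # toy 3-channel image
mono = channels_to_one(rgb)                          # shape (32, 32)
```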
- the input layer 11 stores the input data, which is the input signal 20 to the NN, and passes it to the subsequent layer of the NN.
- the learning result using MNIST is shown.
- the MNIST data are grayscale images of 32 × 32 pixels (height × width), with 60,000 training samples and 10,000 test samples not used for training.
- <Fourier transform layer, inverse Fourier transform layer> The Fourier transform layer 12, which performs the Fourier transform, is described below. Since the inverse Fourier transform is the inverse of the Fourier transform, details of the inverse Fourier transform and the inverse Fourier transform layer 15 are omitted. By its nature, the Fourier transform treats the two-dimensional input image as a two-dimensional plane tiled infinitely in the vertical and horizontal directions. If the input images are tiled as they are, the image content at the edges becomes discontinuous along the seams, and frequency components that the original input image does not have can arise. Therefore, in an ordinary Fourier transform, a window function is applied in each of the vertical and horizontal directions of the image, and the signal, brought close to 0 at its edges, is then Fourier transformed.
- in this embodiment, instead, images mirrored in the vertical, horizontal, and diagonal directions are arranged around the input image before the Fourier transform. Taking the horizontal axis of the input image as the x-axis and the vertical axis as the y-axis, an image that is line-symmetric about a boundary line ly parallel to the y-axis is placed at the end of the input image in the x-axis direction,
- and an image that is line-symmetric about a boundary line lp parallel to the x-axis is placed at the end in the y-axis direction. Further, at the position diagonal to the input image, an image that is point-symmetric about the intersection of the two boundary lines lp and ly, that is, a rotationally symmetric image rotated by 180 degrees, is arranged.
- this makes it possible to suppress the generation of frequency components that the original input image does not have, and to perform the calculation, without using a window function.
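The symmetric arrangement can be sketched as follows: the mirrored tiles make the periodic extension assumed by the Fourier transform continuous across every seam (a minimal illustration, not the patent's code):

```python
import numpy as np

def mirror_extend(img):
    # Place the left-right mirror beside the image, the top-bottom
    # mirror below it, and the 180-degree rotation diagonally, so the
    # 2m x 2n result is continuous when tiled periodically.
    top = np.concatenate([img, np.fliplr(img)], axis=1)
    bottom = np.concatenate([np.flipud(img), np.flipud(np.fliplr(img))], axis=1)
    return np.concatenate([top, bottom], axis=0)

img = np.arange(12.0).reshape(3, 4)
big = mirror_extend(img)   # shape (6, 8); rows/columns match at every seam
```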
- when the fast Fourier transform is used for the Fourier transform,
- the vertical and horizontal dimensions of the input image must be powers of 2, and hence even numbers.
- the composite image made of the four tiles has even vertical and horizontal dimensions of power-of-two size, so the fast Fourier transform can be used.
- on the other hand, the mirroring enlarges the input image, so the processing of the fast Fourier transform becomes more expensive. It is therefore not applied when, as with MNIST images, there is no information at the image edges and no discontinuity occurs even if the images are tiled vertically and horizontally as they are. Further, the same processing may be performed before the inverse Fourier transform.
- the Fourier transform is a transformation from a signal in the spatial domain to a signal in the spatial frequency domain.
- the signal in the spatial domain is called a spatial signal
- the signal in the spatial frequency domain after Fourier transform is called a spatial frequency signal.
- the Fourier transform of a signal a(s, t) in the spatial domain of size m × n is expressed by the following equation (4). This equation is called the discrete Fourier transform.
- here, j is the imaginary unit, e is Napier's number (the base of the natural logarithm), and π is the circle ratio.
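Equation (4) itself appears only in the patent drawings; the two-dimensional discrete Fourier transform it presumably denotes, written in the notation of the text (signal a(s, t) of size m × n), is the standard form:

```latex
A(u, v) \;=\; \sum_{s=0}^{m-1} \sum_{t=0}^{n-1}
  a(s, t)\, e^{-j 2\pi \left( \frac{u s}{m} + \frac{v t}{n} \right)},
\qquad u = 0, \dots, m-1, \quad v = 0, \dots, n-1
```

The amplitude |A(u, v)| and the phase arg A(u, v) of this complex output are what the text separates into the first amplitude signal and the first phase signal.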
- a fast Fourier transform is used for the Fourier transform. The radix-2 FFT repeatedly splits the N-point signal in half, and can be regarded as a radix-2 decimation-in-time Fourier transform. Taking n as the side length of the two-dimensional image, the calculation order of the ordinary Fourier transform is O(n³), and the calculation order of the convolution operation is also O(n³). Note that O(·) indicates the approximate number of calculations. The calculation order of the fast Fourier transform is O(n² log₂ n), and the combined order of the fast Fourier transform and the fast inverse Fourier transform (IFFT) is O(2n² log₂ n).
- IFFT denotes the fast inverse Fourier transform.
- the calculation order of the complex convolution operation performed in the coupling layer 13, described above, is O(n²), which is small enough compared with the other orders O(n³) and O(n² log₂ n) that it can be ignored.
- the calculation order of the NN according to this embodiment is therefore O(2n² log₂ n), so the amount of calculation can be reduced and the speed increased.
- in contrast, the calculation order of the CNN is O(m·n³), where m is the number of convolution operations.
- in the second comparative example, the complex convolution operation is performed instead of the convolution operation, but the Fourier transform and the inverse Fourier transform are repeated before and after every complex convolution operation, so even with the fast Fourier transform and the fast inverse Fourier transform
- the calculation order is O(2m·n² log₂ n).
- here, the number of complex convolution operations in the second comparative example is taken to be equal to the number m of convolution layers 2 in the CNN.
- that is, the calculation order of the NN according to this embodiment is O(2n² log₂ n) regardless of the number of complex convolution operations (the number of coupling layers 13), whereas in the CNN of the first comparative example and in the second comparative example
- the amount of calculation increases sharply as the number m of convolution operations or complex convolution operations increases.
- in this example, the amount of calculation is reduced by a factor of 10 compared with the second comparative example, and by about three orders of magnitude compared with the CNN of the first comparative example, which uses the convolution operation.
- since the amount of calculation translates directly into calculation time, this means that, for example, a computation that takes one month on the CNN can be completed in about 90 minutes on the NN according to this embodiment.
- the larger the number of complex convolution operations corresponding to convolution operations, and the larger the image, the greater the effect of reducing the amount of calculation.
- in a CNN, the convolution layer 2 is often stacked several to several tens of times, so using the NN of this embodiment significantly reduces the amount of calculation and increases the speed.
- some recent smartphones have 50 million pixels or more (that is, they output a matrix of roughly 7,000 × 7,000 or larger), and some cameras have 100 million pixels or more (a matrix of roughly 10,000 × 10,000 or larger). Considering the amount of calculation, it was difficult to apply the CNN convolution operation to such large images without reducing the number of pixels, that is, without degrading the information contained in the image; this becomes possible by using the NN according to this embodiment.
- in addition to reducing the amount of calculation and computing at high speed, the reliability of the calculation can be improved.
- the fast Fourier transform handles only signals whose length is a power of 2; for a two-dimensional image, only images whose two side lengths are powers of 2 are computed at high speed, for example by the butterfly operation. Since a general signal is not a power-of-2 matrix, the input signal is padded with zeros up to the next power of 2. This allows the fast Fourier transform to be used for all signals. The same applies to the fast inverse Fourier transform.
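A minimal sketch of the zero padding, assuming the next power of two is taken independently for each axis:

```python
import numpy as np

def pad_to_pow2(x):
    # Zero-pad each axis of a 2-D signal up to the next power of two
    # so that the radix-2 fast Fourier transform can be applied.
    shape = [1 << int(np.ceil(np.log2(s))) for s in x.shape]
    out = np.zeros(shape, dtype=x.dtype)
    out[:x.shape[0], :x.shape[1]] = x
    return out

padded = pad_to_pow2(np.ones((28, 28)))   # 28 x 28 -> 32 x 32
```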
- after the Fourier transform, a frequency shift operation is performed that reorders the output so that the low-frequency components are concentrated in the center of the data.
- as a result, the low-frequency components gather in the central part of the matrix and the high-frequency components in the peripheral part.
- such a frequency shift operation is often used because features are frequently contained in the low-frequency signals.
- in particular, processing at the edge portions of the image causes discontinuities that easily become a source of error, so the frequency shift operation is all the more effective.
- note that the frequency shift operation has no effect on the final result.
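With NumPy, the frequency shift and its exact inverse are the standard `fftshift`/`ifftshift` pair, which is why the operation cannot change the final result:

```python
import numpy as np

# After a 2-D FFT the zero-frequency (DC) component sits at index (0, 0).
spectrum = np.fft.fft2(np.ones((8, 8)))     # all energy in the DC term
# The frequency shift reorders the matrix so low frequencies move to
# the centre and high frequencies to the periphery.
shifted = np.fft.fftshift(spectrum)
# ifftshift undoes the reordering exactly, so applying the shift and
# its inverse around any pointwise processing leaves the result intact.
restored = np.fft.ifftshift(shifted)
```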
- the output of the Fourier transform in the Fourier transform layer 12 can be separated into an amplitude and a phase, or into a combination of a real part and an imaginary part.
- the first amplitude signal 21r and the first phase signal 21θ are thereby generated.
- the amplitude is a real number of 0 or more,
- and the phase is a real number greater than or equal to −π and less than π.
- the padding described above is performed so that the input signal and the output signal of the Fourier transform layer 12 are matrices of the same size; usually zero padding is used.
- the coupling layer 13 is used in the spatial frequency domain in this embodiment.
- as the coupling layer 13, a fully connected layer or a loosely connected layer is used.
- overfitting can be prevented by using a loosely connected layer, rather than a fully connected layer, in the upper layers close to the output layer 16.
- in the matrix constituting a fully connected layer, all elements of the weight matrix are updated, whereas in the matrix constituting a loosely connected layer, some elements are stochastically excluded from updating.
- the bias vector b may be a zero vector at locations close to the output layer 16. Pseudo-random values are usually used as the initial values of W and b. It is also known that using the so-called Xavier initial values or He initial values makes learning proceed quickly. This is the same as in the spatial-signal case, so the description is omitted.
- the coupling layer 13 is composed of an amplitude coupling layer 13A and a phase coupling layer 13B.
- the spatial signal input to the Fourier transform layer 12 is separated by the Fourier transform into the first amplitude signal 21r and the first phase signal 21θ, which are spatial frequency signals.
- the first amplitude signal 21r is input to the amplitude coupling layer 13A,
- and the first phase signal 21θ is input to the phase coupling layer 13B.
- the first weight matrix W1 and the second weight matrix W2 are applied in the amplitude coupling layer 13A and the phase coupling layer 13B, respectively,
- and the second amplitude signal 22r and the second phase signal 22θ are output.
- the amplitude coupling layer 13A outputs the second amplitude signal 22r by multiplying the first amplitude signal 21r by the first weight matrix W1. Similarly, the phase coupling layer 13B outputs the second phase signal 22θ by multiplying the first phase signal 21θ by the second weight matrix W2.
- the initial values of the first weight matrix W1 and the second weight matrix W2 are pseudo-random values or the other initial values described above. In the amplitude coupling layer 13A and the phase coupling layer 13B, the values in the first weight matrix W1 and the second weight matrix W2 are then updated by the error backpropagation method so that the relationship between input and output becomes close. That is, the values in the first weight matrix W1 and the second weight matrix W2 are updated by training. The first weight matrix W1 is assumed to contain only positive real numbers with respect to the amplitude matrix (input matrix) x that constitutes the first amplitude signal 21r.
- since the amplitude matrix x consists of positive real numbers, the elements of the matrix (W1)x could instead be converted to their absolute values;
- in this case, however, learning is performed under the constraint that the first weight matrix W1 contains
- only positive real numbers. Imposing this constraint narrows the search range during learning and reduces the number of operations. Learning is also sped up because the per-element absolute-value conversion becomes unnecessary.
- the second weight matrix W2 is not constrained with respect to the phase matrix (input matrix) x that constitutes the first phase signal 21θ.
- instead, a modulo operation of 2π is applied to the matrix (W2)x so that its values lie in the range from 0 (inclusive) to 2π (exclusive), or from −π (inclusive) to π (exclusive).
- when the complex activation function f applied in the subsequent complex activation layer 14 uses a trigonometric function of the phase matrix, the modulo operation is unnecessary.
- degrees, obtained by multiplying radians by 180/π, may also be used as the unit of phase; in that case a modulo operation of 360° is performed.
- <Complex activation layer> The second amplitude signal 22r and the second phase signal 22θ output from the amplitude coupling layer 13A and the phase coupling layer 13B are input to the complex activation layer 14, where the complex activation function f, the activation function in the spatial frequency domain, is applied to these signals. In this embodiment, the second amplitude signal 22r is updated by the calculation using the complex activation function f and output as the third amplitude signal 23r, while the second phase signal 22θ is output as it is.
- that is, according to the response of the complex activation function f to the phase θ(i) at each point i in the phase matrix constituting the second phase signal 22θ,
- the value of the amplitude r(i) at the point at the same position as point i in the amplitude matrix constituting the second amplitude signal 22r is updated.
- FIG. 6 is a diagram illustrating an operation of updating the value of the amplitude r (i) using the complex activation function f.
- let (xi, yi) be the position of point i in the phase matrix (the second phase signal 22θ); the value of the complex activation function f is calculated for the phase θ(xi, yi), the element at point i.
- the amplitude r(xi, yi), the element at the same position (xi, yi) in the amplitude matrix, is then rewritten accordingly.
- the rewritten amplitude matrix is the third amplitude signal 23r.
- the complex activation function f, which is the activation function in the spatial frequency domain, is described below.
- as with the activation function in the spatial domain, a non-linear function is used.
- a non-linear function g can be defined as a function that fails to satisfy one or both of g(x + y) = g(x) + g(y) and g(k·x) = k·g(x).
- Logistic functions and hyperbolic tangent functions are examples of activation functions in the spatial domain. Inserting such a non-linear function creates a difference between forward propagation and back propagation, producing mappings that cannot be expressed by linear functions, that is, by weighting alone.
- the complex activation function f in this embodiment is a function for applying this kind of non-linearity to the spatial frequency signal after the Fourier transform.
- the complex activation function f therefore differs from conventional activation functions for spatial signals. Moreover, the complex activation function f cannot be created by Fourier transforming an activation function. This is clear from the following relation.
- let g be an activation function in the spatial domain. Then F[g(x)] ≠ F[g]·F[x]: the Fourier transform of the result of applying the activation function g to a spatial value x differs from the product of the Fourier transform of g and the Fourier transform of x. For example, if the ReLU function is Fourier transformed, the transform diverges because the ReLU function increases monotonically for x ≥ 0. The Fourier transform of the ReLU function therefore does not serve as an activation function in the spatial frequency domain.
- in this embodiment, the complex activation function f based on the characteristics of the ReLU function is referred to as the complex ReLU function.
- the ReLU function in the spatial domain outputs the input value itself when the input value is positive or 0, and outputs 0 when the input value is negative.
- the complex ReLU function, by contrast, is not determined by only one of the amplitude r and the phase θ; it updates the amplitude component using a function that applies a trigonometric function to the phase component. In this case, with a function that applies a trigonometric function to the phase component and multiplies it by the amplitude component, the amplitude is, for example, left at the same value when either the real-axis component or the imaginary-axis component is positive or 0, and is updated to the value calculated by the function when it is negative.
- that is, with respect to the phase θ(i) at each point i in the matrix constituting the second phase signal 22θ, the complex activation function f updates the value of the amplitude r(i) at the same position in the matrix constituting the second amplitude signal 22r with a response that differs between the case where the real-axis component or the imaginary-axis component is positive or 0 and the case where it is negative.
- An example of the complex activation function f using the complex ReLU function is shown in the following equation (5).
- FIG. 7 illustrates the above equation (5).
- in FIG. 7, the real axis u and the imaginary axis jv are drawn for a circle of radius r.
- when the component on the real axis u is negative, the amplitude r is converted to the jv component;
- that is, equation (5) is equivalent to replacing the amplitude r with the absolute value of r·sinθ.
- another example of the complex activation function f using the complex ReLU function is shown in the following equation (6).
- FIG. 8 illustrates the above equation (6).
- when the component on the imaginary axis jv is negative, equation (6) is equivalent to replacing the amplitude r with the absolute value of r·cosθ. This gives the same value as substituting (θ + (π/2)) for θ in the above equation (5), but in terms of ease of programming and a smaller number of comparisons, that is, faster calculation, it is superior to the complex ReLU function of equation (5). Moreover, since the number of comparisons of θ is reduced, the amount of calculation is reduced.
- further, the possible values of the amplitude r may be divided into two or more cases by the phase θ, and the condition on θ may be divided into three or more cases.
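Both variants can be written down directly; the following sketch implements the behaviour the text ascribes to equations (5) and (6), element-wise over amplitude/phase matrices (an illustration, not the patent's code):

```python
import numpy as np

def complex_relu_eq5(r, theta):
    # Equation (5): where the real-axis component r*cos(theta) is
    # negative, replace the amplitude with |r*sin(theta)|; otherwise
    # keep the amplitude unchanged.
    return np.where(np.cos(theta) >= 0, r, np.abs(r * np.sin(theta)))

def complex_relu_eq6(r, theta):
    # Equation (6): where the imaginary-axis component r*sin(theta)
    # is negative, replace the amplitude with |r*cos(theta)|.
    return np.where(np.sin(theta) >= 0, r, np.abs(r * np.cos(theta)))

r = np.ones(4)
theta = np.array([0.0, np.pi / 2, np.pi, -np.pi / 2])
out5 = complex_relu_eq5(r, theta)   # updated only where cos(theta) < 0
out6 = complex_relu_eq6(r, theta)   # updated only where sin(theta) < 0
```

Only one comparison per element is needed, and the phase matrix passes through unchanged, matching the description of the complex activation layer.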
- a discontinuous function can also be used.
- in the spatial domain as well, a step function, which is a discontinuous function, may be used as an activation function.
- continuity is therefore not an indispensable condition for the complex activation function f; it suffices that it is a non-linear function with a structure in which the amplitude r is rewritten by an output calculated from the value of the phase θ.
- a complex activation function f using a discontinuous complex ReLU function is expressed, for example, by the following equation (10).
- each weight matrix W is updated so as to minimize the loss L, which is the difference between the output of the NN and the teacher data.
- the error backpropagation method is a means of finding the optimum values, and is based on the gradient descent method.
- in the gradient descent method, the weight matrix W is updated based on the following equation (12), using the weight matrix W, the learning coefficient η, and the loss L, which is a component of the difference between the inference result and the correct label.
- the quantity ∂L/∂W (the partial derivative of the loss L with respect to the weight matrix W) required in the gradient descent method is calculated by the error backpropagation method, in which forward propagation and back propagation are repeated, and the weight matrix W is updated.
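The update of equation (12), W ← W − η·∂L/∂W, can be illustrated on a tiny least-squares problem whose gradient is known in closed form (in the real network the gradient comes from backpropagation; the loss and sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((2, 2))     # weight matrix to be learned
x = np.array([1.0, 2.0])            # input
y = np.array([3.0, 5.0])            # teacher data (correct label)
eta = 0.05                          # learning coefficient

for _ in range(500):
    err = W @ x - y                 # inference result minus correct label
    grad = 2.0 * np.outer(err, x)   # dL/dW for the loss L = ||W x - y||^2
    W -= eta * grad                 # the update of equation (12)

# After training, the output of the layer matches the teacher data.
```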
- the process from the input layer to the output layer is called forward propagation, and the process from the output layer to the input layer is called back propagation.
- the operation of updating the weight matrix W is called learning, and the process of learning is called training.
- when the training is completed, the learning of the NN is completed.
- the data used for training is called training data; it is used with the same meaning as learning data.
- early termination, which stops learning once a desired performance level set before learning is satisfied, may also be used; it helps prevent overfitting and shortens the learning time. In this regard, there is no difference between this embodiment and the CNN technique performed in the spatial domain.
- the gradient descent method is an algorithm used to search for a solution when minimizing (more generally, optimizing) an objective function.
- here, the stochastic gradient descent method represented by the above equation (12) is used, a method generally employed when the objective function to be minimized is differentiable.
- the learning coefficient η is an important parameter of the gradient descent method, and various methods such as the AdaGrad method, the Adam method, and the momentum method are known. These methods work the same in the spatial frequency domain as in the spatial domain, so detailed descriptions are omitted.
- the Newton method, or the quasi-Newton method derived from it, converges to a solution faster.
- because in gradient descent methods such as the AdaGrad method, the Adam method, and the momentum method described above the convergence speed and the accuracy depend greatly on the value of the learning coefficient η, it can be preferable to use the Newton method or the quasi-Newton method.
- in practice, however, the gradient descent method is used.
- the purpose of learning is not to fit the training data (input data and correct labels) used for learning, but to make correct inferences for unknown samples given after learning.
- the error on the former training data is called the training error.
- the expected value of the error on the latter unknown data is called the generalization error.
- although the purpose of learning is to reduce this generalization error, it cannot be calculated at the time of learning the way the training error can.
- a method is used in which a sample set different from the training data is prepared, and the error calculated on it by the same method as the training error is used as a guideline for the generalization error. For example, when the MNIST data set is used, 70% to 80% of the total data is used as training data, the remaining 20% to 30% is used as test data, and the test error is calculated on the latter. Specifically, the training data consists of 60,000 samples and the test data of 10,000 samples. The change in the test error that accompanies the updates of the weight matrix W is called the learning curve.
- weight decay (weight attenuation) is a method of updating the weights so that the larger a value in the weight matrix is, the more strongly it is pulled toward 0. This alleviates the divergence of the weights.
- various techniques such as regularization methods are known for weight decay, and there are no restrictions on applying them in the spatial frequency domain, just as in the spatial domain.
- overfitting can be prevented by probabilistically setting components of the amplitude matrix to 0 in a dropout layer. Usually, a probability of 20% to 50% is used.
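A minimal sketch of such a dropout operation on an amplitude matrix, assuming an element-wise Bernoulli mask; the matrix size and drop probability are illustrative:

```python
import numpy as np

def amplitude_dropout(r, p=0.3, rng=None):
    """Set each component of the amplitude matrix r to 0 with probability p
    (the 20%-50% range mentioned above); a sketch, not the patented layer."""
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(r.shape) >= p  # True = keep the component
    return r * mask

r = np.abs(np.random.default_rng(1).normal(size=(8, 8)))
r_dropped = amplitude_dropout(r, p=0.3)
```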
- overfitting is also likely to occur when the amount of learning data is small.
- in that case, the data is augmented. Specifically, translation of an image, left-right mirror inversion, geometric deformation such as rotation, shading, color variation, and random noise are added uniformly. By increasing the data in this way, it becomes easier to prevent overfitting, and the same applies to this embodiment.
- overfitting may also be prevented by using a loosely coupled layer instead of the fully coupled layer, or by techniques such as early stopping or reducing the number of significant figures of the hidden-layer calculation results.
- Forward propagation is used in inference to estimate the result using the trained weight matrix.
- during training, forward propagation and back propagation are performed a plurality of times. Forward propagation applies the hidden-layer matrices or functions to the input data each time.
- in back propagation, the error information, that is, the difference between the inferred value obtained by forward propagation and the correct label, is propagated backward from the immediately following upper layer to the immediately preceding lower layer.
- Δr is the update formula for the amplitude r propagated from the upper layer to the lower layer.
- Δθ is the update formula for the phase θ propagated from the upper layer to the lower layer.
- as the complex activation function f becomes more complicated, it becomes more difficult to express its derivative as a mathematical formula.
- in that case, numerical differentiation is used, in which the amount of change Δy when the input is moved by a minute amount Δx, that is, (Δy / Δx), is taken as the derivative. If the derivative is formulated, the calculation is completed by substitution, whereas numerical differentiation requires additional subtractions and divisions, so the amount of calculation is larger; however, the derivative can be obtained for any complex activation function f.
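The numerical differentiation described above can be sketched with a central difference; the test function and step size are illustrative:

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation (dy/dx ~ delta_y / delta_x) usable for
    any complex activation function f whose formula is hard to differentiate."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: the derivative of sin at 0 is cos(0) = 1.
d = numerical_derivative(math.sin, 0.0)
```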
- the gradient ∂L/∂W for back-propagating the loss L is calculated.
- consider two functions that each convert a matrix into a matrix, and let their outputs be the intermediate values Y and Z.
- the sizes of the input matrix and the output matrix of each function shall be equal.
- the input X, the weight matrix W, and the intermediate values Y and Z are treated as matrices.
- the loss L is treated as a scalar.
- the gradients ∂L/∂Z, ∂L/∂Y, ∂L/∂X, and ∂L/∂W are calculated.
- the size of each of these gradient matrices is equal to that of Z, Y, X, and W, respectively.
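The shape bookkeeping described above can be checked with a small sketch; the concrete choice of the two matrix-to-matrix functions (a matrix product followed by a ReLU-like map, with a mean loss) is an illustrative assumption, not the patented layers:

```python
import numpy as np

# Forward: Y = X @ W, Z = max(Y, 0), L = mean(Z).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 5))

Y = X @ W
Z = np.maximum(Y, 0.0)
L = Z.mean()

# Backward: each gradient has the same shape as the tensor it is taken
# with respect to, as stated in the text above.
dL_dZ = np.full_like(Z, 1.0 / Z.size)
dL_dY = dL_dZ * (Y > 0)
dL_dX = dL_dY @ W.T
dL_dW = X.T @ dL_dY
```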
- the output layer 16 will be described below.
- an activation function, which is a function that transforms a signal in order to obtain the desired output 7, is used.
- the activation function used in the output layer 16 is called an output activation function.
- this measure is called an error function in this embodiment.
- regression analysis is a method of defining a function that reproduces the training data when the output takes continuous values.
- an NN output activation function whose range matches the range of the target function is selected.
- the squared error of the difference between the output result of the output activation function and the correct label is used. Considering the derivative in backpropagation, the squared error multiplied by 1/2 is generally used as the error function.
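A sketch of the error function with the 1/2 factor, whose derivative with respect to the output is simply the difference from the label; the vectors are illustrative:

```python
import numpy as np

def half_squared_error(y, t):
    """Error function E = 0.5 * sum((y - t)^2); the 1/2 factor makes the
    backpropagated derivative simply (y - t)."""
    return 0.5 * np.sum((y - t) ** 2)

y = np.array([0.2, 0.8])   # illustrative output of the output activation
t = np.array([0.0, 1.0])   # illustrative correct label
E = half_squared_error(y, t)
grad = y - t               # derivative of E with respect to y
```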
- in classification, the input data is classified into a finite number of classes.
- MNIST is a multi-class classification problem in which input handwritten digits from 0 to 9 are classified into 10 classes.
- the softmax function is used as the output activation function.
- the cross entropy is used for the error function.
- the softmax function and the cross entropy are used in the same manner as in the spatial-domain method.
- in binary classification, the input data is classified into two classes.
- the logistic function is used for the output activation function
- the same method as the maximum likelihood estimation is used for the error function.
- binary classification can also be regarded as a kind of multi-class classification; as in the multi-class case, a softmax function may be used as the output activation function and cross entropy as the error function.
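The standard softmax output activation and cross-entropy error for a 10-class problem such as MNIST can be sketched as follows; the logit values are illustrative:

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax used as the output activation function."""
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, t_onehot, eps=1e-12):
    """Cross-entropy error against a one-hot correct label."""
    return -np.sum(t_onehot * np.log(p + eps))

logits = np.array([2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
p = softmax(logits)        # probabilities over the 10 classes
t = np.zeros(10)
t[0] = 1.0                 # correct label: class 0
loss = cross_entropy(p, t)
```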
- FIG. 11 is a diagram showing a configuration example of an NN in which two layers each of the coupling layer 13 and the complex activation layer 14 are used. The accuracy on the test data when such an NN is used is shown in FIG. 12 together with that of the first comparative example, that is, a CNN having two convolution layers 2.
- the result of supervised learning, in which handwritten characters are classified into 10 classes using MNIST as input data, is shown as the transition of the calculation accuracy with respect to the number of training iterations (number of calculations).
- the solid line in FIG. 12 shows the case where the NN according to this embodiment is used, and the dotted line shows the case where the CNN of the first comparative example is used.
- the complex activation function f represented by the above formula (6) was used.
- as shown in FIG. 12, with the NN according to this embodiment, when the number of calculations exceeds 1500, inference can be performed with an accuracy of about 95% on the test data.
- the CNN of the first comparative example has an accuracy of about 97%, so it can be said that the performance is almost the same.
- FIG. 13 is a diagram for explaining the speed-up effect by NN according to this embodiment.
- a first comparative example using a CNN having two convolution layers 2, and a second comparative example in which a complex convolution operation is performed twice with a Fourier transform and an inverse Fourier transform performed before and after each complex convolution operation, are used.
- the difference in calculation speed relative to the NN according to this embodiment is shown.
- the time required for the actual calculation using MNIST, performed on a CPU (in this case, the 2000 calculations shown in FIG. 12), is shown.
- the training data is 60,000 and the test data at the time of inference is 10,000.
- although MNIST is a relatively small image of 32 × 32 pixels, the calculation time of the first comparative example is 260 seconds, that of the second comparative example is 200 seconds, and that of the NN according to this embodiment is 75 seconds. The NN according to this embodiment takes only about 30% of the time of the first comparative example, showing a significant increase in speed.
- by using a dedicated IC for the Fourier transform and the inverse Fourier transform, which require calculation time, it becomes possible to calculate an NN with a large number of layers even with a small processing device such as a microcomputer.
- the NN according to this embodiment can perform calculations with almost the same accuracy as a conventional CNN. Further, since the convolution operation that requires a large amount of calculation is replaced with a complex convolution operation consisting of a single matrix operation, the amount of calculation can be greatly reduced. Furthermore, by using the complex activation function f, if the Fourier transform and the inverse Fourier transform in the hidden layers are each performed once, the complex convolution operation and the complex activation function f can be used many times in the continuous spatial-frequency-domain processing between them, and the amount of calculation required for the Fourier transform and the inverse Fourier transform can be reduced. Therefore, high-speed processing is possible even for a large image, and high recognition accuracy by the NN can be obtained with a low-cost and simple hardware 100 configuration.
- all the operations of each layer between the Fourier transform and the inverse Fourier transform handle spatial frequency signals within a continuous spatial frequency domain, and the physical function of each operation can be analyzed from its input signal and output signal. As a result, detailed analysis of the NN becomes possible, the configuration of the NN can be determined from the analysis results of each layer, and a high-performance NN can be constructed.
- a two-dimensional image can be processed at a higher speed than with the CNN of the first comparative example, and this is particularly effective in an NN having a plurality of coupling layers 13 corresponding to the convolution layers 2.
- for image recognition, in addition to data acquired by a CMOS (Complementary Metal-Oxide-Semiconductor) sensor or the like, an image obtained by visualizing electromagnetic waves with an infrared camera, an ultraviolet camera, a phased array antenna, or the like can be used as input data.
- an analysis model designed by GUI using two-dimensional or higher-dimensional CAD may be used as input data.
- learning may be performed using such data, with the analysis result from a simulator as the correct label.
- one-dimensional data can be treated as two-dimensional data by converting it into a spectrogram, so that the method according to this embodiment can be used.
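A minimal sketch of converting one-dimensional data into a two-dimensional spectrogram via a short-time Fourier transform; the frame length, hop, and test tone are illustrative assumptions:

```python
import numpy as np

def spectrogram(x, frame=64, hop=32):
    """Turn a 1-D signal into a 2-D time-frequency image (spectrogram) so
    that the 2-D method of this embodiment can be applied; a minimal STFT."""
    window = np.hanning(frame)
    frames = [x[i:i + frame] * window
              for i in range(0, len(x) - frame + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=-1)) ** 2  # (time, frequency)

t = np.arange(1024) / 1024.0
signal = np.sin(2 * np.pi * 50 * t)  # 50-cycle test tone
S = spectrogram(signal)
```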
- the NN according to this embodiment can also be applied to information processing devices that process input data having a time change, such as a moving image, by using a convolution operation, for example the generally known convolutional long short-term memory (Convolutional LSTM). Further, by converting to a spectrogram, the method according to this embodiment can be applied to any RNN (Recurrent Neural Network) in which a convolution operation is required.
- this embodiment can also be applied to methods in which an image not included in the training data or test data is generated using an NN called a GAN (Generative Adversarial Network), which simultaneously trains two NNs: a generation network (generator) and an identification network (discriminator).
- a GAN usually constructs the generation network and the identification network by stacking a large number of convolution layers, and since two NNs are trained at the same time, the number of convolution layers increases and the amount of calculation becomes enormous. By using the method of this embodiment, the calculation cost can be significantly reduced.
- conventionally, a large-scale computer has been indispensable for a calculation such as a GAN, which is usually trained using several tens to tens of thousands of GPU boards (printed circuit boards on which GPU arithmetic elements are mounted). By applying this embodiment, a speed-up of about 3 to 8 orders of magnitude can be expected, depending on the size of the generated image. Depending on the conditions, learning is possible even with hardware that does not have a parallel processing mechanism such as a GPU.
- since the input data may be a two-dimensional or higher-dimensional image, it is also possible, for example, to input simulation data and perform a desired design.
- the user can set conditions and limit the output by using the weight functions that have been learned.
- a GAN thus enables a desired design, and by applying this embodiment, the time required for learning the design and the time required for inference can be significantly reduced.
- in reinforcement learning, a method of advancing learning using a two-dimensional image as input data is often used, and most such methods have a plurality of convolution layers.
- reinforcement learning does not give a correct label; instead, an NN called an agent learns by repeating trial and error, so the amount of calculation becomes enormous.
- by applying this embodiment, such an NN can be learned efficiently.
- an autoencoder (self-encoder) is an NN trained, for example, to output the same image when a two-dimensional image is input to the input layer; various arithmetic processes, including convolution layers, exist between the input layer and the output layer, and the arithmetic processing is performed so that necessary information is not lost. Since the convolution operation is an indispensable technique even in such unsupervised learning, the calculation cost can be significantly reduced by applying this embodiment.
- the input data is not limited to two-dimensional data; three-dimensional data such as a bird's-eye view, a cross-sectional view, CAD data, or data from a 3D camera (stereo camera) combined with a plan view may be used as input data as it is.
- Embodiment 2. In the second embodiment, a complex activation function f different from the complex Relu function used in the first embodiment is used.
- the Fourier transform layer 12 decomposes the signal into an amplitude component, which is a real number of 0 or more, and a phase component, which is a real number of −π or more and less than π.
- Other configurations are the same as those in the first embodiment.
- the complex activation layer 14 receives as input the second amplitude signal 22r and the second phase signal 22θ output from the amplitude coupling layer 13A and the phase coupling layer 13B, performs a calculation on these signals using the complex activation function f, updates the second amplitude signal 22r and outputs it as the third amplitude signal 23r, and outputs the second phase signal 22θ as it is.
- in accordance with the response of the complex activation function f to the phase θ(i) at each point i in the phase matrix constituting the second phase signal 22θ, the value of the amplitude r(i) at the point at the same position as the point i in the amplitude matrix constituting the second amplitude signal 22r is updated.
- a complex activation function f based on the characteristics of the logistic function in the spatial domain is referred to as a complex logistic function.
- the complex activation function f used in the first embodiment updates the value of the amplitude r(i) with different responses depending on the magnitude of the phase θ(i).
- in contrast, the complex activation function f used in the second embodiment updates the value of the amplitude r(i) with a constant response given by the same arithmetic expression, regardless of the magnitude of the phase θ(i).
- An example of the complex activation function f using the complex logistic function is shown in the following equation (21).
- k is a real number larger than 1.
- the output of the complex activation function f moves between 0 and 1. This output replaces the amplitude r.
- the complex activation function f shown in the following equation (22) may be used.
- the maximum value of the output of the complex activation function f is (2 / (k² − 1)).
- the minimum value of the complex activation function f is a real number of 0 or more, so the above condition that the amplitude component is a real number of 0 or more is satisfied.
- This complex activation function f is a modification of the Gaussian error function.
- the same effect as that of the first embodiment can be obtained. That is, it is possible to perform calculations with almost the same accuracy as a conventional CNN. Further, since the convolution operation that requires a large amount of calculation is replaced with a complex convolution operation consisting of a single matrix operation, the amount of calculation can be greatly reduced. Furthermore, by using the complex activation function f, if the Fourier transform and the inverse Fourier transform in the hidden layers are each performed once, the complex convolution operation and the complex activation function f can be used many times in the continuous spatial-frequency-domain processing between them, and the amount of calculation required for the Fourier transform and the inverse Fourier transform can be reduced. Therefore, high-speed processing is possible even for a large image, and high recognition accuracy by the NN can be obtained with a low-cost and simple hardware 100 configuration.
- all the operations of each layer between the Fourier transform and the inverse Fourier transform handle spatial frequency signals within a continuous spatial frequency domain, and the physical function of each operation can be analyzed from its input signal and output signal. As a result, detailed analysis of the NN becomes possible, the configuration of the NN can be determined from the analysis results of each layer, and a high-performance NN can be constructed.
- it is desirable to use a complex activation function f whose derivative can be calculated theoretically and whose amount of calculation is small, like the complex Relu function shown in the first embodiment.
- Embodiment 3. In the third embodiment, a complex activation function f different from those of the first and second embodiments is used. Other configurations are the same as those in the first embodiment. Also in the third embodiment, as in the first embodiment, the complex activation layer 14 receives as input the second amplitude signal 22r and the second phase signal 22θ output from the amplitude coupling layer 13A and the phase coupling layer 13B, performs calculations on these signals using the complex activation function f, and updates and outputs the second amplitude signal 22r and the second phase signal 22θ. The second amplitude signal 22r and the second phase signal 22θ are updated in the same manner by the same method, but the second phase signal 22θ may be held as it is and only the second amplitude signal 22r may be updated.
- FIG. 14 is a diagram illustrating the operation of updating the second amplitude signal 22r and the second phase signal 22θ using the complex activation function f.
- the second amplitude signal 22r and the second phase signal 22θ generated by the coupling layer 13 are an amplitude matrix and a phase matrix, respectively, each of which is a two-dimensional matrix.
- each axis of the amplitude matrix indicates a frequency axis, and each element indicates an amplitude value.
- with N and M set to N ≥ 2 and M ≥ 1, respectively, a reduced micromatrix Lr is generated in which the frequency-axis components, that is, the frequency components, are reduced to 1/N and the amplitude of each element is reduced to 1/M.
- this minute matrix Lr is added to the original amplitude matrix, and the matrix generated after the addition is used as the output of the complex activation function f (the updated amplitude matrix).
- the phase matrix is also subjected to the operation of generating and adding a minute matrix Lθ, and the matrix generated after the addition is used as the output of the complex activation function f (the updated phase matrix).
- the complex activation function f is a non-linear function in the spatial frequency domain.
- for the input second amplitude signal 22r and second phase signal 22θ, the signal components can thereby be effectively aggregated into the lower frequency components.
- when a pooling layer, which will be described later, is used after this complex activation function f, information can be aggregated in the low frequency components while preventing deterioration of the main information in the pooling layer.
- the pooling layer is provided after the complex activation layer 14 and serves as what is called a low-pass filter in electrical engineering, or, more generally, as a filter.
- the calculation of the complex activation function f according to the third embodiment requires more calculation time than that of the complex activation function f shown in the first or second embodiment, but it enables more accurate calculations without deteriorating the information.
- when the calculation of the complex activation function f according to the third embodiment is combined with the error backpropagation method in the learning of the NN, the calculation is performed for each element of each matrix while reciprocating between the upper layers and the lower layers a plurality of times, so the number of calculations becomes enormous. Therefore, it is effective to set N and M each to a power of 2 and to lighten the calculation by using shift operations for the generation of the minute matrices Lr and Lθ. Since a shift operation requires no decimal arithmetic and is a binary bit operation that computers execute efficiently, its calculation cost on a von Neumann computer is small. For example, in a compiled language such as C, the calculation cost is about 1/10.
- right-shift bit operations may be used for divisions by powers of 2, such as 1/2, 1/4, and 1/8.
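A sketch of replacing division by a power of 2 with a right shift; the operand values are illustrative:

```python
def divide_pow2(value, shift):
    """Divide a non-negative integer by 2**shift with a right-shift bit
    operation, avoiding the cost of ordinary division."""
    return value >> shift

halved = divide_pow2(100, 1)     # 100 / 2 = 50
quartered = divide_pow2(100, 2)  # 100 / 4 = 25
eighth = divide_pow2(100, 3)     # 100 / 8 = 12 (integer truncation)
```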
- each element of the amplitude matrix forming the second amplitude signal 22r and of the phase matrix forming the second phase signal 22θ is thinned out every other element.
- as a result, the sizes of the rows and columns of each matrix are halved, yielding the minute matrices Lr and Lθ.
- for the minute matrices Lr and Lθ, whose size has become smaller, the high frequency components are zero-filled so that the matrices are restored to the same size as the amplitude matrix and the phase matrix before thinning.
- the output signal of the simplest complex activation function f can be generated by adding the processed minute matrices Lr and Lθ to the amplitude matrix and the phase matrix before thinning.
- a plurality of minute matrices Lr and Lθ may each be used, and minute matrices Lr and Lθ reduced not only to 1/2 but also to 1/4, 1/8, and so on may be further added.
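The thinning, zero-filling, and addition steps above can be sketched as follows for N = M = 2; the matrix contents and the exact placement of the zero-filled high-frequency region are illustrative assumptions:

```python
import numpy as np

def minute_matrix_activation(r, n=2, m=2):
    """Sketch of the embodiment-3 style update: build a micromatrix by
    thinning out every other element (frequency axes shrunk to 1/n) with
    amplitudes scaled to 1/m, zero-fill its high-frequency side back to
    the original size, and add it to the input matrix. The patent allows
    general N >= 2 and M >= 1; n = m = 2 here is illustrative."""
    small = r[::n, ::n] / m                  # thinned, scaled micromatrix
    padded = np.zeros_like(r)                # high frequencies zero-filled
    padded[:small.shape[0], :small.shape[1]] = small
    return r + padded                        # updated matrix

r = np.ones((4, 4))
out = minute_matrix_activation(r)
```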
- since the convolution operation requiring a large amount of calculation is replaced with a complex convolution operation consisting of a single matrix operation, the amount of calculation can be greatly reduced.
- by using the complex activation function f, if the Fourier transform and the inverse Fourier transform in the hidden layers are each performed once, the complex convolution operation and the complex activation function f can be used many times in the continuous spatial-frequency-domain processing between them, and the amount of calculation required for the Fourier transform and the inverse Fourier transform can be reduced. Therefore, high-speed processing is possible even for a large image, and high recognition accuracy by the NN can be obtained with a low-cost and simple hardware 100 configuration.
- all the operations of each layer between the Fourier transform and the inverse Fourier transform handle spatial frequency signals within a continuous spatial frequency domain, and the physical function of each operation can be analyzed from its input signal and output signal. As a result, detailed analysis of the NN becomes possible, the configuration of the NN can be determined from the analysis results of each layer, and a high-performance NN can be constructed.
- Embodiment 4. In the fourth embodiment, a complex activation function f different from those of the first to third embodiments is used. Other configurations are the same as those in the first embodiment.
- the complex activation layer 14 receives as input the second amplitude signal 22r and the second phase signal 22θ output from the amplitude coupling layer 13A and the phase coupling layer 13B, and performs calculations on these signals using the complex activation function f, updating and outputting the second amplitude signal 22r and the second phase signal 22θ.
- the second amplitude signal 22r and the second phase signal 22θ are updated in the same manner by the same method, but the second phase signal 22θ may be held as it is and only the second amplitude signal 22r may be updated.
- the complex activation function f performs a convolution operation on the input signal (target signal) using, as the kernel F[h], a function whose absolute value is maximized at the reference origin.
- in this way, a complex activation function f having the same accuracy as an activation function in the spatial domain can be obtained.
- the convolution calculation is performed on the second amplitude signal 22r using this F[h].
- the convolution calculation may be performed on the second phase signal 22θ in the same manner.
- in the amplitude of the result (F[g] * F[h]) after the convolution operation, there are points where the amplitude is 0, and these zero-amplitude points become information in the NN.
- since the output of the second amplitude signal 22r needs to be a real number of 0 or more, it is desirable to convert the output (F[g] * F[h]) after the convolution operation to its absolute value.
- the amount of calculation becomes enormous when the calculation is performed by the backpropagation method.
- the sigmoid function shown in the following equation (26) is a function that converges in the spatial frequency domain.
- by using the complex activation function f, if the Fourier transform and the inverse Fourier transform in the hidden layers are each performed once, all the operations of each layer between the Fourier transform and the inverse Fourier transform handle spatial frequency signals within a continuous spatial frequency domain. Therefore, it is possible to analyze the physical function of each operation from its input signal and output signal. As a result, detailed analysis of the NN becomes possible, the configuration of the NN can be determined from the analysis results of each layer, and a high-performance NN can be constructed.
- in the fourth embodiment, the complex activation function f performs a convolution operation using a kernel in the spatial frequency domain. Therefore, it is possible to perform a calculation theoretically equivalent to the CNN of the first comparative example and to obtain the same accuracy, but the calculation speed is not improved.
- on the other hand, the CNN method is reproduced within calculations that handle spatial frequency signals in the continuous spatial frequency domain, which makes a detailed analysis of the conventional CNN method possible.
- the convolution operation (the Hadamard product in the spatial frequency domain) is responsible for emphasizing edges (the boundary lines between objects).
- the activation function (the complex activation function in the spatial frequency domain) is a non-linear function and is responsible for generating frequency components different from those of the input signal (in both the low and high frequency bands). Pooling (complex pooling in the spatial frequency domain), which will be described later, plays the role of a filter typified by a low-pass filter. In this way, each operation can be clearly separated and analyzed, which can contribute to the development of NNs including CNNs.
- a CNN analyzes the input image by extracting edges in various directions using the multiple kernels obtained by training and synthesizing the results.
- the features of the image are extracted by convolution processing that involves a plurality of operations in this way. A process having the same physical meaning can be performed in the spatial frequency domain by the complex convolution operation, without using a kernel-based convolution operation.
- the analysis of the activation function (complex activation function in the spatial frequency domain) will be described.
- in the spatial domain, this processing expresses, with a non-linear function, a structure similar to the firing of nerve cells in the brain.
- when the function of pooling (complex pooling in the spatial frequency domain), described later, is also analyzed, it can be derived that the most important role of the activation function in deep learning is to generate frequency components different from those of the input signal.
- the Relu function in the spatial domain can be reduced, in the spatial frequency domain, to half-wave rectification processing. That is, in the half-wave rectification of a triangular wave (a general image) having two or more different periods, a part of the energy is transferred to low frequency components close to the DC component, in addition to the frequency components of the original triangular wave. Therefore, as shown approximately in the third embodiment, the same effect can be obtained by a calculation that outputs a part of the signal to the low frequency component side.
- strictly speaking, the calculation in the spatial frequency domain does not give exactly the same result as the calculation in the spatial domain, but the difference in the calculation results can be absorbed by training the complex convolution operation.
- the fifth embodiment provides a complex pooling layer in the NN according to the first to fourth embodiments.
- the complex pooling layer 18 is a layer in the spatial frequency domain corresponding to the pooling layer 4 in the spatial domain.
- FIG. 15 is a diagram showing the configuration of the NN according to the fifth embodiment, and is a partial detailed view corresponding to FIG. 4 shown in the first embodiment.
- the amplitude coupling layer 13A has a first weight matrix W1 and outputs a second amplitude signal 22r by applying the first weight matrix W1 to the first amplitude signal 21r.
- the phase coupling layer 13B applies the second weight matrix W2 to the first phase signal 21 ⁇ and outputs the second phase signal 22 ⁇ .
- using the complex activation function f, the complex activation layer 14 updates only the second amplitude signal 22r out of the second amplitude signal 22r and the second phase signal 22θ, and outputs it as the third amplitude signal 23r.
- the second phase signal 22θ is output without being updated.
- the complex pooling layer 18 is provided immediately after the complex activation layer 14 and performs arithmetic processing on the signal updated by the complex activation layer 14. In this case, since only the amplitude component is updated, the input third amplitude signal 23r is subjected to arithmetic processing and output as the amplitude signal 23ra.
- when the complex pooling layer 18 is provided in the above third and fourth embodiments, in which the complex activation layer 14 updates both the second amplitude signal 22r and the second phase signal 22θ, the complex pooling layer 18 performs arithmetic processing on both signals.
- the operation of the pooling layer 4 in the spatial domain is referred to as pooling, and the operation of the complex pooling layer 18 in the spatial frequency domain is referred to as complex pooling.
- pooling in the spatial domain reduces the sensitivity to the position of the features extracted by the convolution layer 2, so that an image can be recognized as having the same feature amount even if the position of the target feature in the image changes. This corresponds to "blurring" the image. In the spatial frequency domain, such "blurring" can easily be obtained by removing the high frequency components. Since high frequency components are generated when the values of adjacent pixels change abruptly, complex pooling can be realized by removing the high frequency components in the spatial frequency domain.
- The calculation in the complex pooling layer 18, that is, complex pooling, corresponds to a low-pass filter in the field of signal processing. By performing complex pooling, the structure becomes robust against displacement and rotation of the input image, overfitting of the NN is prevented, and the calculation accuracy is improved.
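- The correspondence between complex pooling and a low-pass filter can be sketched numerically. The following is an illustrative example only (the function name `complex_pooling_lowpass` and the `keep_ratio` parameter are assumptions, not taken from the embodiments): the amplitude spectrum is zeroed outside a retained low-frequency region while the phase signal is passed through unchanged, and the inverse Fourier transform then yields a blurred image.

```python
import numpy as np

def complex_pooling_lowpass(amplitude, phase, keep_ratio=0.5):
    # Zero out spatial-frequency components outside an ellipse of radius
    # keep_ratio (the spectrum is assumed to be centred with fftshift).
    h, w = amplitude.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    mask = (((yy - cy) / (keep_ratio * cy)) ** 2
            + ((xx - cx) / (keep_ratio * cx)) ** 2) <= 1.0
    return amplitude * mask, phase  # phase is passed through unchanged

# Example: blur an image by pooling in the spatial-frequency domain.
img = np.random.rand(8, 8)
spec = np.fft.fftshift(np.fft.fft2(img))
amp, ph = np.abs(spec), np.angle(spec)
amp_lp, ph_lp = complex_pooling_lowpass(amp, ph, keep_ratio=0.4)
blurred = np.real(np.fft.ifft2(np.fft.ifftshift(amp_lp * np.exp(1j * ph_lp))))
```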
- Pooling in the spatial domain is generally maximum-value pooling, which cuts out a map for each image patch of size S × T (S and T are integers of 2 or more) and outputs the maximum value; average-value pooling, which outputs the average value; or Lp pooling and the like. These differences in pooling can be regarded as differences in the shape of the low-pass filter in the spatial frequency domain.
- For complex pooling, a window function such as a Hann window, a Hamming window, a Hanning window, a Blackman window, or a Kaiser window, transformed according to the dimension of the input signal of each hidden layer, may be used.
- In this case, the calculation result is converted to an absolute value so that no negative component is generated. Since such a window function also removes discontinuities in the Fourier transform and the inverse Fourier transform, the noise components generated by the influence of numerical processing can be removed.
- By using a high-pass filter that removes information offset by a single color, that is, the low-frequency components close to direct current, single-color information such as a background component can be removed. Specifically, for the frequency-shifted signal, the signal corresponding to several pixels in the central portion is removed; a high-pass filter that suppresses the central pixels, usually 10% or less, is provided. Note that this method cannot be used for data in which the DC component is an important factor.
- A bandpass filter in which the high-pass filter and the low-pass filter are combined may also be used.
- As a bandpass filter, there is a method using a Gabor filter, a function represented by the product of a trigonometric function and a Gaussian function, as shown by David Hubel and Torsten Wiesel. Further, any bandpass filter may be used in combination with the low-pass filter and the high-pass filter.
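- The band-pass construction described above (suppressing a small near-DC central region, here up to about 10% of each axis, combined with a low-pass cutoff) might be sketched as follows; the mask is applied to a frequency-shifted amplitude spectrum, and all names and ratios are illustrative assumptions:

```python
import numpy as np

def bandpass_mask(shape, dc_frac=0.1, keep_ratio=0.6):
    # Normalised squared radius from the (fftshift-centred) DC bin.
    h, w = shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    r2 = ((yy - cy) / cy) ** 2 + ((xx - cx) / cx) ** 2
    # High-pass part: drop the near-DC centre; low-pass part: drop the
    # high-frequency rim.  Their intersection is the pass band.
    keep = (r2 > dc_frac ** 2) & (r2 <= keep_ratio ** 2)
    return keep.astype(float)

mask = bandpass_mask((32, 32))
```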
- The complex activation function f shown in the above-described third or fourth embodiment, or the activation function used in CNN, serves, for the NN, to generate frequency components lower than a specific frequency.
- When the input signal is negative, the output of the activation function (ReLU function) becomes 0.
- An output of 0 corresponds to a frequency of 0 in the spatial frequency domain, that is, a DC component, and the operation corresponds to half-wave rectification in electrical engineering.
- Half-wave rectification generates frequency components continuously from the DC component up to a specific frequency, so even if the input signal to the activation function is a single frequency, the output signal will have wideband frequency components.
- As a result, the main information can be effectively aggregated in the low-frequency components.
- The aggregation of information described here means that the high-frequency components have been removed by a low-pass filter (for example, a Gaussian filter).
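- The claim that half-wave rectification (ReLU) turns a single-frequency input into a signal with a DC component and wideband frequency content can be checked numerically; the following sketch applies ReLU to a pure sinusoid and inspects the spectrum (illustrative example, not from the embodiments):

```python
import numpy as np

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 8 * t / n)   # single frequency: bin 8 only
relu = np.maximum(x, 0.0)           # ReLU = half-wave rectification

spec_in = np.abs(np.fft.rfft(x)) / n
spec_out = np.abs(np.fft.rfft(relu)) / n

# The input spectrum has energy only in bin 8; after rectification a DC
# term (bin 0) and harmonics spread over the band also appear.
```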
- The complex pooling layer 18 may be provided not only immediately after the complex activation layer 14 but also before the complex activation layer 14.
- Pooling (complex pooling in the spatial frequency domain) can also be analyzed, as described below.
- Pooling in the spatial domain is roughly classified into maximum-value pooling and average-value pooling. For example, it is an operation of cutting out an image in 2 × 2 pixel patches and outputting the maximum value or the average value among the 2 × 2 pixels. This operation has a blurring effect. Further, by blurring, the same image can be recognized even if the input image is displaced or rotated.
- In the spatial frequency domain, the pooling operation is a low-pass filter, and maximum-value pooling and average-value pooling differ only in the edge sharpness at the cutoff frequency of the filter. For this reason, pooling in the spatial domain has a physically ambiguous meaning, whereas in the spatial frequency domain complex pooling can be applied as an explicit filter. This allows complex pooling to build deep learning models with higher inference accuracy than filters in the spatial domain. Further, in addition to the low-pass filter, an arbitrary filter such as a band-pass filter that removes only the DC components and the high-frequency components can be constructed, so a deep learning model with a high degree of freedom can be built.
- The fact that this complex pooling is a low-pass filter is closely connected with the activation function (the complex activation function in the spatial frequency domain).
- the activation function has the effect of producing low-frequency components.
- By setting the cutoff frequency of the low-pass filter appropriately, that is, by designing the low-pass filter, the low-frequency components generated by the activation function can be retained while the high-frequency components originally contained in the input signal are removed.
- In other words, CNN is a method of learning by convolving the filters obtained by training to extract the edges of the image, generating signals in the low-frequency components by the nonlinear function serving as the activation function, leaving the frequency components that appear in the low-frequency band by pooling, and removing the other components.
- In Embodiment 6, a complex batch normalization layer is provided in the NN according to the above-described first to fifth embodiments.
- FIG. 16 is a diagram showing the configuration of the NN according to the sixth embodiment, and is a partial detailed view corresponding to FIG. 4 shown in the first embodiment.
- The case where the complex batch normalization layer 19 is provided between the coupling layer 13 and the complex activation layer 14 is shown; in addition to this, a complex batch normalization layer 19 may be provided after the complex activation layer 14 or before the coupling layer 13.
- the complex batch normalization layer 19 is a hidden layer that performs complex batch normalization in the spatial frequency domain corresponding to the batch normalization in the spatial domain.
- Complex batch normalization can reduce the effect of internal covariate shift by performing the same calculation as batch normalization in the spatial domain on only the amplitude signal in the spatial frequency domain, and can shorten the training time.
- Since the complex batch normalization layer 19 is arranged between the coupling layer 13 and the complex activation layer 14, the complex batch normalization layer 19 complex-batch-normalizes only the second amplitude signal 22r output by the amplitude coupling layer 13A and outputs the amplitude signal 22ra.
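- A minimal sketch of complex batch normalization as described here, normalizing only the amplitude signal over the batch axis while the phase signal passes through untouched; `gamma`, `beta`, and `eps` are the usual scale, shift, and stability constants of batch normalization and are assumed names:

```python
import numpy as np

def complex_batch_norm_amplitude(amp_batch, phase_batch,
                                 gamma=1.0, beta=0.0, eps=1e-5):
    # Same calculation as ordinary batch normalisation, applied to the
    # amplitude matrices only (batch axis 0); phase is left untouched.
    mean = amp_batch.mean(axis=0)
    var = amp_batch.var(axis=0)
    amp_norm = gamma * (amp_batch - mean) / np.sqrt(var + eps) + beta
    return amp_norm, phase_batch

rng = np.random.default_rng(0)
batch_amp = rng.random((4, 8, 8)) * 100.0   # batch of amplitude matrices
batch_phase = rng.random((4, 8, 8))
amp_n, phase_n = complex_batch_norm_amplitude(batch_amp, batch_phase)
```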
- Since providing the complex batch normalization layer 19 causes little demerit to the calculation accuracy, a complex batch normalization layer 19 may also be provided before or after the complex pooling layer 18, or before or after a hidden layer such as the amplitude logarithmic layer or the axis logarithmic layer described later.
- However, since the calculation time of the complex batch normalization layer 19 itself slows the learning, it is desirable to reduce the number of complex batch normalization layers 19 and take the following measures instead: changing the complex activation function f, pre-learning the initial value of the weight matrix W, lowering the learning coefficient in the gradient descent method, and restricting the degree of freedom of the NN with a dropout layer or a loosely coupled layer.
- Embodiment 7 the NN according to the first to sixth embodiments is provided with an amplitude logarithmic layer and an inverse amplitude logarithmic layer.
- FIG. 17 is a diagram showing the configuration of the NN according to the seventh embodiment, showing the case where the amplitude logarithmic layer and the inverse amplitude logarithmic layer are applied to the sixth embodiment; it is a partial detailed view of the corresponding portion of the NN.
- an amplitude logarithmic layer 10A is provided between the Fourier transform layer 12 and the coupling layer 13
- an inverse amplitude logarithmic layer 10B is further provided in front of the inverse Fourier transform layer 15.
- When the input signal 20 is Fourier-transformed by the Fourier transform layer 12 and divided into the first amplitude signal 21r and the first phase signal 21θ, the first amplitude signal 21r may contain signals with a very large amplitude at specific frequencies on the frequency axis. In that case, a signal with a large amplitude may cause the other components of the amplitude matrix to become almost 0, so that the characteristics of the image are lost and only the large-amplitude portion is learned.
- Therefore, the amplitude logarithmic layer 10A calculates the logarithm of the amplitude of the first amplitude signal 21r (amplitude matrix), that is, generates and outputs an amplitude signal 21ra whose amplitude is logarithmized. As a result, the adverse effect of large-amplitude signals generated after the Fourier transform can be suppressed, and the reliability of learning can be improved.
- the radix a is generally the natural logarithm e, 2, or 10, but other real numbers may be used.
- The magnitude of the input signal may differ by about 2 to 3 orders of magnitude, but, for example, when radix 10 is used, even a difference of 3 orders of magnitude becomes a change of only 3 in the logarithmized value, so the NN can be trained to be sensitive to signals with a small amplitude and insensitive to signals with a large amplitude.
- Alternatively, the amplitude signal 21ra may be generated by the following calculation formula, using a constant b, which is a real number larger than 0, by which the logarithmized signal is multiplied, and an error component ε for avoiding the input becoming 0.
- y b ⁇ log a (x + ⁇ )
- The error component ε need not be input when the amplitude matrix does not contain 0 as an element; when it does contain 0 as an element, it is desirable to input a value one or more orders of magnitude smaller than the minimum value excluding 0.
- The constant b is used to prevent the output value y from becoming smaller than the rounding error of the information processing apparatus or from becoming excessive; 10 or 20 is often used, but the constant b may be another real number.
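- The amplitude logarithmization formula y = b × log_a(x + ε) and its inverse might be sketched as follows (the choice of ε one digit below the minimum nonzero element follows the text; the inverse formula x = a^(y/b) − ε is an assumption standing in for equations (27) and (28), which are not reproduced here):

```python
import numpy as np

def amplitude_log(x, a=10.0, b=20.0, eps=None):
    # y = b * log_a(x + eps).  If the amplitude matrix has no zero element,
    # eps may be omitted; otherwise use a value one digit below the minimum
    # non-zero element, as described in the text.
    if eps is None:
        eps = 0.0 if np.all(x > 0) else 0.1 * x[x > 0].min()
    return b * np.log(x + eps) / np.log(a)

def amplitude_antilog(y, a=10.0, b=20.0, eps=0.0):
    # Assumed inverse of the formula above: x = a**(y / b) - eps.
    return np.power(a, y / b) - eps

amp = np.array([[1.0, 10.0], [100.0, 1000.0]])  # spans 3 orders of magnitude
y = amplitude_log(amp)                           # compressed to the range 0..60
restored = amplitude_antilog(y)
```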
- In the inverse amplitude logarithmic layer 10B, the amplitude (logarithmic amplitude) of the input amplitude signal (third amplitude signal 23r) is returned to the antilogarithm, and the amplitude signal 23ra is generated and output.
- the following equation (27) or equation (28) is used.
- In Embodiment 8, the NN according to the first to seventh embodiments is provided with an axis logarithmic layer and an inverse axis logarithmic layer.
- FIG. 18 is a diagram showing the configuration of the NN according to the eighth embodiment, showing the case where the axis logarithmic layer and the inverse axis logarithmic layer are applied to the sixth embodiment; it is a partial detailed view of the corresponding portion of the NN. As shown in FIG. 18, an axis logarithmic layer 10C is provided between the Fourier transform layer 12 and the coupling layer 13, and an inverse axis logarithmic layer 10D is provided in front of the inverse Fourier transform layer 15.
- the axis logarithmic layer 10C will be described below.
- the input data used for learning is two-dimensional data.
- One axis of the two-dimensional data is called the X axis, and the other axis is called the Y axis.
- The X-axis and Y-axis of the first amplitude signal 21r and the first phase signal 21θ after the Fourier transform by the Fourier transform layer 12 remain on a linear (antilogarithmic) scale, as in the input data.
- The first amplitude signal 21r and the first phase signal 21θ are input to the axis logarithmic layer 10C, and the axis logarithmic layer 10C takes the logarithm of the X-axis and Y-axis of the first amplitude signal 21r and the first phase signal 21θ, that is, generates and outputs an axis-logarithmized amplitude signal 21rb and phase signal 21θb.
- The radix may be a real number larger than 0.
- By the axis logarithmization, the amplitude signal 21rb and the phase signal 21θb become two-dimensional data in which the low-frequency components are emphasized on each of the X-axis and the Y-axis. This can be regarded as an expansion of the dynamic range, so even small changes in the low-frequency components can be learned without being overlooked. Note that the conventional CNN cannot use such a method. In this embodiment, the information of the low-frequency components, which carry a large amount of information, can be emphasized, the information of the high-frequency components, which carry a small amount of information, can be suppressed, and highly reliable learning can proceed efficiently.
- In the inverse axis logarithmic layer 10D, the X-axis and Y-axis of the input amplitude signal (third amplitude signal 23r) and phase signal (second phase signal 22θ) are returned to the antilogarithm, and the amplitude signal 23rb and the phase signal 22θb are generated and output.
- The inverse axis logarithmization, which returns the X-axis and the Y-axis to the antilogarithm, is not always necessary if the inverse Fourier transform layer 15 in the subsequent stage can perform the inverse Fourier transform.
- The amplitude signal 21rb and the phase signal 21θb may become images with a coarse sampling interval on the low-frequency side due to the axis logarithmization.
- interpolation is performed.
- known methods such as linear interpolation, polynomial interpolation, and spline interpolation are used.
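- The axis logarithmization with interpolation might be sketched along one axis as follows, using linear interpolation (np.interp) to fill the coarse low-frequency side; applying the same resampling to both the X-axis and Y-axis would give the 2-D case (an illustrative assumption, not the embodiment's exact procedure):

```python
import numpy as np

def axis_log_resample(signal):
    # Resample a 1-D spectrum from a linear axis onto a logarithmically
    # spaced axis; polynomial or spline interpolation could replace
    # np.interp's linear interpolation.
    n = signal.shape[0]
    lin_axis = np.arange(1, n + 1, dtype=float)  # 1..n (avoids log of 0)
    log_axis = np.logspace(0.0, np.log10(n), n)  # 1..n, log-spaced
    return np.interp(log_axis, lin_axis, signal)

spec = np.linspace(0.0, 1.0, 64)   # toy amplitude spectrum along one axis
log_spec = axis_log_resample(spec)
```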
- Only one of the axes may be logarithmized; for example, when the viewpoints of learning differ between the X-axis and the Y-axis, only one of the axes may be logarithmized.
- In that case, the matrix of the amplitude signal (first amplitude signal 21r) becomes logarithmic in its rows, columns, and each element, and the matrix of the phase signal (first phase signal 21θ) becomes logarithmic in its rows and columns.
- FIG. 19 is a diagram showing the configuration of the NN according to the ninth embodiment.
- As shown in FIG. 19, the NN includes, in order from the input layer 11A, the coupling layer 13, which is composed of the amplitude coupling layer 13A and the phase coupling layer 13B and performs the complex convolution operation; the complex activation layer 14, which performs the calculation using the complex activation function f; the complex pooling layer 18; the inverse Fourier transform layer 15; the output layer 16; and the output 17.
- In this embodiment, a Fourier transform layer 12A, which is a calculation unit that performs the Fourier transform as preprocessing of the NN, is provided separately from the NN. The Fourier transform layer 12A Fourier-transforms the input signal 20 in the spatial domain and outputs the first amplitude signal 21r and the first phase signal 21θ, which are signals in the spatial frequency domain.
- The input layer 11A of the NN passes the first amplitude signal 21r and the first phase signal 21θ, which are signals in the spatial frequency domain, to the next layer, the coupling layer 13, as input signals. Therefore, it is not necessary to perform the Fourier transform inside the NN; the iterative calculation in the Fourier transform layer during learning and its backpropagation calculation are unnecessary, so the calculation time can be shortened.
- By applying this embodiment to the NNs shown in the above embodiments 1 to 8, with the Fourier transform layer 12A placed outside the NN, the Fourier transform can be performed as NN preprocessing.
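- Performing the Fourier transform as preprocessing, outside the NN, might be sketched as follows: the whole dataset is transformed once into amplitude and phase signals, so no Fourier transform (or its backpropagation) is needed during training (function names are illustrative):

```python
import numpy as np

def preprocess_fourier(images):
    # Transform every image once, outside the network; the NN then receives
    # amplitude and phase signals already in the spatial-frequency domain.
    spectra = np.fft.fft2(images, axes=(-2, -1))
    return np.abs(spectra), np.angle(spectra)

# Precompute once for the whole dataset; training iterates only over the
# frequency-domain layers afterwards.
dataset = np.random.rand(10, 16, 16)
amps, phases = preprocess_fourier(dataset)
```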
- FIG. 20 is a diagram showing the configuration of the NN according to the tenth embodiment.
- As shown in FIG. 20, the NN includes, in order from the input layer 11, the convolution layer 2, which performs a convolution operation; the activation function 3; the Fourier transform layer 12; the coupling layer 13, which is composed of the amplitude coupling layer 13A and the phase coupling layer 13B and performs the complex convolution calculation; the complex activation layer 14, which performs an operation using the complex activation function f; the complex pooling layer 18; the inverse Fourier transform layer 15; the output layer 16; and the output 17.
- That is, the convolution layer 2 and the activation function 3 are inserted after the input layer 11 of the NN according to the fifth embodiment.
- The input signal 20, which is a spatial signal, is first convolved with the kernel in the convolution layer 2 in the spatial domain and then calculated using the activation function 3.
- the spatial signal after the calculation by the activation function 3 is divided into a first amplitude signal 21r and a first phase signal 21 ⁇ by the Fourier transform layer 12. Subsequent calculations are the same as in the fifth embodiment.
- the convolution layer 2 and the activation function 3 which are hidden layers in the spatial region are provided in front of the Fourier transform layer 12. Therefore, the NN is a combination of the spatial domain and the spatial frequency domain.
- The input signal 20 is configured by combining RGB data, and in CNN a method of dividing the input data for each color in the input layer is used.
- In addition to the vertical and horizontal information, the input data therefore usually has a three-dimensional shape whose third dimension is the channel direction, that is, the dimension of the RGB colors.
- This three-dimensional shape contains spatial information such as many similar values between pixels that are spatially close to each other.
- Alternatively, the three-dimensional shape may contain essential information of the image, such as a close relationship between each of the RGB channels, or the fact that pixels far apart from each other are not strongly related.
- the convolution layer 2 can extract and retain this information.
- In addition, methods unique to the spatial frequency domain, such as complex convolution, the complex activation function f, and complex pooling, can be used, enabling calculations that are impossible in the spatial domain.
- In the spatial domain, it was difficult to investigate the amount of local change in the image, but in this embodiment, learning and inference can proceed while this information is retained.
- Information on the physical positional relationship of features, which is lost by the Fourier transform, can be retained by providing the convolution layer 2 and the activation function 3 near the input layer 11. Therefore, learning and inference can be performed with high accuracy.
- the NN is configured to have multiple layers, and one convolutional layer 2 is provided between the input layer 11 and the Fourier transform layer 12.
- Alternatively, the entire NN may be constructed by inserting a CNN with a small number of layers, about 5. Training is then performed between the Fourier transform layer 12 and the output layer 16 in the same manner as in the first embodiment. As a result, the speed can be increased even for multidimensional input data having three or more dimensions.
- In this embodiment, the computational load can be reduced, so the number of images that can be processed in one learning pass can be increased. Therefore, for the same amount of calculation, higher accuracy can be expected compared with the case where the first comparative example is used.
- In this embodiment, the convolution layer 2 and the activation function 3, which are hidden layers in the spatial domain, are provided in front of the Fourier transform layer 12; however, they may instead be provided between the inverse Fourier transform layer 15 and the output layer 16, or in both the front stage of the Fourier transform layer 12 and the rear stage of the inverse Fourier transform layer 15.
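- The hybrid front end of this embodiment, a spatial-domain convolution and ReLU activation followed by a Fourier transform that hands amplitude and phase to the frequency-domain layers, might be sketched as follows (the kernel, function names, and 'valid' padding are illustrative assumptions):

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Plain spatial convolution ('valid' padding) for the convolution
    # layer 2 placed before the Fourier transform layer 12.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel[::-1, ::-1])
    return out

def hybrid_front_end(img, kernel):
    # Spatial conv + ReLU (activation function 3), then a Fourier transform
    # splitting the feature map into amplitude and phase signals.
    feat = np.maximum(conv2d_valid(img, kernel), 0.0)
    spec = np.fft.fft2(feat)
    return np.abs(spec), np.angle(spec)

img = np.random.rand(10, 10)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # simple edge-detecting kernel
amp1, ph1 = hybrid_front_end(img, edge_kernel)
```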
- In Embodiment 11, electronic devices that perform control operations using the NN of the information processing apparatus shown in the above-described first to tenth embodiments will be described.
- The information processing device is used, for example, for processing the sensor information of an air conditioner, the sensor information of a servo system used in a factory in factory automation, or the sensor information of sensors installed outdoors or in a vehicle. Conventionally, in order to use CNN for these processes, it has been necessary to prepare a GPU, an ASIC (application-specific integrated circuit), or an FPGA dedicated to neural network processing.
- With the information processing device according to the present application, information on an air conditioner, a servo system, an in-vehicle sensor, and the like can be processed by general-purpose hardware including an existing CPU or microcomputer and memory.
- FIG. 21 is a diagram showing a configuration example of an air conditioner as an electronic device according to the eleventh embodiment.
- the air conditioner includes an information processing device 50 according to the present application, an infrared sensor 51 that recognizes an object 58, and a blower 52.
- The information processing device 50 includes an input unit 53 for inputting data, an analysis unit 54 for analyzing the data input from the input unit 53, a storage unit 55 for recording the analysis result, a determination unit 56 for making determinations on the analysis result regarding the object 58, and a control unit 57 for controlling each unit.
- the information processing device 50 controls at least the wind direction, the air volume, and the temperature of the blower unit 52 from the control unit 57.
- the analysis unit 54, the determination unit 56, and the control unit 57 have the functions of the CPU 30 of the hardware 100 shown in FIG. 1 as a whole, and the storage unit 55 has the functions of the ROM 31 and the RAM 32 of the hardware 100.
- The information processing device 50 uses the output signal of the infrared sensor 51 as an input signal, and includes a learning process that learns the position and temperature change of a living body from the input signal, and an inference process that makes inferences based on the input signal using the information obtained in the learning process; control operations are performed based on the inference process.
- the NN has a process of inferring the position of the living body and the temperature change of the living body from the input signal from the infrared sensor 51.
- the position of the living body is determined by detecting the temperature of the living body using the information processing device 50.
- the information processing device 50 performs a process of predicting the temperature suitable for each living body from the temperature change of the living body.
- Before the air conditioner is used, the NN is trained and the weight matrix W, which is the learning result of the NN, is stored in the storage unit 55, or a part of the weight matrix W of the NN is stored in the air conditioner.
- In this case, the information processing device 50 includes only the inference process among the learning process and the inference process, calls the weight matrix W from the storage unit 55 at the time of inference, and performs the calculation on the input signal from the infrared sensor 51.
- the NN training according to the above-described first to tenth embodiments is performed.
- As the correct label, for example, information for grasping the structure of the room, for grasping and controlling the distance between each living body and the air conditioner, or on the appropriate temperature and air volume for each living body is used.
- Regarding the structure of the room, for example, objects whose temperature does not change are recognized as the room, furniture, and the like, and objects whose temperature changes are recognized as doors, living bodies, and the like.
- Regarding the appropriate temperature and air volume for each living body, for example, a mechanism that can grasp the position of the controller (for example, a member that absorbs or reflects infrared rays at the tip of the controller) is provided, and the information obtained when each living body operates the controller of the air conditioner is used as the correct label.
- Further, the NN performs a process of extracting and recognizing features such as body temperature and contour for each different moving living body.
- The NN may determine whether the object 58 is a living body or an object such as a cooker from the temperature change when cold air or hot air is applied to the object 58. Further, the NN may identify each living body and determine whether or not to blow air on it, or may make a judgment such as not blowing air on a cooker.
- the learned NN weight matrix W is stored in the storage unit 55, and is read from the storage unit 55 and used at the next startup of the air conditioner.
- Preprocessing such as normalization, which sets the maximum value of the image data to 1 and the minimum value to 0, or standardization, which sets the average of the image data to 0 and the variance to 1, may be performed; batch learning performed only at startup, or online learning, in which learning is performed at any time depending on the situation even during operation, may be used.
- Although the number of blower units 52 of the air conditioner is one in FIG. 21, a plurality of blower units 52 may be provided.
- For example, two blower units 52 capable of outputting different temperatures and air volumes are provided, and a mechanism for sending the temperature and air volume suited to each living body according to the NN processing result is provided. Further, the temperature can be adjusted by appropriately mixing the air in the room with the two blower units 52.
- Since each living body can be identified by training the NN, an air conditioner having various functions can be constructed by combining this information. Further, even when a plurality of living bodies are present at the same time, each blower unit 52 can fix the wings that adjust the wind direction and continuously send air with the temperature and volume appropriate for each living body.
- an infrared sensor, a distance sensor, or the like is used to recognize an article, a person, the opening / closing of a door, etc. in the room.
- The NN takes a sensor signal, such as that of an infrared sensor, as its input signal, and outputs the articles, people, opening/closing of doors, and the like in the room.
- In recent years, the amount of information (number of pixels) to be acquired has increased, and a real-time response is required. For this reason, correspondingly larger and more powerful information processing hardware is becoming indispensable.
- With the air conditioner according to this embodiment, the increase in size and power consumption can be suppressed and the processing can be sped up. Therefore, the power consumption can be reduced, an additional structure for heat dissipation is not required, the indoor unit can be miniaturized, and the efficiency of the indoor unit can be improved.
- Conventionally, the calculation has been performed via an information network such as the WWW by a computer suited to large-scale calculation, such as a computer server, and the result has been returned to the air conditioner through the WWW.
- The air conditioner according to this embodiment does not need to use an information network such as the WWW, can reduce the power and time lag required for such communication, and can provide the user with a more comfortable air conditioning system.
- FIG. 22 is a diagram showing a configuration of an electronic device constituting a servo system as an electronic device according to another example of the eleventh embodiment.
- the electronic device includes an information processing device 50 according to the present application, a sensor 51a for detecting electromagnetic waves, and an operating unit 52a.
- The information processing device 50 includes an input unit 53 for inputting data, an analysis unit 54 for analyzing the data input from the input unit 53, a storage unit 55 for recording the analysis result, a determination unit 56 for making determinations on the analysis result, and a control unit 57 for controlling each unit.
- Via the control unit 57, the information processing device 50 controls at least one of stopping the operation of the operating unit 52a and removing abnormal objects.
- The analysis unit 54, the determination unit 56, and the control unit 57 collectively have the functions of the CPU 30 of the hardware 100 shown in FIG. 1, and the storage unit 55 has the functions of the ROM 31 and the RAM 32 of the hardware 100.
- The information processing device 50 uses the output signal of the sensor 51a as an input signal, and includes a learning process that learns at least one of the amount of change in the position of an object, the electric field, the magnetic field, and the temperature from the input signal, and an inference process that makes inferences based on the input signal using the information obtained in the learning process; control operations are performed based on the inference process.
- The NN may be trained in advance; in that case, the information processing device 50 includes at least the inference process among the learning process and the inference process, and calls and uses the weight matrix W from the storage unit 55 at the time of inference.
- As a usage example, the position of an article being produced using the servo system is monitored by calculation using the NN, and the characters, colors, and barcodes written on the article, as well as the presence or absence of defects, are analyzed.
- A signal read by a CCD such as a camera, a CMOS image sensor, a near-field antenna, or a far-field antenna is input to the NN. Since the servo system operates at high speed, instantaneous judgment is required; CNN has been used in the past, but the search range has been narrowed to reduce the size of the image in order to increase the speed.
- In this embodiment, the same processing speed can be maintained even for a large image, so a wide range of information can be processed with a smaller number of sensors, an NN having a larger number of layers can be used, and the accuracy can be improved.
- Further, the power and time lag required for communication can be reduced, and a servo system that responds more quickly can be provided to the user. Since a servo system is required to control multiple linked servo motors at the same time, real-time performance is emphasized.
- The quick response makes it possible to quickly stop manufacturing and operation in the event of an abnormality. Therefore, waste can be reduced, and failures due to collisions between devices operating in an abnormal posture can be reduced.
- As another example, a self-supporting robot includes at least one of a CCD such as a camera, a CMOS image sensor, a near-field antenna, and a far-field antenna as the sensor 51a, monitors the position of an article, and can identify and determine at least one of the characters, colors, and barcodes written on the article.
- the sensor 51a is attached directly to or around the self-supporting robot.
- The information processing device 50 includes a learning process that learns, from the input signal input from the sensor 51a, the presence or absence of characters, colors, barcodes, or defects to which the noise of the sensor 51a itself or noise depending on the usage environment of the sensor 51a is applied, and an inference process that makes inferences based on the input signal using the information obtained in the learning process; control operations are performed based on the inference process.
- the information processing device 50 may include at least an inference process among the learning process and the inference process, and in that case, the weight matrix W in the storage unit 55 is called and used at the time of inference.
- The collision prevention device includes the information processing device 50 according to the present application, the sensor 51a, and the operating unit 52a.
- The information processing device 50 includes an input unit 53 for inputting data, an analysis unit 54 that analyzes the data input from the input unit 53, a storage unit 55 that records the analysis results, a determination unit 56 that judges the analysis results, and a control unit 57 that controls each unit.
- The sensor 51a detects environment information outside the vehicle, which is information for control operations; at least one of a CCD or CMOS image sensor such as a camera, a radar, and an ultrasonic sonar is used.
- As the radar, a radar using a laser beam such as a lidar (Light Detection and Ranging), a radar using millimeter waves, or the like is used.
- A combination of different sensors or a plurality of the same sensors may be used, which can enhance safety during driving.
- The information processing device 50 includes a learning process that learns, from the input signal from the sensor 51a, at least one of the positions of living bodies outside the vehicle, the positions and headings of surrounding vehicles, traffic-signal information, and lane information, and an inference process that makes inferences from the input signal based on the information obtained in the learning process, and performs control operations based on the inference process. In this case too, at least the inference process suffices.
- This collision prevention device can control the steering, accelerator, and brake; the output from the control unit 57 of the information processing device 50 is roughly classified into outputs that warn the driver of an abnormality and outputs that directly control the vehicle.
- Abnormalities include cases such as the vehicle deviating from its lane while traveling, the distance to the preceding vehicle closing, a person or object being present in the direction of travel (front, rear, left, or right), and an impending collision; in such cases the driver is warned of the abnormality by sound, light, or vibration.
- The operating unit 52a directly controls the vehicle on commands from the control unit 57 of the information processing device 50.
- The information processing device 50 electrically controls the steering to avoid lane deviation, and drives the accelerator or brake so as to control the distance to the preceding vehicle or to objects at the front, rear, left, and right.
- The surrounding conditions of the vehicle are grasped, and control such as stopping the vehicle at a safe position is performed.
- The collision prevention device above is provided only with the sensor 51a detecting environment information outside the vehicle, but a sensor such as a camera using visible or infrared light may further be mounted inside the vehicle to judge the driver's physical condition and detect abnormalities. In this case too, the output signal of the in-vehicle sensor is used as the input signal, and the information processing device 50 performs a calculation to judge the driver's physical condition, such as a sleeping or intoxicated state. Based on that judgment, the driver is warned by sound, light, or vibration, or the situation around the vehicle is grasped and the vehicle is stopped at a safe position.
- An in-vehicle device capable of identifying and authenticating the driver and preventing vehicle theft, driving by an unlicensed driver, and the like is shown below.
- The configuration in this case is also the same as in the case shown in FIG.
- The sensor 51a is mounted in the vehicle and detects in-vehicle environment information serving as information for control operations; at least one of a CCD or CMOS image sensor such as a camera is used.
- The information processing device 50 includes a learning process that learns, from the input signal from the sensor 51a, at least the driver's face among the driver's face and physique, and an inference process that makes inferences from the input signal based on the information obtained in the learning process, and performs control operations based on the inference process.
- The information processing device 50 includes both the learning process and the inference process, can learn a specific driver and identify and authenticate that driver, so that only that driver can operate the vehicle.
- The information is updated regularly using online learning, in which data are added and learned.
- No information network such as the WWW is needed, and the driver can be identified and authenticated quickly, regardless of the connection environment, with an energy-saving, simple, and inexpensive configuration.
- A surveillance camera can be applied as an electronic device provided with the information processing device 50 according to the present application.
- A visible-light image from a camera serving as the sensor 51a is input to the information processing device 50, and the information processing device 50 outputs, for example, an ID assigned to each individual (in Japan, a My Number, a passport number, etc.).
- The behavior of each individual can be monitored in real time by a surveillance camera. In this embodiment, surveillance-camera video need not be uploaded via an information network such as the WWW, and the amount of information can be reduced by distributed processing in the information processing device 50.
- Conventionally, about 1 MB/s of information was handled per surveillance camera, but in this embodiment only about 1 kB/s needs to be processed, a substantial reduction. This makes monitoring of personal behavior easy and reliable.
- In-vehicle devices that identify and authenticate drivers, and electronic devices such as surveillance cameras that monitor personal behavior, use input signals read by a CCD or CMOS image sensor and use human behavior, state, and the like as output signals.
- When an image is used as the input information, both speed and accuracy can be increased by increasing the number of layers of the NN.
- The electronic device using the NN to which the tenth embodiment is applied can effectively improve accuracy by computing in both the spatial domain and the spatial frequency domain.
- FIGS. 21 and 22 show electronic devices provided with the sensors 51 and 51a.
- The information processing device 50 according to the present application can also be applied to electronic devices not provided with the sensors 51 and 51a.
- In that case, a signal generated by computation within the information processing device 50 is used as the input signal.
- The device includes at least the inference process among the learning process, in which learning is performed based on the input signal, and the inference process, in which inferences are made from the input signal based on the information obtained in the learning process; control operations are performed based on the inference process.
- At least one of characters, colors, light intensities, dimensions, shapes, positions, angles, speeds, or changes in the position of an object due to acceleration, electric fields, magnetic fields, and temperatures is inferred.
- The information processing device according to the present application can thus be applied to electronic devices.
Abstract
Description
In a conventional CNN, the operation in each layer imitates the structure of the brain by casting it into a mathematical model, but its physical meaning is difficult to grasp. Consequently, while the CNN as a whole can be evaluated from its overall computation result, the physical function of each operation cannot be analyzed from that operation's input and output signals, so a detailed analysis of the CNN is impossible.
A further object is to provide an electronic device that performs fast, highly accurate control operations based on learning and inference using this information processing device.
According to the electronic device disclosed in the present application, fast and highly accurate control operations become possible.
<Hardware configuration example>
FIG. 1 shows the overall configuration of hardware 100 serving as an information processing device that functions as a neural network (hereinafter, NN) according to Embodiment 1 of the present application.
The hardware 100 may be a stand-alone computer, or a server or client of a server-client system using a cloud or the like. The hardware 100 may also be a smartphone or a microcontroller. In a factory setting, it may be a computing environment within a network closed inside the factory, so-called edge computing.
The input unit 37 includes a keyboard, mouse, microphone, camera, or the like. The output unit 36 includes an LCD (Liquid Crystal Display), a speaker, or the like. A program executed by the CPU 30 can be recorded in advance on the hard disk 33 or the ROM 31 built into the hardware 100 as recording media. Alternatively, the program can be stored (recorded) on a removable recording medium 40 connected via the drive 39.
The program can also be transmitted and received through a system (Com port) such as the WWW (World Wide Web) that connects multiple pieces of hardware via wired and/or wireless links. Furthermore, after performing the training described later, only the weight functions obtained by the training can be transmitted and received by the above method.
Each layer of the NN can be implemented with general-purpose hardware suited to parallel computation, such as a CPU or a GPU (Graphics Processing Unit), or with an FPGA (Field-Programmable Gate Array), an FFT (Fast Fourier Transform) arithmetic architecture, or dedicated hardware.
The hardware 100 may also include a sensor for converting physical phenomena such as sound waves, electromagnetic waves including visible light, heat, or vibration into numerical data, or a mechanism that outputs images or computation results such as CAD designs produced within the hardware 100. Alternatively, the hardware 100 may include a mechanism that fuses the sensor information with the computation results of the hardware 100. The hardware 100 further includes a mechanism driven from a power line or an internal battery.
The data used by the NN comes from supervised learning, unsupervised learning, or reinforcement learning. An NN is also called deep learning or a perceptron. A perceptron with one hidden layer (described later) is called a single-layer perceptron, and one with two or more hidden layers a multilayer perceptron; this multilayer perceptron is what is called an NN.
Unsupervised learning, on the other hand, is a method of learning without attaching correct labels to the training data. Known examples include the Stacked Auto Encoder (SAE), a multilayered autoencoder, and the Deep Boltzmann Machine (DBM), a multilayered restricted Boltzmann machine.
Reinforcement learning differs from supervised and unsupervised learning: instead of providing correct answers, DQN (Deep Q-Learning), which maximizes the expected future return for data that changes moment by moment, is often used.
In addition, semi-supervised learning, in which some training data lack correct labels, and transfer learning, in which a model trained on one dataset is applied to another, can also be applied to this embodiment as long as a conventional CNN is applicable to them.
Since the NN described in this embodiment is an alternative to a CNN, the CNN serving as the first comparative example is described below before describing the NN of this embodiment. The CNN is the most representative deep-learning technique, a method with high inference accuracy, built by applying the pattern-recognition approach called the neocognitron.
FIG. 2 shows a configuration example of the CNN as the first comparative example. As shown in FIG. 2, the CNN comprises, in order from the input layer 1: a convolution layer 2, activation function 3, convolution layer 2, activation function 3, pooling layer 4, convolution layer 2, activation function 3, fully connected layer 5, activation function 3, fully connected layer 5, output layer 6, and output 7.
In conventional examples, CNNs with more than 150 hidden layers are also known, and in recent years the number of layers has tended to increase for higher accuracy.
In the fast Fourier transform and inverse fast Fourier transform described later, the matrix size must be a power of two, so it is desirable to make the input and output matrices equal in size by padding.
As a simple example, consider a linear function h(x) = cx with constant c as the activation function. Stacking this linear function three times (as a function from x to y) gives y = h(h(h(x))). However, this can also be written y = c·c·c·x, i.e. y = c³x, which a single layer can express. This means the output y can only be a linear function of the input x, so there is no point in adding hidden layers. A nonlinear function must therefore be used as the activation function.
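The collapse of stacked linear "activations" described above can be checked numerically; a minimal sketch (the constant c and input x are arbitrary illustration values):

```python
# With a linear "activation" h(x) = c*x, stacking three layers collapses
# to a single layer: h(h(h(x))) equals c**3 * x for every input x.
c = 2.0

def h(x):
    return c * x

x = 5.0
stacked = h(h(h(x)))   # three stacked linear "layers"
single = (c ** 3) * x  # one equivalent linear layer
```

Because the two values coincide for every x, depth adds no expressive power when the activation is linear, which is exactly why a nonlinear activation is required.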
Nonlinear functions used as the activation function 3 include the ReLU function (rectified linear function)
g(x) = max(x, 0),
the sigmoid function (also called the logistic sigmoid function)
g(x) = 1/(1 + exp(-x)),
or the hyperbolic tangent function
g(x) = tanh(x).
Variants are also known, such as the Leaky ReLU
g(x) = ax for x < 0, g(x) = x for x >= 0,
or the Thresholded ReLU
g(x) = x for x > θ, g(x) = 0 otherwise;
many other activation functions are known.
As an exception, the identity function g(x) = x, a linear function, is sometimes used immediately before the output layer 6. However, since expressing an arbitrary curve requires two or more hidden layers with nonlinear activation functions, the identity function is not discussed here.
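The activation functions listed above can be written directly from their formulas; a minimal sketch (the default slope a and threshold θ are illustration values, not taken from the patent):

```python
import math

def relu(x):                         # g(x) = max(x, 0)
    return max(x, 0.0)

def sigmoid(x):                      # g(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):                         # g(x) = tanh(x)
    return math.tanh(x)

def leaky_relu(x, a=0.01):           # g(x) = a*x for x < 0, x otherwise
    return a * x if x < 0 else x

def thresholded_relu(x, theta=1.0):  # g(x) = x for x > theta, 0 otherwise
    return x if x > theta else 0.0
```

Each function is nonlinear in the sense defined later: it violates additivity or homogeneity, so stacking layers with these activations genuinely increases expressive power.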
In ordinary use, the learning process is run on a server-client system equipped with GPUs or dedicated hardware.
This embodiment shows a configuration example of the hardware 100 (information processing device) that processes the output signal of a sensor converting physical phenomena such as sound waves, electromagnetic waves including visible light, heat, or vibration into numerical data, a signal designed by computation within the hardware 100, or a signal containing both sensor signals and computation results. The computation within the hardware 100 is arithmetic processing using the NN. The input signal may have any dimensionality of one or higher, but this embodiment is described for two-dimensional images.
In the spatial frequency domain, a complex activation function, which is a nonlinear function, is applied to the signal after the complex convolution operation. In this way, the complex convolution operation and the operation using the complex activation function are both performed within continuous processing in the spatial frequency domain. This reliably achieves, without compromise, the above-described effect of greatly reducing the number of operations.
In this embodiment, since the complex activation function is applied after the complex convolution operation, performing the Fourier transform and inverse Fourier transform once each in the hidden layers allows the complex convolution operation and the complex-activation operation to be performed any number of times within the continuous spatial-frequency-domain processing in between. This achieves a large reduction in computational cost.
This NN takes a two-dimensional image as the input signal 20, and the input layer 11 feeds the input signal 20 into the NN. The Fourier transform layer 12 Fourier-transforms the spatial-domain input signal 20 and outputs a first amplitude signal 21r and a first phase signal 21θ, which are spatial-frequency-domain signals. The fast Fourier transform is used here.
The complex activation layer 14 uses a complex activation function f in the spatial frequency domain to update at least the second amplitude signal 22r of the second amplitude signal 22r and the second phase signal 22θ. In this embodiment, only the second amplitude signal 22r is updated using the second phase signal 22θ and output as a third amplitude signal 23r, while the second phase signal 22θ is output without being updated.
The output layer 16 then transforms the signal 25 from the inverse Fourier transform layer 15 into the desired form and obtains the output 7 from the NN.
The number of rows of the first weight matrix W1 equals the number of columns of the amplitude matrix that is the first amplitude signal 21r, and the number of rows of the second weight matrix W2 equals the number of columns of the phase matrix that is the first phase signal 21θ. There is no constraint on the number of columns of W1 and W2.
Because this embodiment uses the fast Fourier transform and the fast inverse Fourier transform, the amplitude and phase matrices must have power-of-two sizes. Therefore, the amplitude coupling layer 13A and phase coupling layer 13B use a first weight matrix W1 and a second weight matrix W2 that output matrices of the same size as the amplitude and phase matrices.
The input data uses sensor signals receiving sound waves or electromagnetic waves including visible light, sensor signals capturing heat or vibration, signals computed and output within the hardware 100, or signals fusing both sensor signals and computation results. For sound waves, signals received by a microphone or ultrasonic sensor are used. Sensors collecting electromagnetic waves include cameras collecting visible light, cameras collecting infrared or ultraviolet light, light-intensity sensors, near-field antennas, far-field antennas, magnetic sensors, electric/magnetic field sensors, current sensors, voltage sensors, and radiation sensors. Acceleration sensors, temperature sensors, humidity sensors, gas sensors, distance sensors, pressure sensors, or vibration sensors such as gyroscopes may also be used.
The input data need not come from a single source; two or more kinds of data may be combined. In that case, the desired learning can be performed by combining the NN using the complex activation function of this embodiment with a conventional perceptron.
For example, in hardware 100 that classifies image input signals into ten classes, training uses teacher data in which input images and correct labels are associated one-to-one. The parameters obtained by learning (the elements of the weight matrices constituting the NN layers) are then applied to signals captured by a camera to obtain output signals for classification.
The above description concerns supervised learning, but the same applies to unsupervised learning without correct labels.
This autoencoder likewise takes camera images as the input signal, as described above. No correct labels are needed; learning proceeds by feeding the camera output into the autoencoder.
For an input signal fusing a sensor signal with computation results, a signal obtained by feeding the sensor signal into a simulator or the like is used as the input signal. A signal obtained by appropriately changing the sensor type or position information based on the simulator output may also serve as the input signal.
When multiple channels are input, four channels are generally converted into one channel by a convolution operation using a kernel. In this embodiment, one can place a single convolution layer before the Fourier transform layer 12, Fourier-transform each channel and convert the channels into one with a fully connected layer, or simply pre-weight each channel so that the input signal 20 fed to the input layer 11 has one channel.
This embodiment shows learning results using MNIST (Mixed National Institute of Standards and Technology database), which is commonly used to evaluate NN performance. MNIST consists of 32×32 grayscale images, with 60,000 training samples and 10,000 test samples not used for training.
The Fourier transform layer 12, which performs the Fourier transform, is described below. Since the inverse Fourier transform is the inverse of the Fourier transform, details of the inverse Fourier transform and the inverse Fourier transform layer 15 are omitted.
By its nature, the Fourier transform assumes that the two-dimensional image fed into it forms a two-dimensional plane connected infinitely in the vertical and horizontal directions.
If the input image is tiled directly, discontinuities arise along the lines where the image edges meet, and frequency components not present in the original input image can appear. Therefore, in an ordinary Fourier transform, window functions are applied along the height and width of the image, and the signal with its edges brought close to zero is Fourier-transformed.
With the horizontal axis of the input image as the x-axis and the vertical axis as the y-axis, an image mirror-symmetric about a boundary line ly parallel to the y-axis is placed at the x-direction edge of the input image, and an image mirror-symmetric about a boundary line lx parallel to the x-axis is placed at the y-direction edge. Further, at the position diagonal to the input image, an image point-symmetric about the intersection of the two boundary lines lx and ly, i.e. rotated by 180 degrees, is placed.
Since this embodiment assumes the fast Fourier transform, the height and width of the input image are powers of two, hence even. With the above method of placing mirror-symmetric images vertically, horizontally, and diagonally around one image, the resulting four-image composite also has even, power-of-two dimensions, so the fast Fourier transform can be used.
The same processing may also be applied before the inverse Fourier transform.
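The mirror-symmetric extension described above can be sketched with NumPy; this is a minimal illustration, and the exact arrangement of the four blocks is an assumption consistent with the prose:

```python
import numpy as np

def mirror_extend(img):
    """Tile the input with a left-right mirror at the x edge, a top-bottom
    mirror at the y edge, and a 180-degree rotation at the diagonal, so the
    periodic extension assumed by the FFT is continuous at the seams."""
    right = img[:, ::-1]        # mirrored about the vertical boundary line ly
    below = img[::-1, :]        # mirrored about the horizontal boundary line lx
    diag = img[::-1, ::-1]      # point-symmetric (180-degree rotated) block
    return np.vstack([np.hstack([img, right]),
                      np.hstack([below, diag])])

img = np.arange(16.0).reshape(4, 4)   # power-of-two size, as the FFT requires
ext = mirror_extend(img)              # 8x8: still a power of two
```

Because a size-n input yields a size-2n composite, a power-of-two input stays a power of two, so the fast Fourier transform remains applicable without a window function.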
As described above, a convolution in the spatial domain becomes a matrix multiplication for spatial-frequency signals. Specifically, this multiplication is the Hadamard product, expressed by the following formula, where F denotes the Fourier transform, * the convolution operation, and ◎ the Hadamard product.
F[k*u] = F[k]◎F[u]
With n the size of the two-dimensional image, the computational order of the ordinary Fourier transform is O(n³), and that of the convolution operation is likewise O(n³). Here O( ) denotes the approximate number of computations.
The computational order of the fast Fourier transform is O(n²·log₂n), so the fast Fourier transform and fast inverse Fourier transform (IFFT) together cost O(2n²·log₂n). The complex convolution operation performed in the coupling layer 13 has order O(n²), which is sufficiently small to be negligible compared with O(n³) and O(n²·log₂n).
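The convolution theorem F[k*u] = F[k]◎F[u] can be verified numerically for the circular (periodic) convolution that the DFT implements; a minimal sketch with arbitrary random matrices:

```python
import numpy as np

def circular_convolve2d(k, u):
    """Direct O(n^4) circular convolution in the spatial domain."""
    n = k.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for p in range(n):
                for q in range(n):
                    out[i, j] += k[p, q] * u[(i - p) % n, (j - q) % n]
    return out

rng = np.random.default_rng(0)
k = rng.standard_normal((8, 8))
u = rng.standard_normal((8, 8))

# F[k * u] = F[k] ◎ F[u]: the spectrum of the circular convolution equals
# the element-wise (Hadamard) product of the two individual spectra.
lhs = np.fft.fft2(circular_convolve2d(k, u))
rhs = np.fft.fft2(k) * np.fft.fft2(u)
```

The direct spatial convolution costs four nested loops, while the right-hand side needs only two FFTs and an element-wise product, which is the source of the cost reduction discussed above.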
As noted above, one CNN usually has multiple convolution layers 2; with m convolution layers, the CNN's computational order is O(m·n³). In the second comparative example described above, a complex convolution replaces each convolution, but because a Fourier transform and inverse Fourier transform are repeated before and after every complex convolution, using the fast transforms gives a computational order of O(2m·n²·log₂n). The number of complex convolutions in the second comparative example equals the number of convolution layers 2 in the CNN.
Thus, in this embodiment, the reduction in computation grows with the number of complex convolutions corresponding to convolutions, and with the image size.
In this way, the NN of this embodiment reduces the amount of computation for fast operation and also improves computational reliability.
However, results may differ depending on the method. This is because, in the input-output relationship of the Fourier transform, the total energy (volume) of the spatial domain differs from that of the spatial frequency domain. That is, Fourier-transforming and then inverse-transforming can produce a difference between the input and output signals. When learning and inference are performed with NNs having the same hidden layers this is hardly a problem, but depending on the data, an operation equalizing the total energies of the spatial and spatial-frequency domains may be performed, following Parseval's identity (or Rayleigh's energy theorem). Parseval's identity is also useful when the hardware 100 used for learning differs from that used for inference, or when rounding error matters. The same applies to the inverse Fourier transform.
Instead of the spatial-domain convolution layer 2, this embodiment uses a coupling layer 13 in the spatial frequency domain. The coupling layer 13 is a fully connected or sparsely connected layer. Overfitting can be prevented by using fully connected layers near the input layer 11 and sparsely connected layers near the upper output layer 16. In a matrix forming a fully connected layer, all weight-matrix elements are updated, whereas a matrix forming a sparsely connected layer has elements that are stochastically left un-updated.
It performs the operation expressed by
u = Wx + b.
Particularly near the output layer 16, the bias vector may be the zero vector. Pseudorandom values are normally used as the initial values of W and b. Matrices called the Xavier or He initial values may also be used, and are known to speed up learning. This is the same as for spatial signals, and the description is omitted.
That is, the amplitude coupling layer 13A multiplies the first amplitude signal 21r by the first weight matrix W1 and outputs the second amplitude signal 22r. The phase coupling layer 13B multiplies the first phase signal 21θ by the second weight matrix W2 and outputs the second phase signal 22θ.
In the amplitude coupling layer 13A and phase coupling layer 13B, the values in the first weight matrix W1 and second weight matrix W2 are each updated by backpropagation so that the input-output relationship becomes close. That is, the values in W1 and W2 are updated by training.
For the amplitude matrix (input matrix) x serving as the first amplitude signal 21r, the first weight matrix W1 is assumed to contain only positive real numbers. Since the amplitude matrix x is positive real, the elements of the matrix (W1)x could be converted to absolute values |(W1)x|; in this case, however, W1 is trained under the constraint that it contains only positive real numbers.
Imposing this constraint shrinks the search range during learning and reduces the number of operations. It also makes the absolute-value conversion of each element unnecessary, speeding up learning.
Degrees, the angular unit calculated by multiplying radians by 180/π, may also be used for the phase; in that case, a modulo-360° operation is performed.
The complex activation layer 14 receives the second amplitude signal 22r and second phase signal 22θ output from the amplitude coupling layer 13A and phase coupling layer 13B, and operates on these signals using the complex activation function f, the activation function in the spatial frequency domain. In this case, the operation with f updates the second amplitude signal 22r, which is output as the third amplitude signal 23r, while the second phase signal 22θ is output unchanged.
In the complex activation layer 14, the value of the amplitude r(i) at each point of the amplitude matrix forming the second amplitude signal 22r is updated according to the response of the complex activation function f to the phase θ(i) at the same point i of the phase matrix forming the second phase signal 22θ.
Like the spatial-domain activation function, the complex activation function f is nonlinear. With k an arbitrary constant and x, y arbitrary variables, a nonlinear function can be defined as one that fails to satisfy one or both of the defining properties of a linear function g:
g(x+y) = g(x) + g(y)
g(k·x) = k·g(x)
The complex activation function f in this embodiment is such a nonlinear function intended to act on the spatial-frequency signal after the Fourier transform.
F[g(x)] ≠ F[g]◎F[x]
That is, Fourier-transforming the result of applying the activation function g to a spatial-domain value x differs from Fourier-transforming g and x separately and multiplying the two.
For example, Fourier-transforming the ReLU function diverges, because ReLU increases monotonically for x ≥ 0. The Fourier transform of ReLU therefore cannot serve as an activation function in the spatial frequency domain.
The spatial-domain ReLU outputs the input value when it is positive or zero, and 0 when it is negative. The complex ReLU is determined not by the amplitude r or phase θ alone; it updates the amplitude component using a function that applies a trigonometric function to the phase component. Specifically, using a function that applies a trigonometric function to the phase and multiplies by the amplitude, the amplitude is left unchanged when either the real-axis or the imaginary-axis component, e.g. the real-axis component, is positive or zero, and is updated to the value computed by that function when it is negative.
An example of the complex activation function f using the complex ReLU is given in Equation (5) below.
FIG. 7 illustrates Equation (5). As shown in FIG. 7, for a circle of radius r with real axis u and imaginary axis jv, when the u component is negative, the u component is converted to u = 0 and the jv component to |jv|. This is equivalent to replacing the amplitude r with the absolute value of r·sinθ when the real-axis component u is negative.
A further example of the complex activation function f using the complex ReLU is given in Equation (6) below.
FIG. 8 illustrates Equation (6). In this case, when the imaginary-axis component jv is negative, the amplitude r is replaced with the absolute value of r·cosθ. This is equivalent to substituting (θ + (π/2)) for θ in Equation (5), but it surpasses the complex ReLU of Equation (5) in ease of programming and in the smaller number of comparisons, i.e. in computational speed. Since the number of comparisons on θ is also reduced, the amount of computation can be cut.
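The two complex ReLU variants described in prose above can be sketched as follows; since Equations (5) and (6) themselves are not reproduced in this text, this is a hypothetical reading of the prose descriptions, not the patent's exact formulas:

```python
import numpy as np

def complex_relu_eq5(r, theta):
    """Eq. (5) as described: keep r where the real part r*cos(theta) is
    non-negative; replace r with r*|sin(theta)| where it is negative."""
    return np.where(np.cos(theta) >= 0, r, r * np.abs(np.sin(theta)))

def complex_relu_eq6(r, theta):
    """Eq. (6) as described: keep r where the imaginary part r*sin(theta) is
    non-negative; replace r with r*|cos(theta)| where it is negative."""
    return np.where(np.sin(theta) >= 0, r, r * np.abs(np.cos(theta)))
```

Both functions act element-wise on the amplitude matrix, gated by the phase matrix, matching the description that only the amplitude signal is updated while the phase passes through unchanged.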
The complex activation function f may also be defined by Equation (7) below, where k is a real number greater than 1.
Conventionally, the step function, a discontinuous function, has sometimes been used as the activation function in the spatial domain. The step function is expressed by Equation (8) below, or, considering the singularity at x = 0, by Equation (9).
A complex activation function f using a discontinuous complex ReLU is expressed, for example, by Equation (10) below.
FIG. 9 illustrates Equation (10). As shown in FIG. 9, for a circle of radius r, when the imaginary-axis jv component is negative, the point is mapped onto the real axis u where jv = 0.
Another example of a discontinuous complex activation function f is given in Equation (11) below.
FIG. 10 illustrates Equation (11). As shown in FIG. 10, for a circle of radius r, when the real-axis u component is negative, the point is mapped onto the imaginary axis jv where u = 0.
When updating the NN's weight matrices W (first weight matrix W1, second weight matrix W2), the optimal value of each element of W is searched for so as to minimize the loss L, the difference between the NN output and the teacher data. Backpropagation is a means of finding these optimal values, based on gradient descent.
In gradient descent, the weight matrix W is updated according to Equation (12) below, using the weight matrix W, the learning rate α, and the loss L, the component of the difference between the inference result and the correct label.
The computation that updates the weight matrix W is learning, and the process of making it learn is training. When training is complete, the NN's learning is complete. The data used for training are called training data, used synonymously with learning data.
During training with the training data, early stopping may be used, halting learning once a desired performance level set beforehand is satisfied; this prevents overfitting and shortens training time. On this point there is no difference between this embodiment and CNN techniques performed in the spatial domain.
Gradient descent is an algorithm for searching for solutions when minimizing (more generally, optimizing) an objective function. In NNs in particular, the stochastic gradient descent of Equation (12) is used, a method generally applied when the objective function to be minimized is differentiable. The learning rate α is then an important parameter for gradient descent, and various methods such as AdaGrad, Adam, and momentum are known. These methods work in the spatial frequency domain just as in the spatial domain, and their details are omitted.
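The update of Equation (12), W ← W − α·∂L/∂W, can be sketched on a toy quadratic loss (the loss L(W) = (W − 3)², its gradient, and the learning rate are illustration choices, not values from the patent):

```python
import numpy as np

def sgd_step(W, grad, alpha):
    """One gradient-descent update of Eq. (12): W <- W - alpha * dL/dW."""
    return W - alpha * grad

# Minimize L(W) = (W - 3)^2 element-wise; the gradient is 2*(W - 3).
W = np.zeros(2)
for _ in range(100):
    W = sgd_step(W, 2.0 * (W - 3.0), alpha=0.1)
# W converges toward the minimizer W = 3
```

The same update rule applies unchanged to the first and second weight matrices W1 and W2; only the gradient computation (by backpropagation through the complex layers) differs.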
Dropout, meanwhile, prevents overfitting by stochastically setting components of the amplitude matrix to 0 in a dropout layer; probabilities of 20% to 50% are typically used.
Overfitting may also be prevented by using sparsely connected layers instead of fully connected layers, by early stopping, or by having hidden layers reduce the number of significant digits of computation results.
Forward propagation is used during inference, which estimates results using the trained weight matrices. When updating the weight matrices by training, forward and backward propagation are performed multiple times.
Forward propagation applies the hidden layers' matrices or functions to the input data in turn. Backward propagation, in contrast, propagates error information, the difference between the inferred value obtained by forward propagation and the correct label, backward from each upper layer to the layer immediately below it.
The complex activation function f used in this embodiment also requires differentiation.
When the complex activation function f is that of Equation (5) above, i.e. for the complex ReLU below,
and likewise when the complex activation function f is that of Equation (11) above, i.e. for the complex ReLU below,
in backpropagation, computing ∂L/∂W involves computing ∂L/∂Z, ∂L/∂Y, ∂L/∂X, and ∂L/∂W, where each of these matrices has the same size as Z, Y, X, and W respectively. Since ∂L/∂Z, ∂L/∂Y, ∂L/∂X, and ∂L/∂W are each uniquely determined, the chain rule, a standard partial-differentiation technique, is used to compute them.
Because backpropagation is very convenient when translating the mathematics into a program, it is widely used in training.
The output layer 16 is described below. The output layer 16 uses an activation function, a function that transforms the signal to obtain the desired output 7. In this embodiment, the activation function used in the output layer 16 is called the output activation function.
In supervised learning, since pairs of input data and correct labels are given, a measure of the closeness between the output of the output activation function and the teacher data is needed. In this embodiment this measure is called the error function.
Regression analysis is a method of determining a function that reproduces the training data, targeting functions with continuous-valued outputs. In this case, the NN's output activation function is chosen so that its range matches the range of the target function. When the range is from -1 to 1, the hyperbolic tangent y = tanh(x) is suitable; when the range is from -∞ to ∞, the identity map y = x is often chosen. For the difference between the output of the output activation function and the correct label, the squared error is used; considering the derivative in backpropagation, half the squared error is generally used as the error function.
FIG. 11 shows a configuration example of an NN using two coupling layers 13 and two complex activation layers 14. FIG. 12 shows the accuracy on test data for such an NN, together with the first comparative example, i.e. a CNN with two convolution layers 2.
The solid line in FIG. 12 is the case using the NN of this embodiment, and the dotted line the case using the CNN of the first comparative example. Here the NN of this embodiment used the complex activation function f of Equation (6) above.
As shown in FIG. 12, the NN of this embodiment infers the test data with about 95% accuracy once the number of computations exceeds 1,500. Since the CNN of the first comparative example achieves about 97% accuracy, the performance is roughly equivalent.
The MNIST data comprise 60,000 training samples and 10,000 test samples for inference.
Moreover, by reducing the number of training samples, or by using dedicated ICs for the time-consuming Fourier and inverse Fourier transforms, even a small processor such as a microcontroller can compute an NN with a large number of layers.
According to this embodiment, mainly two-dimensional images can be processed faster than with the CNN of the first comparative example; the effect is especially large for NNs with multiple coupling layers 13 corresponding to convolution layers 2. For image recognition, besides data acquired by CMOS (Complementary Metal-Oxide-Semiconductor) sensors and the like, video visualizing electromagnetic waves with infrared cameras, ultraviolet cameras, or phased-array antennas can be used as input data.
Furthermore, as described above, even one-dimensional data can be treated as two-dimensional data by converting it into a spectrogram, so the method of this embodiment can be applied.
By converting to spectrograms, the method of this embodiment can also be applied to RNNs (Recurrent Neural Networks) in general, wherever a convolution operation is required.
With a GAN, the input data may be images of two or more dimensions, so it is also possible, for example, to input simulation data and perform a desired design.
An autoencoder is an NN trained so that, for example, feeding a two-dimensional image into the input layer yields the same image at the output; between the input and output layers there are various arithmetic operations, including convolution layers, performed so that necessary information is not lost.
Since convolution is an indispensable technique in unsupervised learning as well, applying this embodiment can greatly reduce computational cost.
Embodiment 2 uses a complex activation function f different from the complex ReLU used in Embodiment 1. In this embodiment, the Fourier transform layer 12 decomposes the signal into an amplitude component that is a real number of 0 or more and a phase component that is a real number from -π inclusive to π exclusive. The rest of the configuration is the same as in Embodiment 1.
In the complex activation layer 14, the value of the amplitude r(i) at each point of the amplitude matrix forming the second amplitude signal 22r is updated according to the response of the complex activation function f to the phase θ(i) at the same point i of the phase matrix forming the second phase signal 22θ.
While the complex activation function f of Embodiment 1 updated the amplitude r(i) with responses that differ depending on the magnitude of the phase θ(i), the complex activation function f of this Embodiment 2 updates r(i) with a uniform response given by the same formula regardless of the magnitude of θ(i).
An example of the complex activation function f using a complex logistic function is given in Equation (21) below.
In Equation (21), ((k²-1)/2) is a constant for making the maximum output of f equal to 1 and is not essential. The complex activation function f of Equation (22) below may therefore be used instead; in that case, the maximum output of f is (2/(k²-1)). The minimum of f is a real number of 0 or more, satisfying the above condition that the amplitude component be a real number of 0 or more.
Embodiment 3 uses a complex activation function f different from those of Embodiments 1 and 2. The rest of the configuration is the same as in Embodiment 1.
In this Embodiment 3 as well, as in Embodiment 1, the complex activation layer 14 receives the second amplitude signal 22r and second phase signal 22θ output from the amplitude coupling layer 13A and phase coupling layer 13B, operates on them with the complex activation function f, and outputs the updated second amplitude signal 22r and second phase signal 22θ.
The second amplitude signal 22r and second phase signal 22θ are updated in the same way by the same method, but the second phase signal 22θ may be kept as-is and only the second amplitude signal 22r updated.
Here, for simplicity, the second amplitude signal 22r and second phase signal 22θ generated in the coupling layer 13 are taken to be two-dimensional matrices: an amplitude matrix and a phase matrix.
Each axis of the amplitude matrix is a frequency axis, and each element is an amplitude value. In the complex activation layer 14, with N ≥ 2 and M ≥ 1, a reduced micro-matrix Lr is generated whose frequency-axis components are scaled by 1/N and whose element amplitudes are scaled by 1/M. This micro-matrix Lr is added to the original amplitude matrix, and the matrix generated after the addition is the output of the complex activation function f (the updated amplitude matrix).
Note that this complex activation function f is a nonlinear function in the spatial frequency domain.
The pooling layer, provided after the complex activation layer 14, acts as a low-pass filter in the electrical-engineering sense, or more generally as a filter.
The computation of the complex activation function f of this Embodiment 3 takes more time than those of Embodiments 1 and 2, but it does not degrade the information and enables more accurate computation.
A shift operation requires no conversion to decimal and becomes a binary bit operation at which computers excel, so its computational cost on a von Neumann computer is small. In a compiled language such as C, the cost is roughly one tenth. Multiplications by powers of 1/2, such as 1/2, 1/4, and 1/8, can be performed as right-shift bit operations.
In this case, the elements of the amplitude and phase matrices forming the second amplitude signal 22r and second phase signal 22θ are thinned out every other element. The row and column sizes of each matrix then become 1/2, yielding micro-matrices Lr, Lθ. The shrunken micro-matrices Lr, Lθ are zero-filled in the high-frequency components and shaped to the same size as the amplitude and phase matrices before thinning. Adding the processed micro-matrices Lr, Lθ to the pre-thinning amplitude and phase matrices generates the simplest output signal of the complex activation function f.
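The simplest construction just described (N = 2, decimate, scale by 1/M, zero-fill, add back) can be sketched as follows; placing the decimated copy in the low-frequency corner with zeros as the high-frequency fill is an assumption about the layout, not stated explicitly in the text:

```python
import numpy as np

def micro_matrix_activation(mat, M=2):
    """Embodiment-3 style activation: decimate every other element
    (N = 2, halving each frequency axis), scale amplitudes by 1/M,
    zero-fill the high-frequency remainder, and add back to the input."""
    small = mat[::2, ::2] / M                          # decimation + 1/M amplitude
    padded = np.zeros_like(mat)
    padded[:small.shape[0], :small.shape[1]] = small   # zeros act as the fill
    return mat + padded

x = np.ones((4, 4))
y = micro_matrix_activation(x)
```

Because the output is the input plus a scaled, frequency-compressed copy of itself, the map is not additive, which is what makes it a nonlinear activation in the spatial frequency domain; with M a power of two, the 1/M scaling can be done with a right shift as the text notes.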
Embodiment 4 uses a complex activation function f different from those of Embodiments 1 to 3. The rest of the configuration is the same as in Embodiment 1.
In this Embodiment 4 as well, as in Embodiment 1, the complex activation layer 14 receives the second amplitude signal 22r and second phase signal 22θ output from the amplitude coupling layer 13A and phase coupling layer 13B, operates on them with the complex activation function f, and outputs the updated second amplitude signal 22r and second phase signal 22θ.
The second amplitude signal 22r and second phase signal 22θ are updated in the same way by the same method, but the second phase signal 22θ may be kept as-is and only the second amplitude signal 22r updated.
F[g◎h] = F[g]*F[h]
That is, a Hadamard product in the spatial domain becomes a convolution in the spatial frequency domain. Regarding F[g] as the input signal and F[h] as a kernel, convolving F[h] as a kernel with the input signal F[g] is applied as the operation of the complex activation function f.
One example of such a function is the sinc function, expressed as (sin(x)/x) with x the frequency, which takes its maximum at x = 0.
Using this F[h], a convolution is performed on the second amplitude signal 22r; a similar convolution may also be performed on the second phase signal 22θ.
Besides the amplitude being maximal at the origin of the kernel F[h], it is desirable that the kernel take both positive and negative values, i.e. that it be a function crossing the zero-amplitude axis. Then, in the amplitude of the post-convolution result (F[g]*F[h]), points of zero amplitude exist, and these zero-amplitude points become information in the NN.
In addition, the sigmoid function shown in Equation (26) below becomes a convergent function in the spatial frequency domain.
It may also be the function computed as
F[h](x) = -jπ·csch(πx).
Here csch(πx) may be the function expressed as cosech(x). Since it diverges at x = 0, it is rounded to a finite value in actual computation.
Nevertheless, an operation handling spatial-frequency signals within a continuous spatial frequency domain is carrying out the CNN approach, which enables detailed analysis of the conventional CNN method.
In a spatial-domain convolution, with the horizontal axis of the input image as the x-axis and the vertical axis as the y-axis, take as the kernel, for example, a two-dimensional signal obtained by differentiating a Gaussian in the x direction; convolving this kernel with the input image outputs an image with the y-direction edges emphasized. Similarly, convolving the input image with kernels obtained by differentiating a Gaussian in arbitrary directions extracts image edges in every direction of the input image.
It has long been known, through research on the neocognitron and the CNN, that spatial-domain processing can express a structure resembling the firing of neurons in the brain with nonlinear functions.
However, in the analysis of activation functions in this embodiment, analyzing them together with the function of pooling described later (complex pooling in the spatial frequency domain) leads to the conclusion that the most important role of the activation function in deep learning is to generate frequency components different from those of the input signal.
This spatial-frequency-domain operation does not give the same result as the spatial-domain operation, but the difference in results can be adjusted by training the complex convolution operation.
Embodiment 5 adds a complex pooling layer to the NNs of Embodiments 1 to 4. The complex pooling layer 18 is a spatial-frequency-domain layer corresponding to the spatial-domain pooling layer 4. FIG. 15 shows the configuration of the NN of this Embodiment 5, as a partial detail view corresponding to FIG. 4 of Embodiment 1.
As in Embodiment 1, the amplitude coupling layer 13A has the first weight matrix W1 and multiplies the first amplitude signal 21r by W1 to output the second amplitude signal 22r. The phase coupling layer 13B multiplies the first phase signal 21θ by the second weight matrix W2 to output the second phase signal 22θ.
The complex activation layer 14 uses the complex activation function f to update only the second amplitude signal 22r of the second amplitude signal 22r and second phase signal 22θ, outputting it as the third amplitude signal 23r, while outputting the second phase signal 22θ without updating it.
When the complex pooling layer 18 is provided in Embodiments 3 or 4, where the complex activation layer 14 updates both the second amplitude signal 22r and the second phase signal 22θ, the complex pooling layer 18 processes both updated signals.
Pooling in the spatial domain lowers the sensitivity to the positions of features extracted by the convolution layer 2, so that an image is recognized as having the same features even if their positions within the image shift. This amounts to "blurring" the image.
Applying this "blur" in the spatial frequency domain is easily achieved by removing high-frequency components. Since high-frequency components are those generated when neighboring pixel values change abruptly, complex pooling is obtained by removing high-frequency components in the spatial frequency domain.
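Removing high-frequency components as described above can be sketched as an explicit frequency-domain mask; the cutoff value and the rectangular (per-axis) mask shape are illustration choices:

```python
import numpy as np

def complex_pooling(amplitude, cutoff=0.25):
    """Complex pooling as an explicit low-pass filter: zero all amplitudes
    whose |frequency| exceeds the cutoff on either spatial-frequency axis."""
    n = amplitude.shape[0]
    fy = np.abs(np.fft.fftfreq(n))[:, None]   # per-row frequencies
    fx = np.abs(np.fft.fftfreq(n))[None, :]   # per-column frequencies
    mask = (fx <= cutoff) & (fy <= cutoff)
    return amplitude * mask

out = complex_pooling(np.ones((8, 8)))
```

Swapping the mask for a band-pass shape (zeroing the DC bin as well, or a Gabor-like profile) gives the arbitrary-filter flexibility discussed below, without changing anything else in the layer.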
As an example of a band-pass filter, there is a method using the Gabor filter, the function expressed as the product of a trigonometric function and a Gaussian, demonstrated by David Hubel and Torsten Wiesel. An arbitrary band-pass filter combining a low-pass and a high-pass filter may also be used.
For example, when the amplitude is negative in a CNN, the output of the activation function (ReLU) is 0. An output of 0 means frequency 0, i.e. a DC component, in the spatial frequency domain, and corresponds to half-wave rectification in electrical engineering. For a particular frequency, half-wave rectification changes frequency continuously from the DC component to that frequency, so even if the input to the activation function is a single frequency, the output signal has broadband frequency components.
The aggregation of information mentioned here means that high-frequency components have been removed by a low-pass filter (e.g. a Gaussian filter).
As described above, spatial-domain pooling is broadly divided into max pooling and average pooling: for example, the image is cropped in 2×2 blocks and the maximum or average within each 2×2 block is output. This operation has a blurring effect, and the blurring allows an image to be recognized as the same even when the input image is shifted or rotated.
Thus, while spatial-domain pooling had a physically ambiguous meaning, in the spatial frequency domain complex pooling can act as an explicit filter. Complex pooling can thereby build a deep-learning model with higher inference accuracy than spatial-domain filters. Moreover, since arbitrary filters can be built, such as a band-pass filter removing only the DC and high-frequency components in addition to low-pass filters, a deep-learning model with high flexibility can be built.
Embodiment 6 adds a complex batch normalization layer to the NNs of Embodiments 1 to 5. FIG. 16 shows the configuration of the NN of this Embodiment 6, as a partial detail view corresponding to FIG. 4 of Embodiment 1.
Here an example is shown with the complex batch normalization layer 19 between the coupling layer 13 and the complex activation layer 14, but it may instead be placed after the complex activation layer 14 or before the coupling layer 13.
In the spatial domain, a technique called batch normalization is sometimes used to shorten training time. Spatial-domain batch normalization takes the mean and standard deviation of one hidden layer's input (usually a matrix), subtracts the mean from the input, and divides by the standard deviation.
Complex batch normalization performs, in the spatial frequency domain as well, the same operation as spatial-domain batch normalization on the amplitude signal only, thereby reducing the effect of internal covariate shift and shortening training time.
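The amplitude-only normalization just described can be sketched directly (the epsilon guard against a zero standard deviation is a common implementation detail, added here as an assumption):

```python
import numpy as np

def complex_batch_norm(amplitude, eps=1e-5):
    """Batch normalization applied to the amplitude signal only:
    subtract the mean, then divide by the standard deviation."""
    return (amplitude - amplitude.mean()) / (amplitude.std() + eps)

a = np.array([1.0, 2.0, 3.0, 4.0])
out = complex_batch_norm(a)   # zero mean, (near-)unit standard deviation
```

The phase signal is deliberately left untouched, matching the placement of layer 19 described next, which normalizes only the second amplitude signal 22r.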
In this case, since the complex batch normalization layer 19 is placed between the coupling layer 13 and the complex activation layer 14, it complex-batch-normalizes only the second amplitude signal 22r output by the amplitude coupling layer 13A and outputs an amplitude signal 22ra.
If learning takes a long time because of the computation time of the complex batch normalization layer 19 itself, it is desirable to reduce the number of such layers and compensate by the following methods: changing the complex activation function f, pre-training the initial values of the weight matrices W, lowering the learning rate of gradient descent, or constraining the NN's degrees of freedom with dropout or sparsely connected layers.
Embodiment 7 adds an amplitude logarithm layer and an inverse amplitude logarithm layer to the NNs of Embodiments 1 to 6. FIG. 17 shows the configuration of the NN of this Embodiment 7, applying these layers to Embodiment 6, as a partial detail view corresponding to FIG. 16.
As shown in FIG. 17, the amplitude logarithm layer 10A is provided between the Fourier transform layer 12 and the coupling layer 13, and the inverse amplitude logarithm layer 10B before the inverse Fourier transform layer 15.
In this Embodiment 7, the amplitude logarithm layer 10A takes the logarithm of the amplitudes of the first amplitude signal 21r (amplitude matrix), i.e. generates and outputs an amplitude signal 21ra with logarithmized amplitudes. This suppresses the adverse effect of large-amplitude signals arising after the Fourier transform and improves learning reliability.
y = log_a x
Here the base a is generally e (the base of the natural logarithm), 2, or 10, but any other real number may be used.
In ordinary images, input-signal magnitudes can differ by two to three orders of magnitude; with base 10, for example, a three-order-of-magnitude difference becomes only a threefold change, so learning can be made sensitive to small-amplitude signals and insensitive to large-amplitude ones.
y = b·log_a(x + δ)
The error component δ need not be supplied when the amplitude matrix contains no zero elements; when it does, a value at least one order of magnitude smaller than the smallest nonzero element is desirable.
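The amplitude logarithm layer y = b·log_a(x + δ) and its inverse can be sketched as follows (the default base, scale b, and δ are illustration values):

```python
import numpy as np

def amplitude_log(x, a=10.0, b=1.0, delta=1e-12):
    """Amplitude logarithm layer: y = b * log_a(x + delta).
    delta guards amplitude matrices that contain zero elements."""
    return b * np.log(x + delta) / np.log(a)

def amplitude_log_inverse(y, a=10.0, b=1.0, delta=1e-12):
    """Inverse amplitude logarithm layer: recovers x from y."""
    return a ** (y / b) - delta

x = np.array([1.0, 10.0, 1000.0])   # amplitudes spanning three decades
y = amplitude_log(x)                # roughly [0, 1, 3]: a 3x span, not 1000x
```

This illustrates the compression described above: a three-order-of-magnitude spread in amplitude becomes a threefold spread after the layer, and the inverse layer restores the original scale before the inverse Fourier transform.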
Embodiment 8 adds an axis logarithm layer and an inverse axis logarithm layer to the NNs of Embodiments 1 to 7. FIG. 18 shows the configuration of the NN of this Embodiment 8, applying these layers to Embodiment 6, as a partial detail view corresponding to FIG. 16.
As shown in FIG. 18, the axis logarithm layer 10C is provided between the Fourier transform layer 12 and the coupling layer 13, and the inverse axis logarithm layer 10D before the inverse Fourier transform layer 15.
In this embodiment too, for simplicity, the input data used for learning are two-dimensional; one axis is called the X-axis and the other the Y-axis.
The X and Y axes of the first amplitude signal 21r and first phase signal 21θ after the Fourier transform in the Fourier transform layer 12 are on a linear scale, like the input data. The first amplitude signal 21r and first phase signal 21θ are input to the axis logarithm layer 10C, which applies the logarithm to each of their X and Y axes, i.e. generates and outputs an axis-logarithmized amplitude signal 21rb and phase signal 21θb.
The base may be any real number of 0 or more.
In this embodiment, information in the information-rich low-frequency components can be emphasized and information in the information-poor high-frequency components suppressed, so that reliable learning proceeds efficiently.
Further, when the X and Y axes represent different physical quantities, as in a spectrogram, only one axis may be logarithmized. Likewise, even for the same physical quantity, when the learning viewpoint differs between the X and Y axes, only one axis may similarly be logarithmized.
FIG. 19 shows the configuration of the NN of this Embodiment 9.
As shown in FIG. 19, the NN comprises, in order from the input layer 11A: the coupling layer 13 performing the complex convolution operation, consisting of the amplitude coupling layer 13A and phase coupling layer 13B; the complex activation layer 14 operating with the complex activation function f; the complex pooling layer 18; the inverse Fourier transform layer 15; the output layer 16; and the output 17.
A Fourier transform layer 12A, the arithmetic unit performing the Fourier transform as NN preprocessing, is provided separately from the NN; it Fourier-transforms the incoming spatial-domain input signal 20 and outputs the first amplitude signal 21r and first phase signal 21θ, which are spatial-frequency-domain signals.
The Fourier transform therefore need not be performed within the NN; repeated computation in a Fourier transform layer during learning becomes unnecessary, as does its backpropagation computation, so computation time can be shortened.
FIG. 20 shows the configuration of the NN of this Embodiment 10.
As shown in FIG. 20, the NN comprises, in order from the input layer 11: the convolution layer 2 performing the convolution operation; the activation function 3; the Fourier transform layer 12; the coupling layer 13 performing the complex convolution operation, consisting of the amplitude coupling layer 13A and phase coupling layer 13B; the complex activation layer 14 operating with the complex activation function f; the complex pooling layer 18; the inverse Fourier transform layer 15; the output layer 16; and the output 17. This is an example in which the convolution layer 2 and activation function 3 are inserted after the input layer 11 of the NN of Embodiment 5.
For an input signal 20 composed of combined RGB, for example, CNNs use a method of splitting the input data by color in the input layer. A color image normally has, besides height and width, a three-dimensional shape with a channel dimension for the RGB colors. This three-dimensional shape contains spatial information, e.g. spatially close pixels tend to have similar values. The three-dimensional shape may also contain essential image information, e.g. the RGB channels are closely related to one another, while distant pixels have little relation.
The convolution layer 2 can extract and retain this information.
This enables speed-up even for multidimensional input data of three or more dimensions.
Embodiment 11 describes electronic devices that perform control operations using the NN of the information processing devices shown in Embodiments 1 to 10.
The information processing device according to the present application is used for processing information from sensors mounted in air conditioners, sensor information of servo systems used in factories in factory automation, or information from sensors attached inside or outside vehicles. Conventionally, because these processes used CNNs, a GPU, ASIC (application specific integrated circuit), or FPGA dedicated to neural-network processing had to be provided. According to this embodiment, by contrast, information from air conditioners, servo systems, in-vehicle sensors, and the like can be processed by general-purpose hardware including existing CPU microcontrollers and memory.
The information processing device 50 controls, from the control unit 57, at least the air direction, air volume, and temperature of the blower unit 52.
The information processing device 50 takes the output signal of the infrared sensor 51 as its input signal and includes a learning process that learns the positions of living bodies and temperature changes from the input signal, and an inference process that infers from the input signal based on the information obtained in the learning process, performing control operations based on the inference process.
In the information processing device 50, the NN has a process of inferring the positions of living bodies and their temperature changes from the input signal from the infrared sensor 51. The position of a living body is found by detecting its temperature using the information processing device 50. The information processing device 50 also performs processing that predicts the temperature suited to each living body from its temperature changes.
Regarding the appropriate temperature and air volume for each living body, a mechanism for locating the controller (e.g. an infrared-absorbing or -reflecting member at the tip of the controller) is provided, and the information from when each living body operates the air conditioner's controller is used as the correct label. The NN then performs processing that recognizes different living bodies by extracting features such as movement, body temperature, and contour.
The learned weight matrix W of the NN is saved in the storage unit 55, and is read from the storage unit 55 and used the next time the air conditioner starts.
Since NN learning can also identify each living body, that information can be combined to build air conditioners with various functions. Even when multiple living bodies are present at once, each blower unit 52 can fix the vanes that adjust the air direction and continuously send air with the temperature and volume appropriate to each living body.
The information processing device 50 controls, from the control unit 57, at least one of stopping the operation of the operating unit 52a and removing abnormal objects.
The information processing device 50 takes the output signal of the sensor 51a as its input signal and includes a learning process that learns at least one of an object's positional change, electric field, magnetic field, and temperature from the input signal, and an inference process that infers from the input signal based on the information obtained in the learning process, performing control operations based on the inference process.
In this case too, the NN may be trained in advance; the information processing device 50 includes at least the inference process of the learning and inference processes, and calls up and uses the weight matrix W in the storage unit 55 at inference time.
Because servo systems operate at high speed, instantaneous judgment is required; CNNs have conventionally been used, but the search range was narrowed and images shrunk for speed. Using the electronic device of this embodiment in a servo system maintains equivalent processing speed on large images, so a wide range of information can be processed with fewer sensors, and accuracy can be improved with NNs having more layers.
Servo systems require simultaneous, coordinated control of multiple servo motors, so real-time performance is emphasized. On abnormal operation, the situation must be judged quickly and operation stopped and restarted as needed. In the servo system of this embodiment, rapid response allows manufacturing and operation to be halted quickly on an abnormality. This reduces waste discarded as scrap, and reduces failures caused by collisions between devices operating in abnormal postures.
The sensor 51a is attached directly to the autonomous robot or to its surroundings. The information processing device 50 includes a learning process that learns, from the input signal from the sensor 51a, the presence or absence of characters, colors, bar codes, or defects on which noise inherent to the sensor 51a or noise depending on its usage environment is superimposed, and an inference process that infers from the input signal based on the information obtained in the learning process, performing control operations based on the inference process.
The collision prevention device includes the information processing device 50 according to the present application, the sensor 51a, and the operating unit 52a. The information processing device 50 includes an input unit 53 for inputting data, an analysis unit 54 that analyzes the data input from the input unit 53, a storage unit 55 that records the analysis results, a determination unit 56 that judges the analysis results, and a control unit 57 that controls each unit.
The information processing device 50 includes a learning process that learns, from the input signal from the sensor 51a, at least one of the positions of living bodies outside the vehicle, the positions and headings of surrounding vehicles, traffic-signal information, and lane information, and an inference process that infers from the input signal based on the information obtained in the learning process, performing control operations based on the inference process. Here too, at least the inference process suffices.
The sensor 51a is mounted in the vehicle and detects in-vehicle environment information serving as information for control operations; at least one of a CCD or CMOS image sensor, such as a camera, is used.
The information processing device 50 includes a learning process that learns, from the input signal from the sensor 51a, at least the driver's face among the driver's face and physique, and an inference process that infers from the input signal based on the information obtained in the learning process, performing control operations based on the inference process.
The in-vehicle device of this embodiment needs no information network such as the WWW, and can identify and authenticate the driver quickly, regardless of the connection environment, with an energy-saving, simple, and inexpensive configuration.
In this embodiment, surveillance-camera video need not be uploaded via an information network such as the WWW, and the amount of information can be reduced by distributed processing in the information processing device 50. For example, where about 1 MB/s of information was conventionally handled per surveillance camera, in this embodiment only about 1 kB/s needs to be processed, a drastic reduction. This makes monitoring of personal behavior easy and reliable.
Accordingly, countless unillustrated variations are conceivable within the scope of the technology disclosed in the present application, including modifying, adding, or omitting at least one component, and extracting at least one component and combining it with components of other embodiments.
Claims (25)
1. An information processing device that processes an input signal with a neural network, the device comprising: a Fourier transform layer that Fourier-transforms the input signal and outputs a first amplitude signal and a first phase signal; an amplitude coupling layer that applies, to the first amplitude signal, a first weight matrix whose values are updated by training, and outputs a second amplitude signal; a phase coupling layer that applies, to the first phase signal, a second weight matrix whose values are updated by training, and outputs a second phase signal; a complex activation layer that updates at least the second amplitude signal, of the second amplitude signal and the second phase signal, using a complex activation function f, which is an activation function in the spatial-frequency domain; and an inverse Fourier transform layer that combines the second amplitude signal updated by the complex activation layer with the second phase signal and performs an inverse Fourier transform.
2. The information processing device according to claim 1, comprising at least one each of the amplitude coupling layer, the phase coupling layer, and the complex activation layer between the Fourier transform layer and the inverse Fourier transform layer, so that signal processing in the spatial-frequency domain is performed continuously between the Fourier transform layer and the inverse Fourier transform layer.
3. The information processing device according to claim 1 or 2, wherein the complex activation layer updates the value of the amplitude r(i) at the point, in the matrix constituting the second amplitude signal, located at the same position as each point i in the matrix constituting the second phase signal, according to the response of the complex activation function f to the phase θ(i) at that point i, outputs the updated second amplitude signal, and outputs the second phase signal without updating it.
4. The information processing device according to claim 1 or 2, wherein the complex activation function f used in the complex activation layer updates a target signal, which is at least the second amplitude signal of the second amplitude signal and the second phase signal, by generating a small matrix whose frequency components, the axial components of the matrix constituting the target signal, are 1/N and whose elements are each 1/M, where N and M are integers satisfying N ≥ 2 and M ≥ 1, and adding the small matrix to that matrix.
5. The information processing device according to claim 1 or 2, wherein the complex activation function f used in the complex activation layer performs a convolution operation on a target signal, which is at least the second amplitude signal of the second amplitude signal and the second phase signal, using as a kernel a function whose absolute value is largest at a reference origin.
6. The information processing device according to claim 3, wherein the complex activation function f is a complex ReLU function that updates the value of the amplitude r(i) in the second amplitude signal with a response to the phase θ(i) in the second phase signal that differs depending on whether one of the real-axis component and the imaginary-axis component is positive or zero, or is negative.
7. The information processing device according to claim 6, wherein the complex activation function f keeps the value of the amplitude r(i) where the real-axis component is positive or zero, that is, (-π/2) ≤ θ(i) < (π/2), and changes the value of the amplitude r(i) to (r(i)·|sin θ(i)|) or (r(i)·sin θ(i)) where the real-axis component is negative, that is, -π ≤ θ(i) < (-π/2) or (π/2) ≤ θ(i) < π.
8. The information processing device according to claim 6, wherein the complex activation function f keeps the value of the amplitude r(i) where the imaginary-axis component is positive or zero, that is, 0 ≤ θ(i) < π, and changes the value of the amplitude r(i) to (r(i)·|cos θ(i)|) or (r(i)·cos θ(i)) where the imaginary-axis component is negative, that is, -π ≤ θ(i) < 0.
9. The information processing device according to claim 3, wherein the complex activation function f is a complex logistic function that updates the value of the amplitude r(i) in the second amplitude signal with a uniform response that uses the same arithmetic expression regardless of the magnitude of the phase θ(i) in the second phase signal.
10. The information processing device according to claim 4, wherein the complex activation function f updates the target signal using a plurality of the small matrices.
11. The information processing device according to claim 4 or 10, wherein N and M are each powers of 2, and the small matrix is computed by shift operations.
12. The information processing device according to claim 5, wherein the complex activation function f uses a sinc function as the kernel and computes the absolute value after the convolution operation on the target signal.
13. The information processing device according to any one of claims 1 to 12, comprising, immediately after the complex activation layer, a complex pooling layer that acts as a low-pass filter or a band-pass filter on whichever of the second amplitude signal and the second phase signal was updated by the complex activation function f.
14. The information processing device according to any one of claims 1 to 13, wherein an amplitude logarithm layer that takes the logarithm of the amplitude of the first amplitude signal is provided after the Fourier transform layer, and an inverse amplitude logarithm layer that undoes the logarithm is provided before the inverse Fourier transform layer.
15. The information processing device according to any one of claims 1 to 14, wherein an axis logarithm layer that applies a logarithm to the axes of the first amplitude signal and the first phase signal is provided after the Fourier transform layer.
16. The information processing device according to any one of claims 1 to 15, comprising an input layer that feeds the input signal into the neural network, and an output layer, placed after the inverse Fourier transform layer, that converts its incoming signal into a desired form and outputs it from the neural network.
17. The information processing device according to claim 16, wherein the Fourier transform layer is placed before the input layer for preprocessing of the neural network, and the input signal is input to the input layer after being Fourier-transformed in the Fourier transform layer.
18. The information processing device according to claim 16, comprising at least one convolutional layer in at least one of: between the input layer and the Fourier transform layer placed after it, and between the inverse Fourier transform layer and the output layer placed after it.
19. An electronic apparatus that comprises the information processing device according to any one of claims 1 to 18 and performs a control operation, wherein the electronic apparatus comprises a sensor that detects information for the control operation, and the information processing device takes the output signal of the sensor as the input signal, comprises at least an inference process out of a learning process that learns based on the input signal and the inference process that performs inference on the input signal using information obtained in the learning process, and performs the control operation based on the inference process.
20. The electronic apparatus with the information processing device according to claim 19, wherein the electronic apparatus is an air conditioner that comprises an infrared sensor as the sensor and can control wind direction, air volume, and temperature, and the learning process learns the positions and temperature changes of living bodies.
21. The electronic apparatus with the information processing device according to claim 19, wherein the electronic apparatus is an autonomous robot that comprises, as the sensor, at least one of a CCD, a CMOS image sensor, a near-field antenna, and a far-field antenna, and that can monitor the positions of articles and identify at least one of the characters, colors, and barcodes marked on the articles, and the learning process learns characters, colors, barcodes, or the presence of defects, on which noise inherent to the sensor itself or noise dependent on the sensor's operating environment is superimposed.
22. The electronic apparatus with the information processing device according to claim 19, wherein the electronic apparatus is an in-vehicle device that can control steering, accelerator, and brake, that comprises, as the sensor, at least one of a CCD, a CMOS image sensor, a radar, and an ultrasonic sonar, and that detects vehicle-exterior environment information serving as information for the control operation, and the learning process learns at least one of the positions of living bodies outside the vehicle, the positions and headings of surrounding vehicles, traffic-signal information, and lane information.
23. The electronic apparatus with the information processing device according to claim 19, wherein the electronic apparatus is an in-vehicle device that can identify and authenticate a driver, that comprises, as the sensor, at least one of a CCD and a CMOS image sensor, and that detects in-vehicle environment information serving as information for the control operation; the information processing device comprises the learning process and the inference process; and the learning process learns at least the face, of the driver's face and physique, and periodically updates the obtained information.
24. The electronic apparatus with the information processing device according to claim 19, wherein the electronic apparatus detects electromagnetic waves with the sensor and can control at least one of stopping operation and removing an abnormal object, and the learning process learns at least one of an object's positional change, electric field, magnetic field, and temperature.
25. An electronic apparatus that comprises the information processing device according to any one of claims 1 to 18 and performs a control operation, wherein the information processing device takes a signal generated by computation as the input signal, comprises at least the inference process out of a learning process that learns based on the input signal and an inference process that performs inference on the input signal using information obtained in the learning process, and performs the control operation based on the inference process.
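Read as an algorithm, claims 1, 3, and 7 describe one forward pass: Fourier-transform the input, split it into amplitude and phase, weight each, update the amplitude with a phase-conditioned activation, recombine, and inverse-transform. The NumPy sketch below is only one possible reading of the claims, not the patented implementation: the function names, the elementwise interpretation of "applying a weight matrix", the 8×8 toy input, and the all-ones initial weights are assumptions made here for illustration.

```python
import numpy as np

def complex_relu(r, theta):
    """Complex ReLU of claims 6-7: keep the amplitude r(i) where the
    real-axis component cos(theta(i)) is >= 0, otherwise replace it with
    r(i)*|sin(theta(i))|. The phase itself is left unchanged (claim 3)."""
    return np.where(np.cos(theta) >= 0, r, r * np.abs(np.sin(theta)))

def frequency_domain_layer(x, W_amp, W_phase):
    """One pass through the claimed stack: Fourier transform layer ->
    amplitude/phase coupling layers -> complex activation layer ->
    inverse Fourier transform layer."""
    X = np.fft.fft2(x)                   # Fourier transform layer
    r1, theta1 = np.abs(X), np.angle(X)  # first amplitude / phase signals
    r2 = W_amp * r1                      # amplitude coupling layer (elementwise weights)
    theta2 = W_phase * theta1            # phase coupling layer
    r2 = complex_relu(r2, theta2)        # complex activation layer (amplitude only)
    Y = r2 * np.exp(1j * theta2)         # recombine amplitude and phase
    return np.real(np.fft.ifft2(Y))      # inverse Fourier transform layer

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))          # toy input "image"
W_amp = np.ones((8, 8))                  # identity-like initial weights
W_phase = np.ones((8, 8))
y = frequency_domain_layer(x, W_amp, W_phase)
print(y.shape)  # -> (8, 8)
```

With all-ones weights, frequency components whose phase has a non-negative real part pass through unchanged, so the complex ReLU's effect can be inspected in isolation before any training is involved.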
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP19944683.2A EP4030346A4 (en) | 2019-09-13 | 2019-09-13 | Information processing device and electronic apparatus equipped with same |
| US17/633,968 US20220335276A1 (en) | 2019-09-13 | 2019-09-13 | Information processing device and electronic apparatus equipped with same |
| CN201980100089.XA CN114341878B (zh) | 2019-09-13 | 2019-09-13 | Information processing device and electronic device equipped with same |
| PCT/JP2019/036101 WO2021049005A1 (ja) | 2019-09-13 | 2019-09-13 | Information processing device and electronic apparatus equipped with same |
| JP2020509541A JP6742554B1 (ja) | 2019-09-13 | 2019-09-13 | Information processing device and electronic apparatus equipped with same |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2019/036101 WO2021049005A1 (ja) | 2019-09-13 | 2019-09-13 | Information processing device and electronic apparatus equipped with same |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021049005A1 true WO2021049005A1 (ja) | 2021-03-18 |
Family
ID=72048001
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2019/036101 Ceased WO2021049005A1 (ja) | 2019-09-13 | 2019-09-13 | 情報処理装置およびそれを備えた電子機器 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20220335276A1 (ja) |
| EP (1) | EP4030346A4 (ja) |
| JP (1) | JP6742554B1 (ja) |
| CN (1) | CN114341878B (ja) |
| WO (1) | WO2021049005A1 (ja) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113008559A (zh) * | 2021-02-23 | 2021-06-22 | 西安交通大学 | Bearing fault diagnosis method and system based on sparse autoencoder and Softmax |
| CN113203566A (zh) * | 2021-04-06 | 2021-08-03 | 上海吞山智能科技有限公司 | Motor bearing fault diagnosis method based on one-dimensional data augmentation and CNN |
| JP2022142602A (ja) * | 2021-03-16 | 2022-09-30 | National Institute of Technology, Japan | Electromagnetic wave radar device and learning method for electromagnetic wave radar device |
| WO2022250267A1 (ko) * | 2021-05-27 | 2022-12-01 | UNIST (Ulsan National Institute of Science and Technology) | Method and device for restoring an object from a distorted image |
| JP2022189811A (ja) * | 2021-06-11 | 2022-12-22 | Robert Bosch GmbH | Ultrasound system and method for tuning a machine learning classifier used within a machine learning algorithm |
| JPWO2024018592A1 (ja) * | 2022-07-21 | 2024-01-25 | ||
| JP2024541101A (ja) * | 2021-11-29 | 2024-11-06 | Raytheon Company | Method and system for performing convolutions using an optical network |
| CN119273968A (zh) * | 2024-09-19 | 2025-01-07 | 中国人民解放军海军航空大学 | Multimodal ship image classification method based on Mamba Fourier computation |
Families Citing this family (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230141209A1 (en) * | 2020-01-22 | 2023-05-11 | Totalenergies Onetech | Method and system for detecting oil slicks in radar images |
| US20210319289A1 (en) * | 2020-04-13 | 2021-10-14 | Alibaba Group Holding Limited | Frequency domain neural network accelerator |
| JP7528637B2 (ja) * | 2020-08-26 | 2024-08-06 | JVCKenwood Corporation | Machine learning device and far-infrared imaging device |
| EP4224410A4 (en) * | 2020-09-30 | 2023-11-29 | NEC Corporation | INFORMATION PROCESSING APPARATUS, LIVING BODY DETECTION SYSTEM, LIVING BODY DETECTION METHOD AND RECORDING MEDIUM |
| WO2022259520A1 (ja) * | 2021-06-11 | 2022-12-15 | Subaru Corporation | Image processing device and vehicle |
| US12505660B2 (en) * | 2021-12-29 | 2025-12-23 | Samsung Electronics Co., Ltd. | Image processing method and apparatus using convolutional neural network |
| US20230267363A1 (en) * | 2022-02-07 | 2023-08-24 | Lemon Inc. | Machine learning with periodic data |
| CN115205890B (zh) * | 2022-05-13 | 2025-11-25 | 南京博雅集智智能技术有限公司 | Non-motor-vehicle and pedestrian re-identification method and system |
| CN115355166A (zh) * | 2022-08-30 | 2022-11-18 | 杭州展德软件技术有限公司 | Air compressor fault diagnosis method and system based on short-time Fourier transform |
| US20240104339A1 (en) * | 2022-09-21 | 2024-03-28 | Robert Bosch Gmbh | Method and system for automatic improvement of corruption robustness |
| CN115712819B (zh) * | 2022-11-18 | 2025-09-26 | 吉林大学 | Surface nuclear magnetic resonance signal noise suppression method based on generative adversarial network |
| CN116716079B (zh) * | 2023-06-14 | 2024-01-19 | 山东沃赛新材料科技有限公司 | High-performance mildew-proof alcohol-type cosmetic edge-sealing adhesive and preparation method thereof |
| CN117825601B (zh) * | 2024-03-05 | 2024-05-24 | 山东润达检测技术有限公司 | Method for determining sulfur dioxide in food |
| CN118628887B (zh) * | 2024-06-28 | 2025-05-13 | 三峡大学 | Non-intrusive load monitoring method based on time-enhanced multi-dimensional feature visualization |
| CN120447062B (zh) * | 2025-07-11 | 2025-09-12 | 中国海洋大学 | Multi-ship tracking method based on shaft-rate electromagnetic fields |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2017049996A (ja) | 2015-09-02 | 2017-03-09 | Fujitsu Limited | Training method and training apparatus for a neural network used in image recognition |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6173649B1 (ja) * | 2016-11-22 | 2017-08-02 | Mitsubishi Electric Corporation | Deterioration location estimation device, deterioration location estimation system, and deterioration location estimation method |
| CN109774740A (zh) * | 2019-02-03 | 2019-05-21 | 湖南工业大学 | Wheelset tread damage fault diagnosis method based on deep learning |
2019
- 2019-09-13 US US17/633,968 patent/US20220335276A1/en active Pending
- 2019-09-13 WO PCT/JP2019/036101 patent/WO2021049005A1/ja not_active Ceased
- 2019-09-13 JP JP2020509541A patent/JP6742554B1/ja active Active
- 2019-09-13 EP EP19944683.2A patent/EP4030346A4/en not_active Withdrawn
- 2019-09-13 CN CN201980100089.XA patent/CN114341878B/zh active Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2017049996A (ja) | 2015-09-02 | 2017-03-09 | Fujitsu Limited | Training method and training apparatus for a neural network used in image recognition |
Non-Patent Citations (2)
| Title |
|---|
| KO, JONG HWAN ET AL.: "Design of an Energy-Efficient Accelerator for Training of Convolutional Neural Networks using Frequency-Domain Computation", PROCEEDINGS OF THE 54TH ANNUAL DESIGN AUTOMATION CONFERENCE 2017, 22 June 2017 (2017-06-22), XP058367854, Retrieved from the Internet <URL:https://dl.acm.org/citation.cfm?id=3062228> [retrieved on 20191115] * |
| See also references of EP4030346A4 |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113008559A (zh) * | 2021-02-23 | 2021-06-22 | 西安交通大学 | Bearing fault diagnosis method and system based on sparse autoencoder and Softmax |
| CN113008559B (zh) * | 2021-02-23 | 2022-02-22 | 西安交通大学 | Bearing fault diagnosis method and system based on sparse autoencoder and Softmax |
| JP2022142602A (ja) * | 2021-03-16 | 2022-09-30 | National Institute of Technology, Japan | Electromagnetic wave radar device and learning method for electromagnetic wave radar device |
| CN113203566A (zh) * | 2021-04-06 | 2021-08-03 | 上海吞山智能科技有限公司 | Motor bearing fault diagnosis method based on one-dimensional data augmentation and CNN |
| KR102476808B1 (ko) * | 2021-05-27 | 2022-12-12 | UNIST (Ulsan National Institute of Science and Technology) | Method and device for restoring an object from a distorted image |
| KR20220160406A (ko) * | 2021-05-27 | 2022-12-06 | UNIST (Ulsan National Institute of Science and Technology) | Method and device for restoring an object from a distorted image |
| WO2022250267A1 (ko) * | 2021-05-27 | 2022-12-01 | UNIST (Ulsan National Institute of Science and Technology) | Method and device for restoring an object from a distorted image |
| JP2022189811A (ja) * | 2021-06-11 | 2022-12-22 | Robert Bosch GmbH | Ultrasound system and method for tuning a machine learning classifier used within a machine learning algorithm |
| JP2024541101A (ja) * | 2021-11-29 | 2024-11-06 | Raytheon Company | Method and system for performing convolutions using an optical network |
| JP7782817B2 (ja) | 2021-11-29 | 2025-12-09 | Raytheon Company | Method and system for performing convolutions using an optical network |
| JPWO2024018592A1 (ja) * | 2022-07-21 | 2024-01-25 | ||
| WO2024018592A1 (ja) * | 2022-07-21 | 2024-01-25 | Nippon Telegraph and Telephone Corporation | Model learning device, model learning method, and program |
| JP7798196 (ja) | 2022-07-21 | 2026-01-14 | NTT, Inc. | Model learning device, model learning method, and program |
| CN119273968A (zh) * | 2024-09-19 | 2025-01-07 | 中国人民解放军海军航空大学 | Multimodal ship image classification method based on Mamba Fourier computation |
Also Published As
| Publication number | Publication date |
|---|---|
| US20220335276A1 (en) | 2022-10-20 |
| JP6742554B1 (ja) | 2020-08-19 |
| JPWO2021049005A1 (ja) | 2021-09-30 |
| EP4030346A1 (en) | 2022-07-20 |
| CN114341878A (zh) | 2022-04-12 |
| CN114341878B (zh) | 2025-02-14 |
| EP4030346A4 (en) | 2022-10-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP6742554B1 (ja) | Information processing device and electronic apparatus equipped with same | |
| Feng et al. | A review and comparative study on probabilistic object detection in autonomous driving | |
| CN111860155B (zh) | Lane line detection method and related device | |
| CN110378381B (zh) | Object detection method, apparatus, and computer storage medium | |
| Khanapuri et al. | Learning based longitudinal vehicle platooning threat detection, identification and mitigation | |
| CN114120634B (zh) | WiFi-based dangerous driving behavior recognition method, apparatus, device, and storage medium | |
| US11062141B2 (en) | Methods and apparatuses for future trajectory forecast | |
| US20210166085A1 (en) | Object Classification Method, Object Classification Circuit, Motor Vehicle | |
| CN114830131B (zh) | Equal-area polyhedral spherical gauge convolutional neural network | |
| CN112036381B (zh) | Visual tracking method, video surveillance method, and terminal device | |
| CN116227620A (zh) | Method for determining similar scenes, training method, and training controller | |
| Li et al. | Driver fatigue detection based on improved YOLOv7 | |
| WO2024093321A1 (zh) | Vehicle position acquisition method, model training method, and related device | |
| Qian et al. | Support Vector Machine for Behavior‐Based Driver Identification System | |
| Pleterski et al. | Miniature mobile robot detection using an ultralow-resolution time-of-flight sensor | |
| Liang et al. | Car detection and classification using cascade model | |
| CN114445456B (zh) | Partial-model-based data-driven intelligent maneuvering target tracking method and device | |
| CN115115016A (zh) | Method and device for training a neural network | |
| He et al. | Driving behaviour characterisation by using phase‐space reconstruction and pre‐trained convolutional neural network | |
| CN118597194B (zh) | Vehicle risk assessment method and apparatus, computer device, storage medium, and program product | |
| CN114595738A (zh) | Method for generating training data for a recognition model and method for generating a recognition model | |
| KR20250017685A (ko) | Method for training a machine learning model for classifying sensor data | |
| Yang et al. | DDMI: A model information evaluation method based on deep dream | |
| CN114464216B (zh) | Acoustic detection method and device in a driverless driving environment | |
| Kapadnis et al. | Implementation of Autonomous Vehicle using Real-Time Image processing and Computer Vision Algorithm |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| ENP | Entry into the national phase |
Ref document number: 2020509541 Country of ref document: JP Kind code of ref document: A |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19944683 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| ENP | Entry into the national phase |
Ref document number: 2019944683 Country of ref document: EP Effective date: 20220413 |
|
| WWG | Wipo information: grant in national office |
Ref document number: 201980100089.X Country of ref document: CN |
|
| WWW | Wipo information: withdrawn in national office |
Ref document number: 2019944683 Country of ref document: EP |



























