WO2018212584A2 - Method and apparatus for classifying the class to which a sentence belongs using a deep neural network - Google Patents
Method and apparatus for classifying the class to which a sentence belongs using a deep neural network
- Publication number
- WO2018212584A2 (PCT/KR2018/005598)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sentence
- class
- neural network
- feature vector
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
Definitions
- the present disclosure relates to a method and apparatus for classifying a class to which a sentence belongs, by structurally analyzing a question sentence using a deep neural network.
- AI Artificial Intelligence
- AI technology is composed of machine learning and element technologies that utilize machine learning.
- Machine learning is an algorithm technology that classifies / learns characteristics of input data by itself.
- Element technology is a technology that simulates the functions of human brain cognition and judgment by using machine learning algorithms such as neural networks. It consists of technical areas such as understanding, reasoning / prediction, knowledge representation, and motion control.
- AI technology can recognize, apply and process human language / characters, and is also used for natural language processing, machine translation, dialogue system, question and answer, speech recognition / synthesis, and so on.
- In a question-and-answer system using artificial intelligence technology, the structure of the user's question sentence is analyzed, the answer type, intent, and subject / verb are analyzed, and the related answer is found in a database.
- In a question-and-answer system that executes a user's command, the user's input speech is classified, the intent is analyzed, and an independent entity is found to process the command.
- Customer support chatbots that use artificial intelligence to analyze user problems and provide appropriate answers are being utilized.
- In customer support chatbots, it is important to analyze the user's speech and determine the category in which the user wants to receive an answer. If the number of stored questions is small, the user's speech may be misclassified into a category that differs from the user's intention. In this case, the user may not receive the desired answer.
- The present disclosure provides a method and apparatus for increasing the classification accuracy of a first sentence by additionally using a separate second neural network model when classifying the class to which the first sentence belongs using a first neural network model.
- The present disclosure not only learns using a first neural network model in classifying a first class to which a first sentence belongs, but also learns a second sentence belonging to a second class using a second neural network model, and calculates a contrastive loss value based on whether the first class and the second class are identical.
- It thereby provides a method and apparatus that can distinguish the degree of representational similarity between the first sentence and the second sentence. Accordingly, the method and apparatus according to the embodiment of the present disclosure may improve the accuracy of sentence classification by using not only the label of a sentence or speech but also semantic similarity.
- FIG. 1 is a conceptual diagram illustrating an embodiment of obtaining a classification prediction value of a class to which a sentence belongs by training by inputting a sentence vector and a class into a neural network model according to an embodiment of the present disclosure.
- FIG. 2 is a block diagram illustrating components of an electronic device according to an embodiment of the present disclosure.
- FIG. 3 is a flowchart illustrating a method of classifying a class to which a sentence belongs, according to an embodiment of the present disclosure.
- FIG. 4 is a diagram for describing a method of classifying a class to which a sentence belongs, using a convolutional neural network, according to an embodiment of the present disclosure.
- FIG. 5 is a flowchart illustrating a method of obtaining, by an electronic device, a classification prediction value that is a probability value classified into a class to which a first sentence belongs.
- FIG. 6 is a flowchart illustrating a learning method of adjusting, by an electronic device, a weight applied to a neural network model based on a loss value obtained through the neural network model, according to an embodiment of the present disclosure.
- An embodiment of the present disclosure provides a method of classifying a class to which a sentence belongs using a deep neural network, the method comprising: learning a first feature vector through a first neural network using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs; learning a second feature vector through a second neural network using, as input data, a second sentence and a second class to which the second sentence belongs; obtaining a contrastive loss value that quantifies the degree of similarity in expression of the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical; and repeating the learning using the first neural network and the second neural network so that the contrastive loss value is maximized.
- The method may further include: receiving a speech from a user; recognizing the received speech as a sentence; and extracting at least one word included in the recognized sentence and converting the at least one word into at least one word vector. Learning the first feature vector may comprise: generating a sentence vector by arranging the word vectors in a matrix form; and learning the first feature vector by inputting the sentence vector as input data to the first neural network.
- A plurality of sentences and a plurality of classes to which each of the plurality of sentences belongs may be stored in a database, and the second sentence and the second class may be extracted randomly from the database.
- The contrastive loss value may be calculated through a formula that combines the dot product of the first feature vector and the second feature vector with a numerical expression indicating whether the first class and the second class are equal.
- The numerical expression may output 1 when the first class and the second class are the same, and output 0 when the first class and the second class are not the same.
- learning the first feature vector may include converting the first sentence into a matrix form including at least one word vector; Inputting the transformed matrix into the convolutional neural network as input data and generating a feature map by applying a plurality of filters; And extracting the first feature vector by passing the feature map through a max pooling layer.
- the method includes inputting a first feature vector into a fully connected layer and converting it into a one-dimensional vector value; And inputting a one-dimensional vector value to a softmax classifier to obtain a first classification prediction value representing a probability distribution classified into a first class.
- The method may further include: obtaining a first classification loss value that is the difference between the first classification prediction value and the first class; obtaining, through the second neural network, a second classification prediction value representing a probability distribution in which the second sentence is classified into the second class, and obtaining a second classification loss value that is the difference between the second classification prediction value and the second class; calculating a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value; and adjusting the weights applied to the first neural network and the second neural network based on the calculated final loss value.
- the learning through the first neural network and the learning through the second neural network may be performed at the same time.
- an embodiment of the present disclosure may provide an electronic device that classifies a class to which a sentence belongs, using a deep neural network.
- The electronic device includes a processor that performs training using a neural network. The processor learns a first feature vector through a first neural network using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs; learns a second feature vector through a second neural network using, as input data, a second sentence and a second class to which the second sentence belongs; obtains a contrastive loss value that quantifies the degree of similarity in expression of the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical; and repeats the learning using the first neural network and the second neural network so that the contrastive loss value is maximized.
- The electronic device may further include a speech input unit configured to receive a speech from a user, and the processor may recognize the received speech as a sentence, extract at least one word included in the recognized sentence, and convert the at least one word into at least one word vector.
- The processor may generate the sentence vector by arranging the at least one word vector in a matrix form, and may input the sentence vector as input data to the first neural network to learn the first feature vector.
- The electronic device may further include a database storing a plurality of sentences and a plurality of classes to which each of the plurality of sentences belongs, and the processor may randomly extract the second sentence and the second class from the database and input them to the second neural network as input data.
- The processor may calculate the contrastive loss value through a formula that combines the dot product of the first feature vector and the second feature vector with a numerical expression indicating whether the first class and the second class are equal.
- The numerical expression may output 1 when the first class and the second class are the same, and output 0 when the first class and the second class are not the same.
- The processor may convert the first sentence into a matrix including at least one word vector, input the converted matrix as input data into a convolutional neural network, generate a feature map by applying a plurality of filters, and extract the first feature vector by passing the feature map through a max pooling layer.
- The processor may input the first feature vector into a fully connected layer to convert it into a one-dimensional vector value, and input the one-dimensional vector value into a softmax classifier to obtain a first classification prediction value representing a probability distribution classified into the first class.
- The processor may obtain a first classification loss value, which is the difference between the first classification prediction value and the first class; obtain, through the second neural network, a second classification prediction value representing a probability distribution in which the second sentence is classified into the second class, and a second classification loss value, which is the difference between the second classification prediction value and the second class; calculate a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value; and adjust the weights applied to the first neural network and the second neural network based on the calculated final loss value.
- the processor may simultaneously perform learning through the first neural network and learning through the second neural network.
- an embodiment of the present disclosure provides a computer program product including a computer-readable storage medium, wherein the storage medium includes a first sentence comprising at least one word and the first sentence.
- ... unit means a unit for processing at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.
- FIG. 1 is a conceptual diagram for explaining an embodiment of obtaining classification prediction values (y 1 ′, y 2 ′) of classes by inputting sentence vectors (S i , S j ) and classes (y 1 , y 2 ) into neural network models (100, 110) for training, according to an embodiment of the present disclosure.
- Artificial intelligence (AI) algorithms, including deep neural networks, input data into an artificial neural network (ANN) and learn output data through operations such as convolution.
- An artificial neural network may refer to a computational architecture that models the biological brain.
- In an artificial neural network, nodes corresponding to neurons in the brain are connected to each other and operate collectively to process input data.
- neurons in the neural network have links with other neurons. Such connections may extend in one direction, for example in a forward direction, via a neural network.
- The first sentence vector (S i ) and the first class (y 1 ) are input to the first neural network model 100 as input data, and the first classification prediction value y 1 ′ may be output through training with the first neural network model 100.
- Likewise, the second sentence vector (S j ) and the second class (y 2 ) are input to the second neural network model 110 as input data, and the second classification prediction value y 2 ′ may be output through training with the second neural network model 110.
- the first neural network model 100 and the second neural network model 110 shown in FIG. 1 may be implemented as a convolutional neural network (CNN), but are not limited thereto.
- For example, the first neural network model 100 and the second neural network model 110 may be implemented as an artificial neural network model such as a recurrent neural network (RNN), a deep belief network (DBN), or a restricted Boltzmann machine (RBM), or as a machine learning model such as a support vector machine (SVM).
- RNN recurrent neural network
- DBN deep belief network
- RBM restricted Boltzmann machine
- SVM support vector machine
- The first sentence vector S i and the second sentence vector S j may be generated by parsing a sentence or utterance input by a user through a natural language processing technique, extracting at least one word, and converting each extracted word into a vector.
- The first sentence vector S i and the second sentence vector S j may be generated through a machine learning model that embeds words as vectors, such as word2vec, GloVe, or one-hot encoding, but are not limited thereto.
- The first sentence vector S i and the second sentence vector S j may be generated by arranging at least one word vector in a matrix form.
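As an illustration of arranging word vectors in matrix form, the following minimal Python sketch builds a sentence matrix with one row per word. It assumes one-hot word vectors and a toy vocabulary. The names `one_hot`, `sentence_matrix`, and `vocab` are hypothetical and do not appear in the disclosure, and word2vec or GloVe vectors could be substituted for the one-hot rows.

```python
def one_hot(index, size):
    """Return a one-hot word vector of length `size` (assumed embedding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def sentence_matrix(sentence, vocab):
    """Stack one word vector per row to form the sentence matrix."""
    words = sentence.lower().split()
    # Words missing from the toy vocabulary are skipped in this sketch.
    return [one_hot(vocab[w], len(vocab)) for w in words if w in vocab]


# Hypothetical toy vocabulary used only for this example.
vocab = {"send": 0, "text": 1, "to": 2, "alice": 3}
S_i = sentence_matrix("Send text to Alice", vocab)
```

Each row of `S_i` corresponds to one word, so the matrix height follows the sentence length and the width follows the embedding size.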
- The first class (y 1 ) and the second class (y 2 ) may be vector values that define the classes to which the first sentence vector (S i ) and the second sentence vector (S j ) respectively belong.
- Here, a class does not mean a hierarchy, but may mean a category classification to which a sentence belongs, for example, politics, society, economy, culture, entertainment, IT, and the like.
- The first classification prediction value (y 1 ′) is a result value output through training with the first neural network model 100, and may mean a probability value that the first sentence vector (S i ) is classified as the first class (y 1 ).
- For example, the first sentence corresponding to the first sentence vector (S i ) may relate to the category "politics."
- The second classification prediction value (y 2 ′) is a result value output through training with the second neural network model 110, and may mean a probability value that the second sentence vector (S j ) is classified as the second class (y 2 ).
- a first classification loss value may be obtained by calculating a difference value between the first classification prediction value y 1 ′ and the first class y 1 .
- a second classification loss value may be obtained by calculating a difference value between the second classification prediction value y 2 ′ and the second class y 2 .
- the first neural network model 100 and the second neural network model 110 may be configured as a convolutional neural network (CNN).
- The first sentence vector S i and the second sentence vector S j may each be passed through a plurality of filters having different widths in the first neural network model 100 and the second neural network model 110, whereby the first feature vector and the second feature vector may be learned, respectively.
- A contrastive loss value L 1 may be obtained based on the first feature vector, the second feature vector, and whether the first class y 1 and the second class y 2 are identical.
- The contrastive loss value L 1 may be calculated through a formula that combines the dot product of the first feature vector and the second feature vector with a numerical expression indicating whether the first class and the second class are the same.
- The contrastive loss value L 1 will be described in detail in the description of FIG. 4.
- The contrastive loss value L 1 may have a value in the range of -1 or more and 1 or less.
- Learning with the first neural network model 100 and the second neural network model 110 may be repeated in a direction in which the contrastive loss value L 1 is maximized.
- the repetition of learning may mean adjusting a weight applied to the first neural network model 100 and the second neural network model 110.
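The exact formula for the contrastive loss value is deferred to the description of FIG. 4 and is not reproduced in this passage. The Python sketch below is therefore only one assumed form that is consistent with the stated properties: it uses a dot product of the two feature vectors, an indicator that outputs 1 or 0 depending on class equality, a range of -1 to 1, and a value that grows when same-class sentences have similar features. The length normalization (cosine) and the sign mapping `2 * indicator - 1` are assumptions, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Dot product of u and v, normalized by their lengths (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_class(y1, y2):
    """Indicator from the text: 1 if the classes are the same, else 0."""
    return 1 if y1 == y2 else 0


def contrastive_value(f1, f2, y1, y2):
    """Assumed form: +similarity for equal classes, -similarity otherwise."""
    sign = 2 * same_class(y1, y2) - 1  # maps {1, 0} to {+1, -1}
    return sign * cosine(f1, f2)
```

Under this assumed form, maximizing the value pulls feature vectors of same-class sentences together and pushes those of different-class sentences apart.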
- In a typical classification method, classification is performed by creating a loss function through learning with a neural network model based on the label of the sentence to be learned.
- In that case, many misclassifications can occur if the utterance to be classified does not fall into any class of the classification model.
- In addition, because classes are classified based on labels, utterances may be misclassified when their expressions are similar. For example, when the user's input utterance is "Send 'KakaoTalk' to 'XXX'", it may be classified as "Send 'Text' to 'XXX'."
- An embodiment of the present disclosure not only learns using the first neural network model 100 in classifying a first class to which a first sentence belongs, but also learns a second sentence belonging to a second class using the second neural network model 110.
- By calculating the contrastive loss value (L 1 ) based on whether the first class (y 1 ) and the second class (y 2 ) are identical, it provides a method and apparatus that can distinguish the degree of representational similarity between the first sentence and the second sentence.
- the method and apparatus according to an embodiment of the present disclosure may use not only the label of a sentence or speech but also semantic similarity together to improve classification accuracy for sentences that are similar but belong to different classes.
- the electronic device 200 may be a device that performs training for classifying a class to which a sentence belongs by using a neural network model.
- the electronic device 200 may be a fixed terminal implemented as a computer device or a mobile terminal.
- the electronic device 200 may be, for example, at least one of a smart phone, a mobile phone, a navigation device, a computer, a notebook computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), and a tablet PC, but is not limited thereto.
- the electronic device 200 may communicate with other electronic devices and / or servers through a network by using a wireless or wired communication scheme.
- the electronic device 200 may include a processor 210, a memory 220, and a speech input unit 230.
- the processor 210 may be configured to process instructions of a computer program by performing arithmetic, logic, and input / output operations, such as convolution operations. Instructions may be provided to the processor 210 by the memory 220. In one embodiment, processor 210 may be configured to execute a command received according to a program code stored in a recording device, such as memory 220. The processor 210 may be configured, for example, with at least one of a central processing unit, a microprocessor, and a graphic processing unit, but is not limited thereto. In an embodiment, when the electronic device 200 is a mobile device such as a smart phone, a tablet PC, or the like, the processor 210 may be an application processor (AP) for executing an application.
- AP application processor
- the processor 210 may perform training through a general artificial intelligence algorithm based on a deep neural network such as a neural network model.
- The processor 210 may perform natural language processing (NLP), such as extracting words from a user's speech or question sentence and converting the extracted words into word vectors to generate a sentence vector.
- NLP natural language processing
- The processor 210 may parse a sentence into word objects, process stop words (for example, filtering out articles), generate tokens (for example, unifying tense and plural forms), and then extract highly relevant keywords based on their frequency of occurrence and manage them as independent entities.
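The preprocessing steps described above (parsing words, filtering stop words, unifying word forms, and selecting keywords by frequency) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the stop-word list, the crude plural-stripping rule in `normalize`, and every function name are hypothetical, and a production system would typically use a proper tokenizer and lemmatizer.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the disclosure only mentions articles.
STOP_WORDS = {"a", "an", "the", "to", "of", "is", "my"}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def normalize(token):
    """Crude plural unification (assumption): drop a trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def keywords(sentence, top_n=3):
    """Return the most frequent non-stop-word tokens."""
    tokens = [normalize(t) for t in tokenize(sentence) if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]
```

For instance, calling `keywords` on a sentence that repeats "send" and "photos" would surface those two terms ahead of words that occur only once.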
- The processor 210 may learn a first feature vector through a first neural network using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs.
- The processor 210 may learn a second feature vector through a second neural network using, as input data, a second sentence and a second class to which the second sentence belongs.
- The first sentence may be a sentence or speech input by a user.
- The second sentence may be a sentence extracted randomly from among a plurality of sentences stored in a server or a database.
- The processor 210 may obtain a contrastive loss value that quantifies the degree of similarity in expression of the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical.
- The processor 210 may calculate the contrastive loss value through a formula that combines the dot product of the first feature vector and the second feature vector with a numerical expression indicating whether the first class and the second class are equal. The method of calculating the contrastive loss value will be described in detail in the description with reference to FIG. 4.
- The contrastive loss value has a value in the range of -1 or more and 1 or less, and the processor 210 may repeat the learning using the first neural network and the second neural network so that the obtained contrastive loss value is maximized.
- the processor 210 may simultaneously perform learning through the first neural network and learning through the second neural network.
- the electronic device 200 may further include a speech input unit 230 that receives a speech or sentence from a user.
- the speech input unit 230 may include a voice recognition module for recognizing a user's voice, but is not limited thereto.
- the speech input unit 230 may include, for example, a hardware module capable of receiving a user's sentence such as a keypad, a mouse, a touch pad, a touch screen, a jog switch, and the like.
- The processor 210 may recognize an utterance input through the speech input unit 230 as a sentence, parse and extract at least one word included in the recognized sentence, and convert each extracted word into a word vector.
- the processor 210 may embed a word into a vector using a machine learning model such as word2vec, GloVe, onehot, etc., but is not limited thereto.
- the processor 210 may convert the word representation into a vector value that can be represented in a vector space using the machine learning model.
- the processor 210 may generate a sentence vector by arranging at least one word vector in a matrix form, and input the sentence vector as input data to the first neural network to learn the first feature vector.
- The processor 210 may convert the first sentence into a matrix form comprising at least one word vector, input the converted matrix into the convolutional neural network as input data, generate a feature map by applying a plurality of filters, and extract the first feature vector by passing the feature map through a max pooling layer.
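The convolution and max pooling step can be illustrated with the pure-Python sketch below. It assumes that each filter spans a number of consecutive word rows of the sentence matrix (filters of different widths), that a ReLU activation is applied, and that pooling keeps the single largest response per filter. These details, the filter values, and the function names are assumptions for illustration only.

```python
def conv_feature_map(matrix, filt):
    """Slide a filter covering len(filt) consecutive word rows over the matrix."""
    width = len(filt)
    responses = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        total = sum(
            w * x
            for filt_row, word_row in zip(filt, window)
            for w, x in zip(filt_row, word_row)
        )
        responses.append(max(0.0, total))  # ReLU activation (assumption)
    return responses


def extract_feature_vector(matrix, filters):
    """Max pooling: keep the largest response of each filter's feature map."""
    return [max(conv_feature_map(matrix, f)) for f in filters]


# Toy 4-word sentence matrix with 2-dimensional word vectors.
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],              # filter spanning 2 words
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # filter spanning 3 words
]
feature_vector = extract_feature_vector(matrix, filters)
```

The resulting feature vector has one entry per filter, so its length is independent of the sentence length.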
- The processor 210 may input the first feature vector into a fully connected layer to convert it into a one-dimensional vector value, and input the one-dimensional vector value to a softmax classifier to obtain a first classification prediction value representing a probability distribution classified into the first class.
- The processor 210 may likewise learn and extract a second feature vector, input the second feature vector into a fully connected layer to convert it into a one-dimensional vector value, and input the one-dimensional vector value to a softmax classifier to obtain a second classification prediction value representing a probability distribution classified into the second class. This will be described in detail later in the description of FIG. 4.
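A minimal Python sketch of the fully connected layer followed by a softmax classifier is given below. The weights, biases, and number of classes are made-up values, and the function names are hypothetical; the sketch only shows how a feature vector becomes a probability distribution over classes.

```python
import math


def fully_connected(features, weights, biases):
    """Map a feature vector to one score per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters for three classes and a 2-dimensional feature vector.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
probs = softmax(fully_connected([2.0, 4.0], weights, biases))
predicted_class = probs.index(max(probs))
```

The entries of `probs` sum to 1, and the index of the largest entry is taken as the predicted class.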
- the processor 210 may obtain a first classification loss value that is a difference between the first classification prediction value and the first class, and obtain a second classification loss value that is a difference between the second classification prediction value and the second class.
- The processor 210 may calculate a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value, and may repeat learning that adjusts the weights applied to the first neural network and the second neural network based on the calculated final loss value.
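The summation of the three terms can be sketched in Python as follows. Cross-entropy is assumed here as the measure of the "difference" between a predicted distribution and the true class, which this passage does not specify. The contrastive term is passed in as a precomputed number, and its sign convention is left to the caller because the formula is given only in the description of FIG. 4. The function names are hypothetical.

```python
import math


def cross_entropy(probs, true_index):
    """Assumed classification loss: negative log-probability of the true class."""
    return -math.log(probs[true_index])


def final_loss(probs1, y1, probs2, y2, contrastive_term):
    """Sum of the two classification losses and the contrastive term."""
    return cross_entropy(probs1, y1) + cross_entropy(probs2, y2) + contrastive_term


# Made-up predicted distributions and true class indices.
loss = final_loss([0.5, 0.5], 0, [0.25, 0.75], 1, 0.0)
```

The weights of both networks would then be adjusted from this single scalar, so both classifiers and the similarity term influence every update.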
- the memory 220 may be a computer-readable recording medium, and may include a permanent mass storage device such as random access memory (RAM), read only memory (ROM), and a disk drive.
- the memory 220 may store an operating system (OS) or at least one computer program code (for example, a code for a learning program through a neural network performed by the processor 210).
- OS operating system
- Such computer program code may be stored in the memory 220 but may be loaded from a separate computer readable recording medium or a computer program product.
- the computer-readable recording medium may include a computer-readable recording medium such as a floppy drive, a disk, a tape, a DVD / CD-ROM drive, a memory card, and the like.
- the computer program code may be installed in the electronic device 200 and loaded from the memory 220 by files provided from a server through a network.
- the electronic device 200 may include a database.
- the database may store a plurality of sentences and a plurality of classes to which each of the plurality of sentences belongs.
- the database may be included as a component in the electronic device 200, but is not limited thereto.
- the database may be configured in the form of a server disposed outside the electronic device 200.
- The processor 210 may randomly extract the second sentence and the second class from the database and input them into the second neural network as input data for learning.
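Random extraction of the second sentence and its class can be sketched in Python as below. The in-memory list standing in for the database, its contents, and the name `sample_second` are hypothetical; an actual database could equally reside on an external server.

```python
import random

# Hypothetical stand-in for the database of (sentence, class) pairs.
DATABASE = [
    ("The parliament passed the budget bill", "politics"),
    ("The central bank raised interest rates", "economy"),
    ("The new phone ships next month", "IT"),
]


def sample_second(database, rng=random):
    """Randomly pick one stored (sentence, class) pair."""
    return rng.choice(database)


second_sentence, second_class = sample_second(DATABASE, random.Random(0))
```

Passing a seeded `random.Random` instance, as in the last line, makes the draw reproducible during experiments.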
- FIG. 3 is a flowchart illustrating a method of classifying a class to which a sentence belongs, according to an embodiment of the present disclosure.
- the electronic device trains a first feature vector through a first neural network using the first sentence and the first class to which the first sentence belongs as input data.
- the electronic device may receive an utterance or question from a user and recognize the received utterance or question as a sentence.
- the electronic device may parse at least one word included in a recognized sentence by using natural language processing (NLP) technology, and may convert at least one word into at least one word vector.
- the electronic device may embed at least one word into at least one word vector using a machine learning model such as word2vec, GloVe, onehot, and the like, but is not limited thereto.
- the electronic device may convert the word representation into a vector value that can be represented in a vector space using the machine learning model.
- The electronic device generates a sentence vector by arranging the at least one embedded word vector in a matrix form, and inputs the generated sentence vector as input data into the first neural network to learn the probability distribution of classification into the first class.
- the electronic device learns the second feature vector through the second neural network using the second sentence and the second class to which the second sentence belongs as input data.
- the second sentence and the second class to which the second sentence belongs may be stored in a database form.
- the electronic device may randomly extract the second sentence and the second class from the database and input the second sentence and the second class as input data into the second neural network to learn.
- step S320 is performed after step S310, but is not limited thereto.
- the electronic device may simultaneously perform the step of learning the first feature vector (S310) and the step of learning the second feature vector (S320).
- the electronic device obtains a contrastive loss value based on the first feature vector, the second feature vector, and whether the first class and the second class are identical.
- the contrastive loss value may be calculated through a numerical expression that takes as variables the dot product of the first feature vector and the second feature vector and a value representing whether the first class and the second class are equal. That value may be 1 when the first class and the second class are the same, and 0 when the first class and the second class are not the same.
- the electronic device repeats the learning using the first neural network and the second neural network so that the contrastive loss value is maximized.
- the contrastive loss value may have a value ranging from -1 to 1, inclusive.
- the electronic device may repeat the learning using the first neural network model and the second neural network model in a direction in which the contrastive loss value is maximized.
- repetition of learning may mean adjusting a weight applied to the first neural network model and the second neural network model.
- FIG. 4 is a diagram for describing a method of classifying a class to which a sentence belongs, using a convolutional neural network, according to an embodiment of the present disclosure.
- the electronic device may input a first sentence S i and a first class y 1 to which the first sentence S i belongs as input data to a first neural network 401, and learn a probability distribution with which the first sentence S i is classified into the first class y 1. Further, the electronic device may input a second sentence S j and a second class y 2 to which the second sentence S j belongs as input data to a second neural network 402, and learn a probability distribution with which the second sentence S j is classified into the second class y 2.
- the first class y 1 and the second class y 2 may be vector values that define the classes to which the first sentence S i and the second sentence S j respectively belong.
- the first neural network 401 and the second neural network 402 may be configured as a convolutional neural network model (CNN), but are not limited thereto.
- the first neural network 401 and the second neural network 402 may be implemented as an artificial neural network model such as a recurrent neural network (RNN), a deep belief network (DBN), or a restricted Boltzmann machine (RBM), or as a machine learning model such as a support vector machine (SVM).
- the electronic device may parse the first sentence S i using natural language processing technology and extract a plurality of words (word 1-1 to word 1-6) from the first sentence S i.
- although FIG. 4 shows a total of six words, word 1-1 to word 1-6, this is exemplary, and the number of words belonging to the first sentence S i is not limited to six.
- the electronic device may convert the plurality of words (word 1-1 to word 1-6) into a plurality of word vectors (wv 1-1 to wv 1-6), respectively.
- the electronic device may embed the plurality of words (word 1-1 to word 1-6) into the plurality of word vectors wv 1-1 to wv 1-6 using a machine learning model such as word2vec, GloVe, or onehot.
- the sentence vector 411 may be an n × k matrix having n words and a dimension of k.
- the electronic device may apply a plurality of filters 421 having different widths to the sentence vector 411 to perform a convolution operation, thereby generating a feature map 431.
- the plurality of filters 421 are vectors having different weights, and the weight value may change as learning progresses.
- the electronic device may generate the feature map 431 by multiplying and adding vector values of the sentence vector 411 and weight values of the plurality of filters 421.
- the plurality of filters 421 are illustrated as having a width of 2, 3, and 4, but are not limited thereto.
- the dimension k in the plurality of filters 421 may be the same as the dimension k of the sentence vector.
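The multiply-and-add convolution described above can be illustrated on a tiny sentence matrix. The numbers and filter weights below are made up for the example, and the sketch omits any bias term or activation function, which the text does not specify.

```python
def convolve(sentence, filt):
    """Slide a (width x k) filter down an (n x k) sentence matrix.

    Each output entry is the sum of element-wise products between the
    filter and a window of `width` consecutive word vectors.
    """
    n, width = len(sentence), len(filt)
    feature_map = []
    for start in range(n - width + 1):
        total = 0.0
        for i in range(width):
            for value, weight in zip(sentence[start + i], filt[i]):
                total += value * weight
        feature_map.append(total)
    return feature_map


# Toy sentence matrix with n = 4 words and dimension k = 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

# Filters of width 2 and 3; each filter row has the same dimension k.
filters = {
    2: [[1.0, 0.0], [0.0, 1.0]],
    3: [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
}
feature_maps = {width: convolve(sentence, f) for width, f in filters.items()}
print(feature_maps)
```

A filter of width w over n words yields n - w + 1 values, so wider filters produce shorter feature maps.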
- the electronic device may subsample the feature map 431 by passing the feature map 431 through a max pooling layer, and generate a first feature vector 441.
- the first feature vector 441 is a single feature vector generated by extracting only the vector value having the maximum value from the feature map 431 through the max pooling layer, and may be defined as a representation vector of the first sentence S i.
- although FIG. 4 shows the max pooling layer, the pooling is not limited thereto, and the electronic device may generate the first feature vector 441 through average pooling or L 2 -norm pooling.
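The three pooling options named here can be compared on one small feature map. This is a sketch only: L2-norm pooling is taken to mean the square root of the sum of squares over the map, which is one common definition and an assumption about what the text intends.

```python
import math


def max_pool(feature_map):
    """Keep only the largest value of the feature map."""
    return max(feature_map)


def average_pool(feature_map):
    """Replace the feature map by its arithmetic mean."""
    return sum(feature_map) / len(feature_map)


def l2_norm_pool(feature_map):
    """Replace the feature map by the square root of its sum of squares."""
    return math.sqrt(sum(value * value for value in feature_map))


feature_map = [3.0, -4.0, 0.0]
print(max_pool(feature_map), average_pool(feature_map), l2_norm_pool(feature_map))
```

Each option reduces one feature map to a single number, so the length of the resulting feature vector depends on the number of filters rather than on the sentence length.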
- the electronic device may input and concatenate the first feature vector 441 to a fully connected layer, thereby generating the one-dimensional vector 451.
- the electronic device may generate the first classification predicted value vector 461 by inputting the one-dimensional vector 451 to the softmax classifier.
- the first classification prediction value vector 461 may represent a probability distribution with which the first sentence S i is classified into the first class y 1.
- the electronic device may perform a dropout operation in order to prevent the occurrence of overfitting generated in the process of adjusting the weight.
- the electronic device may generate a second feature vector 442 by inputting the second sentence S j and the second class y 2 as input data to the second neural network 402, and may thereby generate a second classification prediction value vector 462.
- the learning method through the second neural network 402 is the same as the learning method through the first neural network 401 except for input data and learning results, and thus redundant description thereof will be omitted.
- the electronic device may obtain a contrastive loss value L 1 that quantifies the degree of similarity between the representations of the first sentence S i and the second sentence S j, based on the first feature vector 441, the second feature vector 442, and whether the first class y 1 and the second class y 2 are identical. When the first feature vector 441 is defined as F(S i) and the second feature vector 442 is defined as F(S j), the contrastive loss value L 1 may be calculated based on Equation 1.
- in Equation 1, the contrastive loss value L 1 may be calculated from Y and the absolute value of the dot product of the first feature vector F(S i) and the second feature vector F(S j).
- Y is a notation that converts whether the first class y 1 and the second class y 2 are identical into a number: 1 may be output when the first class y 1 and the second class y 2 are the same, and 0 may be output when the first class y 1 and the second class y 2 are not identical.
- depending on the case, the contrastive loss value L 1 may be calculated as -1, 0, 0, or 1.
- the contrastive loss value L 1 may reflect not only whether the classes y 1 and y 2 into which the first and second feature vectors are classified are identical, but also the degree of similarity between the first feature vector F(S i) and the second feature vector F(S j).
- the electronic device may learn in a direction in which the contrastive loss value L 1 is maximized.
- the contrastive loss value L 1 has a value of -1 or more and 1 or less.
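The exact form of Equation 1 is not legible in this text, so the sketch below uses an assumed expression, (2Y - 1) · |F(S i) · F(S j)| on unit-length feature vectors. It was chosen only because it combines Y with the absolute value of the dot product, stays within -1 to 1, and yields the values -1, 0, 0, and 1 in the four extreme cases; it should not be read as the formula of the disclosure. The function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length so the dot product stays in [-1, 1]."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def contrastive_value(f_i, f_j, class_i, class_j):
    """Assumed form: (2Y - 1) * |F(S_i) . F(S_j)|, with Y = 1 for equal classes."""
    y = 1 if class_i == class_j else 0
    dot = sum(a * b for a, b in zip(normalize(f_i), normalize(f_j)))
    return (2 * y - 1) * abs(dot)


# Similar features: rewarded for equal classes, penalized for different ones.
print(contrastive_value([1.0, 0.0], [1.0, 0.0], "weather", "weather"))
print(contrastive_value([1.0, 0.0], [1.0, 0.0], "weather", "music"))
# Dissimilar (orthogonal) features contribute nothing in either case.
print(contrastive_value([1.0, 0.0], [0.0, 1.0], "weather", "music"))
```

Under this assumed form, maximizing the value pulls same-class representations together and pushes different-class representations toward orthogonality.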
- the electronic device may reduce the number of times of learning through the first neural network 401 and the second neural network 402. That is, when the first sentence S i and the second sentence S j have similar expressions even though they belong to different classes, the electronic device may increase the number of times of learning in order to distinguish them from each other.
- conversely, when the first sentence S i and the second sentence S j belong to the same class and have similar expressions, the electronic device may not increase the number of times of learning.
- the electronic device may obtain a first classification loss value L 2, which is a difference value between a first classification prediction value vector 461 that is output data of learning through the first neural network 401 and a vector of the first class y 1.
- the electronic device may obtain a second classification loss value L 3, which is a difference value between a second classification prediction value vector 462 that is output data of learning through the second neural network 402 and a vector of the second class y 2.
- the first classification loss value L 2 is the loss incurred in classifying the first sentence S i into the first class y 1, and the second classification loss value L 3 is the loss incurred in classifying the second sentence S j into the second class y 2.
- the electronic device may calculate a final loss value (total loss) L by adding the contrastive loss value L 1, the first classification loss value L 2, and the second classification loss value L 3, as shown in Equation 2.
- the electronic device may learn to adjust weights applied to the first neural network 401 and the second neural network 402 based on the calculated final loss value L.
- in Equation 2, L 2 and L 3 denote the classification losses for classifying the first sentence S i and the second sentence S j into the first class y 1 and the second class y 2, respectively, y 1 denotes the first class, y 2 denotes the second class, and L 1 denotes the contrastive loss value.
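Written out, the sum that the text describes for the final loss would read as follows. This is a reconstruction from the wording alone (the three values are added); any weighting factors or sign conventions that the printed equation may carry are not visible here and are not assumed.

```latex
L = L_1 + L_2(S_i, y_1) + L_3(S_j, y_2)
```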
- when the electronic device executes an interactive robot program such as Bixby, the first sentence S i, which is an utterance input by the user, may be classified into a different class even though it belongs to the first class y 1, because its sentence expression is different. In this case, the user may receive an unwanted answer to the question because the first sentence S i was classified incorrectly. The electronic device may increase the accuracy of classifying the class to which the user's question belongs by learning in consideration of the contrastive loss value L 1.
- there may also be a case where the first sentence S i does not belong to any of the classes previously stored in the electronic device. In this case, the electronic device may decline to classify the first sentence S i into any class (reject), which can reduce the likelihood of the user receiving an unwanted answer.
- FIG. 5 is a flowchart illustrating a method of obtaining, by an electronic device, a classification prediction value that is a probability value classified into a class to which a first sentence belongs.
- the electronic device converts the first sentence into a matrix form including at least one word vector.
- the first sentence may be a speech or sentence input by a user.
- the electronic device may extract at least one word included in the first sentence and convert the at least one word into at least one word vector.
- the electronic device may embed at least one word into at least one word vector using a machine learning model such as word2vec, GloVe, onehot.
- the electronic device may generate the first sentence vector by arranging at least one word vector in a matrix form.
- the electronic device inputs the converted matrix as input data to a convolutional neural network and generates a feature map by applying a plurality of filters.
- the electronic device may apply a convolution operation by applying multiple filters having different widths.
- the multiple filters are vectors having different weights, and the weight values may change as learning progresses.
- the multiple filters may have the same dimension as the dimension of the sentence vector generated in step S510.
- the electronic device extracts the first feature vector by passing the feature map through a max pooling layer.
- the electronic device may extract a first feature vector that is a single feature vector generated by extracting only a vector value having a maximum value from the feature map through the max pooling layer.
- the layer used for subsampling is not limited to the max pooling layer.
- the electronic device may extract the first feature vector through average pooling or L 2 -norm pooling.
- the electronic device inputs the first feature vector into a fully connected layer and converts the first feature vector into a one-dimensional vector value.
- the electronic device may concatenate the first feature vector, which is composed of values from a plurality of feature maps generated using filters having different widths, into one, thereby converting it into a one-dimensional vector value.
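The concatenation and fully connected step might be sketched as below. The pooled values, the weight matrix, and the bias are invented for illustration, and the names `concatenate` and `fully_connected` are hypothetical; the text does not give layer sizes.

```python
def concatenate(pooled_by_width):
    """Join the pooled values of every filter width into one flat list."""
    flat = []
    for width in sorted(pooled_by_width):
        flat.extend(pooled_by_width[width])
    return flat


def fully_connected(vector, weights, bias):
    """Dense layer: one weighted sum of the inputs plus a bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


# Pooled values keyed by filter width (made-up numbers).
pooled = {2: [0.5, 0.25], 3: [1.0], 4: [2.0, 0.75]}
one_dimensional = concatenate(pooled)

# Two output units, one per class in a toy two-class setup (assumption).
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
bias = [0.0, 0.5]
scores = fully_connected(one_dimensional, weights, bias)
print(one_dimensional, scores)
```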
- a dropout operation may be used to solve overfitting occurring while learning the first feature vector and to increase the accuracy of the training data.
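A dropout operation of the kind mentioned here is often implemented as "inverted dropout", sketched below. The drop rate, the point in the network where it is applied, and the rescaling of surviving values are assumptions; the text only states that dropout may be used against overfitting.

```python
import random


def dropout(vector, rate, rng, training=True):
    """Zero each entry with probability `rate` and rescale the survivors.

    `rate` must be below 1. Outside training the vector passes through unchanged.
    """
    if not training or rate == 0.0:
        return list(vector)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vector]


rng = random.Random(0)  # seeded only to make the example repeatable
features = [1.0, 2.0, 3.0, 4.0]
print(dropout(features, 0.5, rng))
print(dropout(features, 0.5, rng, training=False))
```

Rescaling by 1 / (1 - rate) keeps the expected value of each entry the same during training and inference.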
- the electronic device obtains a first classification prediction value by inputting a one-dimensional vector value to a softmax classifier.
- the first classification prediction value refers to a probability value with which the first sentence may be classified into the first class, and may be generated by passing through the softmax classifier.
- by passing through the softmax classifier, the vector values included in the one-dimensional vector may be converted into probability values whose total sum is 1.
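The softmax conversion can be shown directly. The input scores below are arbitrary example values; subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math


def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, "-> predicted class index", predicted)
```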
- although FIG. 5 illustrates a process of obtaining a first classification prediction value by inputting a first sentence to a convolutional neural network, the illustrated steps may be equally applied to the second sentence.
- the electronic device may obtain a second classification prediction value by inputting the second sentence to the convolutional neural network according to steps S510 to S550.
- the electronic device may simultaneously perform a first learning process of obtaining a first classification prediction value and a second learning process of obtaining a second classification prediction value.
- FIG. 6 is a flowchart illustrating a learning method of adjusting, by an electronic device, a weight applied to a neural network model based on a loss value obtained through the neural network model, according to an embodiment of the present disclosure.
- the electronic device obtains a first classification loss value that is a difference between the first classification prediction value and the first class.
- the first classification loss value may mean a difference value between the first class vector and the first classification prediction value, which is a probability value with which the first sentence may be classified into the first class.
- the electronic device obtains a second classification loss value that is a difference between the second classification prediction value and the second class.
- the second classification loss value may mean a difference value between the second class vector and the second classification prediction value, which is a probability value with which the second sentence may be classified into the second class.
- step S610 and step S620 may be performed simultaneously.
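One way to turn the "difference" between a prediction and a class vector into a single number is the squared error against a one-hot class vector, sketched below. The choice of squared error is an assumption for illustration (cross-entropy is another common choice); the text does not say which measure is used.

```python
def one_hot(class_index, num_classes):
    """Class vector with 1.0 at the true class and 0.0 elsewhere."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


def classification_loss(predicted, class_index):
    """Sum of squared differences between the prediction and the class vector."""
    target = one_hot(class_index, len(predicted))
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


# The same function serves both networks: first and second classification loss.
first_loss = classification_loss([0.5, 0.5], 0)
second_loss = classification_loss([0.0, 1.0], 1)
print(first_loss, second_loss)
```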
- the electronic device obtains a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value.
- a detailed method of calculating the contrastive loss value has been described in the description of FIG. 4, and thus redundant description will be omitted.
- the electronic device adjusts a weight applied to the first neural network and the second neural network based on the final loss value.
- the first neural network and the second neural network may each be composed of a convolutional neural network that generates a feature map by applying a plurality of filters, and the electronic device may adjust the weight values of the plurality of filters applied to the convolutional neural network according to the magnitude of the final loss value.
- the electronic device described herein may be implemented as a hardware component, a software component, and / or a combination of hardware components and software components.
- the electronic device described in the disclosed embodiments may include a processor, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), It may be implemented using one or more general purpose or special purpose computers, such as a microprocessor or any other device capable of executing and responding to instructions.
- the software may include a computer program, code, instructions, or a combination of one or more of the above, and may configure the processing device to operate as desired or command the processing device independently or collectively.
- the software may be implemented as a computer program including instructions stored in a computer-readable storage medium.
- computer-readable recording media include, for example, magnetic storage media (e.g., read-only memory (ROM), random-access memory (RAM), floppy disks, hard disks, etc.) and optical reading media (e.g., compact disc read-only memory (CD-ROM), digital versatile disc (DVD)).
- the computer readable recording medium can be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
- the medium may be read by a computer, stored in a memory, and executed by a processor.
- the computer is a device capable of calling stored instructions from a storage medium and operating according to the disclosed embodiments according to the called instructions, and may include an electronic device according to the disclosed embodiments.
- the computer readable storage medium may be provided in the form of a non-transitory storage medium.
- 'non-transitory' means that the storage medium does not include a signal and is tangible, but does not distinguish whether the data is stored semi-permanently or temporarily on the storage medium.
- an electronic device or method according to the disclosed embodiments may be provided included in a computer program product.
- the computer program product may be traded between the seller and the buyer as a product.
- the computer program product may include a software program, a computer-readable storage medium on which the software program is stored.
- a computer program product may include a product (e.g., a downloadable application) in the form of a software program distributed electronically through a manufacturer of an electronic device or an electronic market (e.g., Google Play Store, App Store).
- the storage medium may be a server of a manufacturer, a server of an electronic market, or a storage medium of a relay server that temporarily stores a software program.
- the computer program product may include a storage medium of a server or a storage medium of a terminal in a system consisting of a server and a terminal (for example, an ultrasound diagnostic apparatus).
- when there is a third device (e.g., a smartphone) in communication with the server or the terminal, the computer program product may include a storage medium of the third device.
- the computer program product may include a software program itself transmitted from the server to the terminal or the third device, or transmitted from the third device to the terminal.
- one of the server, the terminal and the third device may execute the computer program product to perform the method according to the disclosed embodiments.
- two or more of the server, the terminal and the third device may execute a computer program product to distribute and implement the method according to the disclosed embodiments.
- a server (e.g., a cloud server or an artificial intelligence server) may execute a computer program product stored in the server to control a terminal connected to the server to perform the method according to the disclosed embodiments.
- a third device may execute a computer program product to control a terminal in communication with the third device to perform the method according to the disclosed embodiment.
- the third device may download the computer program product from the server and execute the downloaded computer program product.
- the third device may execute the provided computer program product in a preloaded state to perform the method according to the disclosed embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a method and an apparatus for classifying a class to which a sentence belongs, using a deep neural network. An embodiment of the present invention relates to a method and an apparatus for: training on a first sentence and a second sentence through a first neural network and a second neural network, respectively; obtaining a contrastive loss value based on a first feature vector and a second feature vector, which are generated as output data of the training, and on whether the classes to which the first and second sentences belong are identical; and repeating the training so as to maximize the contrastive loss value.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/613,317 US11568240B2 (en) | 2017-05-16 | 2018-05-16 | Method and apparatus for classifying class, to which sentence belongs, using deep neural network |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762506724P | 2017-05-16 | 2017-05-16 | |
| US62/506,724 | 2017-05-16 | ||
| KR10-2018-0055651 | 2018-05-15 | ||
| KR1020180055651A KR102071582B1 (ko) | 2017-05-16 | 2018-05-15 | Method and apparatus for classifying class, to which sentence belongs, using deep neural network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2018212584A2 true WO2018212584A2 (fr) | 2018-11-22 |
| WO2018212584A3 WO2018212584A3 (fr) | 2019-01-10 |
Family
ID=64274189
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2018/005598 Ceased WO2018212584A2 (fr) | 2017-05-16 | 2018-05-16 | Method and apparatus for classifying class, to which sentence belongs, using deep neural network |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018212584A2 (fr) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110210027A (zh) * | 2019-05-30 | 2019-09-06 | 杭州远传新业科技有限公司 | Fine-grained sentiment analysis method, apparatus, device, and medium based on ensemble learning |
| CN111310823A (zh) * | 2020-02-12 | 2020-06-19 | 北京迈格威科技有限公司 | Target classification method, apparatus, and electronic system |
| WO2021143018A1 (fr) * | 2020-01-16 | 2021-07-22 | 平安科技(深圳)有限公司 | Intention recognition method, apparatus, and device, and computer-readable recording medium |
| CN113168915A (zh) * | 2018-11-30 | 2021-07-23 | 普瑞万蒂斯技术有限公司 | Multi-channel and with-rhythm transfer learning |
| CN113269303A (zh) * | 2021-05-18 | 2021-08-17 | 三星(中国)半导体有限公司 | Data processing method and data processing apparatus for deep learning inference framework |
| CN114021434A (zh) * | 2021-10-22 | 2022-02-08 | 中冶赛迪电气技术有限公司 | Post-fault transient stability assessment method for power systems based on an improved DBN algorithm |
| US20220108703A1 (en) * | 2020-01-23 | 2022-04-07 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2505400B (en) * | 2012-07-18 | 2015-01-07 | Toshiba Res Europ Ltd | A speech processing system |
| US10650805B2 (en) * | 2014-09-11 | 2020-05-12 | Nuance Communications, Inc. | Method for scoring in an automatic speech recognition system |
| US9646634B2 (en) * | 2014-09-30 | 2017-05-09 | Google Inc. | Low-rank hidden input layer for speech recognition neural network |
| KR102167719B1 (ko) * | 2014-12-08 | 2020-10-19 | 삼성전자주식회사 | 언어 모델 학습 방법 및 장치, 음성 인식 방법 및 장치 |
| KR102413693B1 (ko) * | 2015-07-23 | 2022-06-27 | 삼성전자주식회사 | 음성 인식 장치 및 방법, 그를 위한 모델 생성 장치 및 방법 |
-
2018
- 2018-05-16 WO PCT/KR2018/005598 patent/WO2018212584A2/fr not_active Ceased
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113168915A (zh) * | 2018-11-30 | 2021-07-23 | 普瑞万蒂斯技术有限公司 | Multi-channel and with-rhythm transfer learning |
| CN110210027A (zh) * | 2019-05-30 | 2019-09-06 | 杭州远传新业科技有限公司 | Fine-grained sentiment analysis method, apparatus, device, and medium based on ensemble learning |
| WO2021143018A1 (fr) * | 2020-01-16 | 2021-07-22 | 平安科技(深圳)有限公司 | Intention recognition method, apparatus, and device, and computer-readable recording medium |
| US20220108703A1 (en) * | 2020-01-23 | 2022-04-07 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
| CN114830229A (zh) * | 2020-01-23 | 2022-07-29 | 三星电子株式会社 | Electronic device and control method thereof |
| EP4014232A4 (fr) * | 2020-01-23 | 2022-10-19 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
| US12211510B2 (en) * | 2020-01-23 | 2025-01-28 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
| CN114830229B (zh) * | 2020-01-23 | 2025-11-28 | 三星电子株式会社 | Electronic device and control method thereof |
| CN111310823A (zh) * | 2020-02-12 | 2020-06-19 | 北京迈格威科技有限公司 | Target classification method, apparatus, and electronic system |
| CN111310823B (zh) * | 2020-02-12 | 2024-03-29 | 北京迈格威科技有限公司 | Target classification method, apparatus, and electronic system |
| CN113269303A (zh) * | 2021-05-18 | 2021-08-17 | 三星(中国)半导体有限公司 | Data processing method and data processing apparatus for deep learning inference framework |
| CN114021434A (zh) * | 2021-10-22 | 2022-02-08 | 中冶赛迪电气技术有限公司 | Post-fault transient stability assessment method for power systems based on an improved DBN algorithm |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2018212584A3 (fr) | 2019-01-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| KR102071582B1 (ko) | Method and apparatus for classifying class, to which sentence belongs, using deep neural network | |
| WO2018212584A2 (fr) | Method and apparatus for classifying class, to which sentence belongs, using deep neural network | |
| CN114090780B (zh) | Fast image classification method based on prompt learning | |
| WO2018212494A1 (fr) | Method and device for identifying objects | |
| CN117711001A (zh) | Image processing method, apparatus, device, and medium | |
| Goyal | Indian sign language recognition using mediapipe holistic | |
| CN112233698A (zh) | Person emotion recognition method, apparatus, terminal device, and storage medium | |
| CN112861945A (zh) | Multimodal fusion lie detection method | |
| WO2021132797A1 (fr) | Method for classifying speech emotions in a conversation using word-by-word emotion embedding, based on semi-supervised learning, and a long short-term memory model | |
| Goswami et al. | CNN model for american sign language recognition | |
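| WO2020231005A1 (fr) | Image processing device and operating method thereof | |
| CN115588193A (zh) | Visual question answering method and apparatus based on graph attention neural network and visual relationships | |
| CN115691511A (zh) | Training method for audio melody recognition model, audio processing method, and related device | |
| CN113362852A (zh) | User attribute recognition method and apparatus | |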
| Dabwan et al. | Arabic sign language recognition using EfficientnetB1 and transfer learning technique | |
| CN117421592A (zh) | Model training method, text classification method, system, device, and storage medium | |
| Orosoo et al. | Enhancing English Learning Environments Through Real-Time Emotion Detection and Sentiment Analysis. | |
| CN114238587A (zh) | Reading comprehension method, apparatus, storage medium, and computer device | |
| Shania et al. | Translator of Indonesian Sign Language Video using Convolutional Neural Network with Transfer Learning | |
| Solanki et al. | Evaluating Multi-Layer Perceptron and Recurrent Neural Networks for Speech Emotion Recognition | |
| CN117789099A (zh) | Video feature extraction method and apparatus, storage medium, and electronic device | |
| Allam et al. | Sign language recognition using CNN | |
| WO2025249857A1 (fr) | Automatic question generation service system based on unstructured data | |
| Herath et al. | An approach to Sri Lankan sign language recognition using deep learning with MediaPipe | |
| Mallika et al. | Hand gesture recognition using convolutional neural networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18802102; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18802102; Country of ref document: EP; Kind code of ref document: A2 |