WO2020056621A1 - Method, apparatus, and device for learning an intent recognition model - Google Patents

Method, apparatus, and device for learning an intent recognition model

Info

Publication number
WO2020056621A1
WO2020056621A1 (PCT/CN2018/106468; priority document CN2018106468W)
Authority
WO
WIPO (PCT)
Prior art keywords
skill
server
intent
data corresponding
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/106468
Other languages
English (en)
French (fr)
Inventor
张晴
杨威
肖一凡
张良和
芮祥麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to EP18934368.4A priority Critical patent/EP3848855A4/en
Priority to US17/277,455 priority patent/US12079579B2/en
Priority to CN201880093483.0A priority patent/CN112154465B/zh
Priority to PCT/CN2018/106468 priority patent/WO2020056621A1/zh
Publication of WO2020056621A1 publication Critical patent/WO2020056621A1/zh
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Definitions

  • the present application relates to the field of communication technologies, and in particular, to a method, a device, and a device for learning an intention recognition model.
  • A human-computer dialogue system (also called a human-machine dialogue platform, chatbot, etc.) is a new generation of human-machine interaction interface.
  • According to the field involved, human-computer dialogue systems are divided into open-domain chatbots and task-oriented chatbots.
  • A task-oriented chatbot can provide services such as ordering meals, booking tickets, and hailing taxis for end users.
  • Providers of these services enter, in advance, training data corresponding to a function A into the server (for example, user statements, which can also be called corpora).
  • The server trains on the entered training data using an ensemble learning method and outputs a model corresponding to function A.
  • The model corresponding to function A can then be used to predict on new user statements entered by an end user in order to determine the user's intention, that is, whether the server should provide the service corresponding to function A to that end user.
  • Specifically, the server uses several preset base learners (also called base models) to learn from the training data entered by the service provider, and applies certain rules to integrate the models trained by the individual base learners, obtaining a model more accurate than any single base learner.
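  • As an illustration of the integration idea described above, the following is a minimal, hypothetical Python sketch: several stand-in base learners each predict a label, and a simple majority vote combines them. The learners and their rules are invented for illustration and are not the patent's actual base models or integration rules.

```python
from collections import Counter

# Three toy "base learners": each maps a user statement to a
# predicted label ("function_A" or "other"). These are stand-in
# rules, not real trained models.
def learner_keyword(statement):
    return "function_A" if "order" in statement else "other"

def learner_length(statement):
    return "function_A" if len(statement.split()) <= 4 else "other"

def learner_prefix(statement):
    return "function_A" if statement.startswith("please") else "other"

def ensemble_predict(statement, learners):
    """Integrate base-learner outputs by simple majority vote."""
    votes = Counter(learner(statement) for learner in learners)
    return votes.most_common(1)[0][0]

learners = [learner_keyword, learner_length, learner_prefix]
prediction = ensemble_predict("please order a meal", learners)
```

Even with weak individual rules, the vote tends to be more robust than any single learner, which is the motivation for the ensemble step in the text.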
  • A learning method, apparatus, and device provided by the present application can improve the accuracy of the intent recognition model in a human-machine dialogue system, and thereby the accuracy of the tasks performed by the system and the user experience.
  • In a first aspect, an embodiment of the present application provides a method for learning an intent recognition model.
  • The method includes: the server receives positive data corresponding to each intent in a first skill, input by a skill developer; the server generates, according to the positive data corresponding to each intent in the first skill, negative data corresponding to each intent in the first skill; the server obtains training data corresponding to a second skill, where the second skill is a skill similar to the first skill and the number of second skills is at least one; the server learns according to the training data corresponding to the second skill and a preset first base model to generate a second base model, where the number of second base models is at least one; and the server learns according to the positive data corresponding to each intent in the first skill, the negative data corresponding to each intent in the first skill, and the second base model, to generate an intent recognition model.
  • In a possible implementation, the server generating the negative data corresponding to each intent in the first skill according to the positive data corresponding to each intent in the first skill includes: for each intent in the first skill, the server extracts keywords corresponding to that intent, where the keywords are key features that affect the weights of the first base model; keywords corresponding to different intents in the first skill are combined with each other, or keywords corresponding to different intents are combined with words unrelated to the first skill, and the combined words are determined as negative data corresponding to the different intents.
  • The above keywords are the salient features most likely to affect the feature weights of classification (intent recognition).
  • A common cause of misidentification is that the salient features of other categories (other intents) highly coincide with the salient features of the present category (the present intent).
  • For example, suppose intent 1 is "open the app" and intent 2 is "close the app".
  • The word "app" appears with high frequency in both intent 1 and intent 2, so it is a high-frequency salient feature overall but not a salient feature that distinguishes either classification.
  • The embodiments of the present application improve accuracy by reducing, during classification, the weight of such salient features shared across classifications (intents).
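  • The keyword-combination procedure above can be sketched as follows. This is a simplified, hypothetical Python illustration: the extraction rule used here (the most frequent token not shared by all intents) is a stand-in for the patent's feature-weight-based keyword extraction, and the corpora and unrelated words are invented.

```python
from collections import Counter

# Toy positive corpora for two intents of one skill (illustrative).
positive = {
    "open_app":  ["open the app", "open app now", "open that app"],
    "close_app": ["close the app", "close app please", "close this app"],
}

def intent_keywords(corpora):
    """Per intent, pick the most frequent token NOT shared by all intents
    (tokens like "app" that appear in every intent are excluded)."""
    counts = {intent: Counter(w for s in sents for w in s.split())
              for intent, sents in corpora.items()}
    shared = set.intersection(*(set(c) for c in counts.values()))
    return {intent: max((w for w in c if w not in shared), key=c.get)
            for intent, c in counts.items()}

keywords = intent_keywords(positive)      # {"open_app": "open", ...}
unrelated = ["weather", "music"]          # words unrelated to the skill

def negative_data(intent, keywords, unrelated):
    """Combine an intent's keyword with other intents' keywords, or
    with words unrelated to the skill, to form negative samples."""
    others = [k for i, k in keywords.items() if i != intent]
    return [f"{keywords[intent]} {w}" for w in others + unrelated]

neg_open = negative_data("open_app", keywords, unrelated)
```

The resulting strings (e.g. "open weather") resemble the positive data superficially but should not trigger the intent, which is exactly the role of negative data described above.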
  • In a possible implementation, the server generating the negative data corresponding to each intent in the first skill according to the positive data corresponding to each intent in the first skill further includes: the server determines a first set from the full set of training data stored on the server according to the classification of the first skill, where the training data in the first set includes negative data corresponding to each intent in the first skill (the training data including both positive and negative data); the server samples a preset amount of training data from the first set; and manual labeling and/or clustering algorithms are used to determine, from the sampled training data, the negative data corresponding to each intent in the first skill.
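  • A toy sketch of this sampling step follows. It is hypothetical throughout: the token-overlap filter below merely stands in for the manual labeling and/or clustering that would identify negatives in practice, and the stored corpus is invented.

```python
import random

# Full set of training data stored on the server (invented examples).
full_set = [
    "open the app", "book a flight", "play some music",
    "close the app", "check the weather", "order a meal",
    "open settings", "turn off the light",
]
skill_positive = {"open the app", "close the app"}

def jaccard(a, b):
    """Token-overlap similarity between two sentences."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

# Step 1: sample a preset amount of training data from the set.
random.seed(0)
sample = random.sample(full_set, 5)

# Step 2: a stand-in for labeling/clustering -- keep sampled sentences
# that are dissimilar to every positive sentence of the skill; these
# are treated as negative data for the skill.
negatives = [s for s in sample
             if all(jaccard(s, p) < 0.3 for p in skill_positive)]
```

A real system would cluster the sampled statements and label whole clusters, but the shape of the pipeline (determine a set, sample a preset amount, then label) matches the text above.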
  • In a possible implementation, the server acquiring the training data corresponding to the second skill includes: the server determines the second skill according to the classification of the first skill and/or the positive data of the first skill, and acquires the training data corresponding to the second skill through a shared layer.
  • In a possible implementation, the server learning according to the training data corresponding to the second skill and a preset first base model to generate the second base model includes: the server learns, using a multi-task learning method, according to the training data corresponding to the second skill and the preset first base model, to generate the second base model.
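  • The shared-layer idea can be illustrated with a minimal stand-in. This is hypothetical Python: a shared bag-of-words vocabulary plays the role of the shared layer, and per-skill token profiles play the role of task-specific models; a real multi-task system would instead share learned network layers across the similar skills.

```python
from collections import Counter

# Two similar skills with toy corpora (invented examples).
skill2 = {"book_flight": ["book a flight", "book flight to beijing"]}
skill3 = {"book_hotel": ["book a hotel", "book hotel room"]}

# Shared layer: one vocabulary built from ALL similar skills' corpora,
# so each skill benefits from the others' data.
all_sents = [s for sk in (skill2, skill3) for ss in sk.values() for s in ss]
vocab = sorted({w for s in all_sents for w in s.split()})

def featurize(sentence):
    """Map a sentence onto the shared vocabulary (bag-of-words)."""
    c = Counter(sentence.split())
    return [c[w] for w in vocab]

def train_head(skill):
    """Task-specific head: per-intent token-frequency profiles built
    on top of the shared features."""
    return {intent: [sum(col) for col in zip(*map(featurize, sents))]
            for intent, sents in skill.items()}

head2, head3 = train_head(skill2), train_head(skill3)
```

The point carried over from the text is structural: the expensive, reusable part (the shared layer) is built once from the pooled similar-skill data, while each skill keeps its own lightweight task-specific part.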
  • In a possible implementation, the server learning according to the positive data corresponding to each intent in the first skill, the negative data corresponding to each intent in the first skill, and the second base model to generate the intent recognition model includes: the server learns, using an ensemble learning method, according to the positive data corresponding to each intent in the first skill, the negative data corresponding to each intent in the first skill, and the second base model, to generate the intent recognition model.
  • In a possible implementation, the server generating the intent recognition model specifically includes: the server generates multiple intermediate models in the process of generating the intent recognition model; the server determines the first generated intermediate model to be the fastest-returned intent recognition model; and/or the server determines the best intent recognition model after performing model selection and parameter adjustment on the multiple intermediate models.
  • Considering the different needs of skill developers, the embodiments of the present application provide multiple return mechanisms, such as returning the fastest-generated intent recognition model, returning the best intent recognition model, and the like.
  • Because these intermediate models can already implement the function of the intent recognition model, the first intermediate model obtained can be determined as the intent recognition model returned to the voice-skill developer or user, so that the developer or user can obtain an intent recognition model as quickly as possible to understand its function, performance, and so on.
  • Alternatively, the server 300 may adjust the parameters of the multiple intermediate models obtained while learning the intent recognition model, and determine the model with the higher accuracy to be the intent recognition model, so as to meet the various needs of users.
  • In a possible implementation, the server determining the best intent recognition model includes: the server calculates a first accuracy rate of each intermediate model according to the positive data corresponding to each intent in the first skill; the server calculates a second accuracy rate of each intermediate model according to the negative data corresponding to each intent in the first skill; and the server performs model selection and parameter adjustment on the multiple intermediate models based on the first accuracy rate, the second accuracy rate, and a weight input by the skill developer, to determine the best intent recognition model.
  • In this way, the server uses corpus outside the voice skill, such as the automatically generated negative data, to test the accuracy of the model. Because more test data is introduced to test the model's accuracy, the test results are more accurate, and the parameters of the model can then be adjusted according to these more accurate results to obtain the best intent recognition model. That is, the technical solutions provided in the embodiments of the present application help improve the accuracy of the intent recognition model.
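  • A hypothetical sketch of this weighted model-selection step follows. The model names, accuracy numbers, and the linear weighting rule are illustrative assumptions, not taken from the patent.

```python
# Each intermediate model is scored on positive data (first accuracy)
# and on automatically generated negative data (second accuracy); the
# skill developer's weight trades the two off. Values are invented.
intermediate_models = {
    "model_1": {"acc_positive": 0.95, "acc_negative": 0.70},
    "model_2": {"acc_positive": 0.90, "acc_negative": 0.88},
    "model_3": {"acc_positive": 0.80, "acc_negative": 0.92},
}

def best_model(models, weight_positive=0.5):
    """Pick the model maximizing the weighted combined accuracy."""
    def score(name):
        m = models[name]
        return (weight_positive * m["acc_positive"]
                + (1 - weight_positive) * m["acc_negative"])
    return max(models, key=score)

# With equal weights the balanced model wins; weighting the positive
# accuracy heavily favours the model that excels on positive data.
balanced = best_model(intermediate_models, weight_positive=0.5)
positive_heavy = best_model(intermediate_models, weight_positive=0.9)
```

This shows why the developer-supplied weight matters: the "best" model changes depending on how much the developer cares about rejecting negative statements versus accepting positive ones.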
  • In another aspect, an embodiment of the present application provides a server, including: a processor, a memory, and a communication interface; the memory is used to store computer program code, and the computer program code includes computer instructions.
  • When the processor reads the computer instructions from the memory, the server is caused to perform the following operations:
  • The server generating the negative data corresponding to each intent in the first skill according to the positive data corresponding to each intent in the first skill specifically includes: for each intent in the first skill, the server extracts keywords corresponding to that intent, where the keywords are key features that affect the weights of the first base model; keywords corresponding to different intents in the first skill are combined with each other, or keywords corresponding to different intents are combined with words unrelated to the first skill, and the combined words are determined as negative data corresponding to the different intents.
  • The server generating the negative data corresponding to each intent in the first skill according to the positive data corresponding to each intent in the first skill further includes: the server determines a first set from the full set of training data stored on the server according to the classification of the first skill, where the training data in the first set includes negative data corresponding to each intent in the first skill (the training data including both positive and negative data); the server samples a preset amount of training data from the first set; and manual labeling and/or clustering algorithms are used to determine, from the sampled training data, the negative data corresponding to each intent in the first skill.
  • The server acquiring the training data corresponding to the second skill includes: the server determines the second skill according to the classification of the first skill and/or the positive data of the first skill, and acquires the training data corresponding to the second skill through a shared layer.
  • The server learning according to the training data corresponding to the second skill and a preset first base model to generate the second base model includes: the server learns, using a multi-task learning method, according to the training data corresponding to the second skill and the preset first base model, to generate the second base model.
  • The server learning according to the positive data corresponding to each intent in the first skill, the negative data corresponding to each intent in the first skill, and the second base model to generate the intent recognition model includes: the server learns, using an ensemble learning method, according to the positive data corresponding to each intent in the first skill, the negative data corresponding to each intent in the first skill, and the second base model, to generate the intent recognition model.
  • The server generating the intent recognition model further specifically includes: the server generates multiple intermediate models in the process of generating the intent recognition model; the server determines the first generated intermediate model to be the fastest-returned intent recognition model; and/or the server determines the best intent recognition model after performing model selection and parameter adjustment on the multiple intermediate models.
  • The server determining the best intent recognition model includes: the server calculates a first accuracy rate of each intermediate model according to the positive data corresponding to each intent in the first skill; the server calculates a second accuracy rate of each intermediate model according to the negative data corresponding to each intent in the first skill; and the server performs model selection and parameter adjustment on the multiple intermediate models based on the first accuracy rate, the second accuracy rate, and a weight input by the skill developer, to determine the best intent recognition model.
  • In another aspect, a computer storage medium is provided, including computer instructions; when the computer instructions are run on a server, the server is caused to execute the method described in the first aspect and any one of its possible implementation manners.
  • In another aspect, a computer program product is provided; when the computer program product runs on a computer, the computer is caused to perform the method described in the first aspect and any one of its possible implementations.
  • FIG. 1 is a first schematic structural diagram of a human-machine dialogue system according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a learning method of an intent recognition model in the prior art
  • FIG. 3 is a schematic diagram of a usage scenario of an intent recognition model according to an embodiment of the present application.
  • FIG. 4 is a second structural schematic diagram of a human-machine dialogue system provided by an embodiment of the present application.
  • FIG. 5 is a first schematic diagram of a learning method for an intent recognition model according to an embodiment of the present application.
  • FIG. 6 is a second schematic diagram of a learning method for an intent recognition model according to an embodiment of the present application.
  • FIG. 7 is a third schematic diagram of a learning method for an intent recognition model according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a learning method for an intent recognition model according to an embodiment of the present application.
  • FIG. 10 is a fourth schematic diagram of a learning method for an intent recognition model according to an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise stated, "a plurality" means two or more.
  • FIG. 1 is a schematic diagram of the composition of a human-machine dialogue system provided by an embodiment of the present application.
  • the human-machine dialogue system includes: one or more electronic devices 100, one or more servers 200, and one or more servers 300.
  • a communication connection is established between the electronic device 100 and the server 300.
  • the server 300 establishes communication connections with the electronic device 100 and the server 200, respectively.
  • the electronic device 100 may also establish a communication connection with the server 200.
  • The communication connection may be established via a telecommunication network (a communication network such as 3G/4G/5G) or a Wi-Fi network, which is not limited in the embodiments of the present application.
  • The electronic device 100 may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, a device employing augmented reality technology, or the like.
  • the server 200 may be a server of a third-party application for providing a service of the third-party application.
  • the third-party application may be, for example, Meituan, Amazon, Didi Taxi and other applications.
  • the server 300 may be a server of a manufacturer of the electronic device 100, for example, a cloud server of a voice assistant in the electronic device 100, and the server 300 may also be another server, which is not limited in the embodiment of the present application.
  • A voice skill may mean that, through conversational interaction between the electronic device 100 and the server 200 of a third-party application, the user can request the third-party application to provide one or more services in the application.
  • This conversational interaction process simulates the interaction scenario of the user's actual life, making the interaction between the user and the electronic device as natural as interacting with people.
  • each sentence spoken by the user corresponds to an intent, which is the purpose of the user to say the sentence.
  • each voice skill is composed of several intents.
  • The server 300 understands the user's needs and provides corresponding services by matching each sentence spoken by the user against the intents in the voice skills, for example providing services such as ordering meals, booking tickets, and hailing taxis.
  • The above-mentioned process of matching the user's utterance with the intents in voice skills is the process of intent recognition.
  • For this purpose, the server 300 can construct a model for intent recognition.
  • The intent recognition model can, based on a user statement input by the user, automatically identify the intent corresponding to that statement.
  • FIG. 2 is a schematic diagram of a process in which the server 300 trains an intent recognition model.
  • A developer of a voice skill (for example, via the server 200 of a third-party application) inputs training data corresponding to a new voice skill, and the server 300 obtains the intent recognition model corresponding to the new voice skill by applying a corresponding learning method, such as an ensemble learning method, to that training data.
  • the corpus A corresponding to the speech skill 1 is input into the model learning framework 1 for learning, and an intent recognition model corresponding to the speech skill 1 is obtained.
  • the model learning framework 1 may adopt, for example, an integrated learning framework.
  • The smallest learning unit in a model learning framework is a base model.
  • The ensemble learning framework includes a plurality of base models, referred to here as base models 1.
  • These base models 1 differ from one another.
  • Different base models 1 may belong to different model families, for example a support vector machine (SVM) and a logistic regression (LR) model; they may belong to the same model family but use different hyperparameters (for example, SVM models with different values of the hyperparameter C); or they may belong to the same model family but use different data (for example, the same SVM model trained on different subsets of the original data).
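  • The three sources of base-model diversity listed above can be sketched as configuration records. This is hypothetical Python: the "models" are represented only as (name, config) pairs, not trained learners, and the names and values are invented.

```python
import random

# Stand-in training examples; in practice these would be user statements.
data = list(range(20))
random.seed(1)

base_models = []
# 1) Different model families (e.g. SVM vs. logistic regression).
base_models += [("svm", {}), ("logistic_regression", {})]
# 2) Same family, different hyperparameters (e.g. different C values).
base_models += [("svm", {"C": c}) for c in (0.1, 1.0, 10.0)]
# 3) Same family, different bootstrap subsets of the original data.
base_models += [("svm", {"subset": random.sample(data, 10)})
                for _ in range(2)]
```

Each record would then be trained independently and integrated by the ensemble framework, as in the majority-vote idea described earlier in the text.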
  • After training of the intent recognition model corresponding to the new voice skill is completed, when the electronic device 100 receives a user-entered statement, it sends the statement to the server 300; the server 300 performs prediction according to the user's statement and the trained intent recognition model to determine the user intent corresponding to the statement, and then determines the third-party application that provides services for the user and the specific function within that application.
  • FIG. 3 is a schematic diagram of a process in which the server 300 determines to provide a third-party application's service to a user.
  • For example, the server 300 also stores the intent recognition model corresponding to voice skill 2 of third-party application 1, the intent recognition model corresponding to voice skill 3 of third-party application 2, and the like.
  • the user inputs the voice 1 through the electronic device 100, and the electronic device 100 may convert the voice 1 into corresponding text, that is, the user statement 1, and send it to the server 300, or the electronic device 100 may directly send the voice 1 to the server 300,
  • the server 300 converts voice 1 into text (user statement 1).
  • The server 300 distributes user statement 1 to the intent recognition model corresponding to each voice skill and performs calculation based on each of them, determining that user statement 1 corresponds to voice skill 1, that is, the user needs to be provided with the service corresponding to voice skill 1 of the third-party application.
  • The server 300 may return information about the determined third-party application to the electronic device 100; this information may include the address of the third-party application's server, and may also include information about the specific service to be provided to the user, for example, ordering a meal or checking an order. The electronic device 100 may then establish a communication connection with the server 200 of the third-party application, and the server 200 provides the corresponding service for the user. In other embodiments, the server 300 may establish a communication connection with the server 200 of the corresponding third-party application and send the user's service request information to the server 200.
  • The server 200 of the third-party application may then interact with the electronic device 100 through the server 300 and provide the corresponding service to the user.
  • the service request information may include information of a specific service requested by a user, for example, ordering a meal or checking an order.
  • the embodiment of the present application does not limit the specific manner in which the server 200 of the third-party application provides services for the user of the electronic device 100.
  • FIG. 4 is a schematic diagram of another human-machine dialogue system provided by an embodiment of the present application.
  • the human-machine dialogue system includes: one or more electronic devices 100 and one or more servers 200.
  • the electronic device 100 and the server 200 may refer to the related description in FIG. 1.
  • The process of learning the intent recognition model corresponding to a new voice skill according to the training data input by the voice-skill developer may also be performed on the server 200, or, when the computing capability of the electronic device 100 can support the calculation, on the electronic device 100.
  • The method provided in the embodiments of the present application does not limit the execution subject that trains the intent recognition model. If the server 200 of a third-party application trains the intent recognition models, it does so for that third-party application's own voice skills. The following description takes the case where the server 300 trains the intent recognition model as an example.
  • In some cases (for example, when the training data entered by the skill developer is limited), the accuracy of the target model trained by the server 300 cannot be guaranteed, which in turn affects the accuracy of the server 300's recognition of user intent and the user experience on the electronic device.
  • In addition, the base models preset by the server 300 will inevitably be unreasonable in some situations (for example, not applicable to certain scenarios or types of voice skills), which also affects the accuracy of the target model trained by the server 300.
  • the server 300 can provide a common human-machine dialogue platform for multiple third-party applications.
  • The developer or maintainer of the human-machine dialogue platform can set up some basic skill templates on the platform in advance (including some preset training data, the base models used in learning, etc.); these skill templates cover some commonly used scenarios. Third-party applications can then modify these basic skill templates to meet their own personalized needs, or add custom skills based on their service content.
  • The technical solutions provided in the embodiments of the present application can be applied to the process in which the server 300 learns the intent recognition model corresponding to a new voice skill, and, compared with the prior art, can improve the accuracy of the intent recognition model learned by the server 300.
  • FIG. 5 is a schematic diagram of a method for learning an intent recognition model according to an embodiment of the present application.
  • the server 300 may automatically generate some other training data according to the training data corresponding to the input new voice skills. Then, the training data input by the voice skill developer and other automatically generated training data are input to the server 300 together, and the server 300 performs learning. Since more training data is introduced in the learning process, the accuracy of the intent recognition model obtained by the server 300 after learning can be improved.
  • the input training data of speech skill 1 is corpus A.
  • the server generates negative data based on the positive data in corpus A, that is, corpus B, and then inputs corpus A and corpus B into the model learning framework for learning.
  • the intent recognition model corresponding to the speech skill 1 is obtained.
  • The positive data is data of user statements that can trigger the server 300 to perform the corresponding operation.
  • The negative data is similar to the positive data but does not belong to it; that is, it is data of user statements that should not trigger the server 300 to perform the corresponding operation.
  • the server 300 may learn the intent recognition models corresponding to different voice skills of the same third-party application, and may also learn the intent recognition models corresponding to different voice skills of different third-party applications. That is, the server 300 may store intent recognition models for different voice skills of the same third-party application, and intent recognition models corresponding to different voice skills of different third-party applications. Considering that the server 300 can store intent recognition models corresponding to various types of speech skills of various topics, there will be some similarities between these large numbers of speech skills, then the corpus corresponding to these speech skills can also be reused. Therefore, the technical solution provided in the embodiment of the present application can also be used for training the intent recognition model corresponding to a certain voice skill in combination with training data of other voice skills similar to the voice skill.
  • the server 300 may expand the base model in the model learning framework according to training data of other speech skills similar to the speech skill.
  • other voice skills similar to the voice skill may belong to the same third-party application as the voice skill, or may belong to different third-party applications that are different from the voice skill. There are no restrictions on types and quantities.
  • FIG. 6 is a schematic diagram of another process for learning an intent recognition model according to an embodiment of the present application.
  • the server 300 determines that the speech skill 2 and the speech skill 3 are other speech skills similar to the speech skill 1 as an example for description.
  • The server 300 may input the corpus C corresponding to voice skill 2 into model learning framework 2 to learn one or more base models 2, and may likewise input the corpus D corresponding to voice skill 3 into model learning framework 2 to learn one or more base models 2.
  • Model learning framework 2 includes one or more base models 1 stored in advance by the server 300. The multiple base models 2 obtained by model learning framework 2 according to corpus C and corpus D, that is, the base models obtained after expansion by the server 300, can be used as the base models in model learning framework 3.
  • Model learning framework 2 may adopt, for example, a multi-task learning framework, and model learning framework 3 may adopt, for example, an ensemble learning framework.
• the server 300 may input the corpus A corresponding to the voice skill 1, input by the skill developer, together with the automatically generated corpus B, into the model learning framework 3 for learning, to obtain the intent recognition model corresponding to the voice skill 1.
  • the server 300 may also directly input the corpus A corresponding to the speech skill 1 input by the skill developer into the model learning framework 3 for learning, and obtain an intent recognition model corresponding to the speech skill 1. This embodiment of the present application does not limit this.
• FIG. 7 is a schematic diagram of a process in which a server 300 learns an intent recognition model according to an embodiment of the present application. Specifically, during the process of learning the intent recognition model, the server 300 successively obtains a plurality of intermediate models. The server 300 performs model selection and parameter adjustment on these intermediate models to obtain the finally generated intent recognition model with the highest accuracy. It should be noted that these intermediate models can all realize the function of the intent recognition model corresponding to the speech skill 1. Considering the different needs of developers of speech skills, the embodiments of the present application provide multiple return mechanisms, such as returning the fastest generated intent recognition model and returning the best intent recognition model.
• when returning the fastest generated intent recognition model: because the server 300 obtains multiple intermediate models in the process of learning the intent recognition model, and these intermediate models can also implement the function of the intent recognition model, the first intermediate model obtained is determined as the intent recognition model returned to the voice skill developer or user. In this way, the voice skill developer or user can obtain the intent recognition model as quickly as possible to understand its function, performance, and the like.
  • the server 300 may adjust the parameters of the multiple intermediate models obtained during the process of learning the intent recognition model to determine that the model with higher accuracy is the intent recognition model.
• when training the intent recognition model corresponding to a certain speech skill in combination with the training data of other similar speech skills, the embodiment of the present application also provides a standard for determining the best model. That is, the accuracy of the model is tested not only with the corpus within the speech skill (such as the training data input by the skill developer), but also with corpus outside the speech skill determined by the server (such as the automatically generated negative data). Because more test data is introduced to test the accuracy of the model, the test results are more accurate, and the parameters of the model are then adjusted according to these more accurate test results to obtain the best intent recognition model. That is, the technical solutions provided in the embodiments of the present application help improve the accuracy of the intent recognition model. The specific implementation process is described in detail below.
• FIG. 8 is a schematic diagram of a hardware structure of a server 300 according to an embodiment of the present application.
  • the server 300 includes at least one processor 301, at least one memory 302, and at least one communication interface 303.
  • the server 300 may further include an output device and an input device, which are not shown in the figure.
  • the processor 301, the memory 302, and the communication interface 303 are connected through a bus.
• the processor 301 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control execution of the programs of the solution of this application.
  • the processor 301 may also include multiple CPUs, and the processor 301 may be a single-CPU processor or a multi-CPU processor.
  • a processor herein may refer to one or more devices, circuits, or processing cores for processing data (such as computer program instructions).
  • the processor 301 may be specifically configured to automatically generate negative data of new voice skills according to training data of new voice skills input by a voice skill developer.
  • the processor 301 may also be specifically configured to determine other speech skills similar to the new speech skills, acquire data of other speech skills, and extend the base model in the model learning framework.
  • the processor 301 may also learn based on the expanded base model and the training data input by the developer, and the negative data automatically generated by the processor 301 to obtain an intent recognition model.
  • the processor 301 may also be specifically configured to adjust a plurality of intermediate models generated during the learning process, so as to generate an optimal intent recognition model and the like.
• the memory 302 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; it may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 302 may exist independently, and is connected to the processor 301 through a bus.
  • the memory 302 may also be integrated with the processor 301.
  • the memory 302 is configured to store application program code that executes the solution of the present application, and is controlled and executed by the processor 301.
• the processor 301 is configured to execute the computer program code stored in the memory 302, so as to implement the method for learning an intent recognition model described in the embodiment of the present application.
• the memory 302 may be used to store data of the base models preset in the learning framework in the server 300, and may also be used to store the negative data corresponding to each voice skill automatically generated by the server 300, each intermediate model generated by the processor 301 during training, the various data used for training or testing, and the intent recognition model corresponding to each learned speech skill.
  • the communication interface 303 can be used to communicate with other devices or communication networks, such as Ethernet, wireless local area networks (WLAN), and the like.
  • the communication interface 303 may be specifically configured to communicate with the electronic device 100 to implement interaction with a user of the electronic device.
  • the communication interface 303 may also be specifically configured to communicate with the server 200 of the third-party application.
• the server 300 may receive training data corresponding to a new voice skill input by the server 200 of the third-party application, or the server 300 may send a determined user service request to the server 200 of the third-party application, so that the server 200 of the third-party application provides the corresponding service for the user of the electronic device.
  • the output device communicates with the processor and can display information in a variety of ways.
  • the output device may be a liquid crystal display (Liquid Crystal Display, LCD), a light emitting diode (Light Emitting Diode, LED) display device, a cathode ray tube (Cathode Ray Tube, CRT) display device, a projector, etc.
  • the input device communicates with the processor and can receive user input in a variety of ways.
  • the input device may be a mouse, a keyboard, a touch screen device, or a sensing device.
• a schematic flowchart of a method for learning an intent recognition model is provided; the method includes the following steps:
• Step S101: The server receives forward data corresponding to the first skill input by the skill developer.
  • the skill developer may be a developer of a third-party application or a manufacturer of an electronic device, which is not limited in the embodiment of the present application.
  • the server provides a skill development platform for skill developers.
• Skill developers can log in to the skill development platform through telecommunications networks (communication networks such as 3G/4G/5G) or WIFI networks.
• the skill developer can enter basic information such as the name of the first skill and the classification of the first skill, and then enter the intents contained in the first skill and the forward data corresponding to each intent.
• the forward data corresponding to each intent is data for which the server needs to perform a corresponding operation after receiving it, for example, triggering a corresponding service, or returning a corresponding dialog to the electronic device.
  • the data that can trigger the server to perform the corresponding operation is called forward data, which can also be called forward corpus, positive example, positive example data, in-skill data, and so on.
  • the skill developer may also input some negative data corresponding to each intent when inputting positive data corresponding to each intent of the first skill.
  • the negative data is similar to the positive data, but not the positive data.
  • the negative data can also be referred to as negative corpus, negative example, negative example data, out-of-skill data, etc., which are not limited in the embodiment of the present application.
• for example, the skill developer is the developer of the Meituan application, and the first skill developed for the Meituan application is an ordering service.
  • the Meituan application can input forward data corresponding to the ordering service, for example: ordering takeaway, ordering, etc.
  • the Meituan application can also enter negative data, such as booking air tickets.
• Step S102: The server generates negative data corresponding to the first skill according to the positive data corresponding to the first skill.
  • the first skill includes multiple intents, and each intent may correspond to one or more forward data.
  • the server can generate corresponding negative data based on the positive data for each intent. In this way, it is beneficial to improve the accuracy of the intent recognition model trained later.
• in the prior art, the criterion for recognizing an intent is only a confidence threshold set by the skill developer.
• when the confidence of a statement is lower than the threshold, the statement is not considered to express an intent within the first skill; when the confidence of a statement reaches the threshold, the statement is considered to express an intent within the first skill.
• in practice, this threshold is difficult to set.
• if other types of training data are introduced, for example the negative data introduced here, they can also be used for category discrimination, which helps avoid some misrecognition and improves the accuracy of the intent recognition model obtained through training.
  • the embodiments of the present application provide two methods for generating the negative data corresponding to the first skill according to the positive data corresponding to the first skill, as follows:
• Method 1: For each intent in the first skill, the server extracts the keywords of the intent from the forward data corresponding to that intent, and combines the extracted keywords with other irrelevant words to form the negative data corresponding to the intent.
• the above keywords are the salient features that most affect the feature weights used for classification (intent recognition).
• one cause of misrecognition is that the salient features of other categories (other intents) highly overlap with the salient features of this category (this intent).
  • Intent 1 is to open the app
  • Intent 2 is to close the app.
• in intent 1 and intent 2, "app" is a high-frequency salient feature, but it is not a salient feature that distinguishes either classification.
• the embodiments of the present application improve accuracy by reducing, during classification, the weight of salient features shared across classifications (intents).
• the keywords extracted from the forward data above may be manually labeled by a skill developer, or may be extracted using related keyword-mining techniques.
  • the keywords in the forward data are also a feature in the forward data, so feature selection methods can be used for extraction, such as the LR (logistic regression) method based on L1 regularization.
• the embodiments of the present application may also use a wrapper feature selection method, such as LVW (Las Vegas Wrapper), a sparse representation method based on dictionary learning, or a simple heuristic algorithm.
• the process of extracting keywords from forward data is briefly introduced below.
• assuming there are K categories (that is, the forward data input by the user corresponds to K intents, where one intent can be understood as one classification), K "1 vs the rest" classifiers are constructed.
• the L1 norm drives the weights of features that have little effect on classification toward zero, thus retaining the features that influence classification.
• when the features are 0/1 discrete features, the salient features can be obtained by filtering and sorting according to the sign and absolute value of the weights, that is, the keywords with the largest weights for each intent.
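For illustration only, the one-vs-rest keyword-extraction step described above can be sketched as follows. This is a simplified stand-in: instead of training K L1-regularized logistic regression classifiers as the text describes, it scores each word by how concentrated it is in one intent's corpus versus the rest, which approximates keeping only the features with the largest per-intent weights. The function name and corpora are hypothetical.

```python
from collections import Counter

def extract_keywords(intent_corpora, top_k=1):
    """For each intent, score words by how strongly they separate that
    intent from 'the rest' (a simplified stand-in for the K one-vs-rest
    L1-regularized classifiers described in the text)."""
    # word counts over all intents combined ("the rest" baseline)
    all_words = Counter(w for sents in intent_corpora.values()
                        for s in sents for w in s.split())
    keywords = {}
    for intent, sents in intent_corpora.items():
        in_counts = Counter(w for s in sents for w in s.split())
        # words frequent in this intent but rare elsewhere score highest
        scores = {w: c / all_words[w] for w, c in in_counts.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[intent] = ranked[:top_k]
    return keywords

corpora = {
    "intent1": ["open menu", "open the menu"],
    "intent2": ["close menu", "close the menu"],
}
print(extract_keywords(corpora))  # {'intent1': ['open'], 'intent2': ['close']}
```

Shared words such as "menu" and "the" appear in both intents and are scored down, matching the idea of reducing the weight of salient features shared across intents.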
• the irrelevant words may be combined with the extracted keywords to form words (or word combinations or sentences) that are similar in form to the original forward data but deviate from it semantically, or that are unrelated to the semantics of the original forward data.
• a sentence-similarity method can be used to determine whether the combined words (or word combinations or sentences) deviate from or are irrelevant to the semantics of the original forward data.
  • the server may randomly combine keywords corresponding to different intents to generate negative data.
  • the server may also randomly combine some keywords that are not related to the first skill, such as entity words, to generate negative data.
• the embodiment of the present application does not limit the method by which the server selects irrelevant words, nor the manner of combining each intent's keywords with other irrelevant words.
• for example, the forward data of a certain skill contains "open menu" and "close menu", which correspond to two intents, intent one and intent two, respectively.
• by extracting the keywords of the positive data, it can be obtained that the keyword of intent one is "open" and the keyword of intent two is "close".
• "open close", obtained by combining the keywords of the different intents, and "open drawer" and "close WeChat", obtained by combining each intent's keyword with other irrelevant words, constitute the negative data of this skill.
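The "open close" / "open drawer" / "close WeChat" example above can be sketched as follows; the function name, keyword lists, and irrelevant-word list are hypothetical illustrations, and a real implementation would additionally filter the combinations with a sentence-similarity check as the text describes.

```python
import itertools

def generate_negative_data(intent_keywords, irrelevant_words):
    """Combine keywords of different intents with each other and with
    irrelevant words to form candidate negative data."""
    negatives = set()
    kw_lists = list(intent_keywords.values())
    # cross-intent keyword combinations, e.g. "open close"
    for kws_a, kws_b in itertools.permutations(kw_lists, 2):
        for a in kws_a:
            for b in kws_b:
                negatives.add(f"{a} {b}")
    # keyword + irrelevant word, e.g. "open drawer", "close WeChat"
    for kws in kw_lists:
        for kw in kws:
            for word in irrelevant_words:
                negatives.add(f"{kw} {word}")
    return sorted(negatives)

neg = generate_negative_data(
    {"intent1": ["open"], "intent2": ["close"]},
    ["drawer", "WeChat"],
)
print(neg)
```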
• the server stores a large amount of voice-skill training data, including data of different voice skills of the same third-party application and data of voice skills of different third-party applications. This massive training data can belong to different fields and different classifications, and constitutes the full set of training data on the server.
• Method 2: The server may determine a first set from the full set of training data stored on the server according to the classification of the first skill, where the first set is a set of corpora that are likely to conflict with the forward data of the first skill.
• a method based on an in-class classifier, or a similarity search method such as an inverted index or sentence similarity calculation, may be used to determine the corpus that conflicts with the forward data, that is, the suspicious corpus.
• the server may directly determine the first set from the full set of training data, or may determine it gradually; that is, the server may first determine a set A from the full set of training data, where the set A is smaller than the full set of training data.
• then, a set B is determined within the set A, where the set B is smaller than the set A, and so on, until the first set is determined.
  • the embodiment of the present application does not limit the specific method for the server to determine the first set.
• the corpus in the first set may be corpus that is similar to the forward data; for example, it contains keywords that are the same as or similar to those of the forward data, or contains entity words (for example, nouns) that are the same as or similar to those of the forward data.
• for example, the forward data is "open menu", and the keyword of the forward data is assumed to be "open".
  • the first set may include, for example, "turn on the radio”, “turn on the menu”, “hide the menu”, and the like.
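A minimal inverted-index retrieval, one of the similarity-search methods named above, might look like the following sketch. It only matches shared tokens, so (unlike the example above) it would not surface "turn on the radio" for "open menu"; the text's sentence-similarity methods would be needed for that. All names and the toy corpus are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(corpus):
    """Map each token to the indices of the corpus sentences containing it."""
    index = defaultdict(set)
    for i, sent in enumerate(corpus):
        for tok in sent.split():
            index[tok].add(i)
    return index

def first_set(forward_data, corpus):
    """Retrieve corpus entries sharing at least one token with the
    forward data: the 'suspicious' candidates for negative data."""
    index = build_inverted_index(corpus)
    hits = set()
    for sent in forward_data:
        for tok in sent.split():
            hits |= index.get(tok, set())
    return sorted(corpus[i] for i in hits)

full_set = ["turn on the radio", "open the menu", "hide the menu",
            "book a flight"]
print(first_set(["open menu"], full_set))  # ['hide the menu', 'open the menu']
```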
• the server may sample a certain number of corpora from the first set, and further determine whether each sampled corpus is negative data corresponding to the first skill.
  • the above-mentioned certain number (that is, the number of sampled corpora) can be determined according to the number of forward data.
• the number of sampled corpora can be set equal to the number of forward data, or to the maximum number of forward data corresponding to any single intent, or to the average number, or to a preset number, and so on.
  • the embodiment of the present application does not specifically limit the number of sampling corpora.
  • a random sampling method or a sampling method according to a theme distribution may be adopted.
  • the embodiment of the present application does not limit the specific sampling method.
• when the server determines whether a sampled corpus is negative data of the first skill, this can also be understood as classifying the sampled corpus: if the semantics of the sampled corpus are the same as the positive data, it is positive data; if they are different, it is negative data.
• whether a sampled corpus is negative data of the first skill can be determined manually, or iteratively by combining a clustering algorithm with manual labeling.
  • the clustering algorithm may be hierarchical clustering or a text clustering classification method based on a topic model.
• combining the clustering algorithm with manual iteration may mean, for example, manually labeling some features that the clustering algorithm cannot recognize, such as the vividness of colors; for suspicious corpora that the clustering algorithm cannot classify, it is manually confirmed whether they are negative data.
• the clustering algorithm can also learn the rules of the manual labeling, and then use these rules to determine whether other sampled corpora are negative data.
• when the server generates negative data according to the positive data, it can use either of the above methods, both methods, or other methods; the embodiments of the present application do not limit the method of generating negative data from positive data.
• Step S103: The server determines a second skill similar to the first skill, where there is at least one second skill.
  • the server can share data between different speech skills.
  • the server may share data of each voice skill after obtaining the consent of each voice skill developer.
  • the sharing method may be, for example, sharing based on classification. For example, different voice skills in the same category may share data, and different voice skills in different categories may not share data.
  • the server may add a sharing layer between voice skills that can share data, and different voice skills may obtain data of other voice skills through the sharing layer.
  • the server may also have some built-in data that can be used by different speech skills.
• in the process in which the server learns the intent recognition model corresponding to a new speech skill, if the speech skill developer inputs no training data, or only a small amount of training data, corresponding to the speech skill, the server can still learn the intent recognition model using the shared data of other voice skills and the data built into the server.
  • the server may determine other second skills similar to the first skill on the server according to, for example, the classification of the first skill, or the similarity between the training data of the first skill and the training data of other skills.
  • the second skill may be other voice skills developed by the developer of the first skill, or may be other voice skills developed by the developer of other voice skills.
• the embodiment of the present application does not limit the selection method or the number of the second skills.
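One way to realize "similarity between the training data of the first skill and the training data of other skills" is a bag-of-words cosine similarity, sketched below. This is only one plausible implementation under stated assumptions; the function names, threshold, and toy skills are hypothetical.

```python
import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def similar_skills(first_skill_corpus, other_skills, threshold=0.3):
    """Rank other skills by how similar their training data is to the
    first skill's data, keeping those above a similarity threshold."""
    target = Counter(w for s in first_skill_corpus for w in s.split())
    scored = []
    for name, corpus in other_skills.items():
        counts = Counter(w for s in corpus for w in s.split())
        sim = cosine(target, counts)
        if sim >= threshold:
            scored.append((name, round(sim, 3)))
    return sorted(scored, key=lambda x: -x[1])

others = {
    "skill2": ["order takeaway food", "order lunch"],
    "skill3": ["order dinner", "book a table"],
    "skill4": ["play music", "next song"],
}
print(similar_skills(["order food", "order a meal"], others))
```

Here "skill4" shares no vocabulary with the ordering skill and falls below the threshold, so only the two ordering-related skills would be selected as second skills.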
• Step S104: The server obtains the data corresponding to each second skill.
• the server can obtain the training data corresponding to each second skill through the shared layer. This data may include positive data and negative data; specifically, it may include the positive data and negative data input by the developer of the second skill, and may also include the negative data automatically generated by the server in the process of learning the intent recognition model or other models corresponding to the second skill.
• Step S105: The server generates a second base model according to the data corresponding to the second skill and the first base model stored by the server.
• the preset model learning framework includes one or more first base models and the organizational relationship among the first base models (which may include the weight of each first base model, hierarchical relationships, and so on).
• the server inputs the corpus corresponding to each second skill into the first base model to learn multiple second base models.
  • a multi-task learning method may be adopted.
• the multi-task learning method may be based on deep-learning convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and so on.
  • adversarial training can also be added in the process of multi-task learning, which can further achieve the purification of shared data, improve the accuracy of the second base model, and further improve the accuracy of the intent recognition model.
• the following description takes, as an example, a case in which the first skill is speech skill 1 and the input training data of the first skill is corpus A.
• the server determines that the speech skills similar to speech skill 1 include speech skill 2 and speech skill 3.
  • the training data corresponding to the speech skill 2 is corpus C
  • the training data corresponding to the speech skill 3 is corpus D.
  • the server inputs the corpus C into the model learning framework 2 for learning, where the model learning framework 2 includes one or more first base models, that is, base models 1.
  • the model learning framework 2 may be, for example, a multi-task learning framework.
  • the corpus C is input into the multi-task learning framework to generate a plurality of second base models, that is, a plurality of base models 2.
  • the server is equivalent to task learning for speech skills 2.
  • the corpus D is input into a multi-task learning framework to generate a plurality of second base models.
  • the server is equivalent to task learning for voice skills 3.
• when learning the intent recognition model corresponding to speech skill 1, the server also introduces the training data of other speech skills (speech skill 2 or speech skill 3) and performs multi-task joint training on that data.
• in this way, second base models with different representation spaces are generated.
• the second base models with different representation spaces are input into the integrated learning framework for learning, and finally the intent recognition model is obtained.
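The shared-representation idea behind the multi-task learning step can be sketched numerically as follows: one shared layer is reused across tasks, while each skill keeps its own task head. This is a forward-pass-only illustration with random weights and hypothetical dimensions, not a trained model or the patent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy corpora for speech skill 2 and speech skill 3, as 8-dim
# bag-of-words vectors (hypothetical sizes)
x_skill2 = rng.random((5, 8))   # corpus C, 5 sentences
x_skill3 = rng.random((4, 8))   # corpus D, 4 sentences

# shared layer: a common representation reused across both tasks
w_shared = rng.normal(size=(8, 6))

# task-specific heads: 3 intents for skill 2, 2 intents for skill 3
w_task2 = rng.normal(size=(6, 3))
w_task3 = rng.normal(size=(6, 2))

def forward(x, w_task):
    hidden = np.tanh(x @ w_shared)       # shared representation
    logits = hidden @ w_task             # per-task intent scores
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # softmax over intents

p2 = forward(x_skill2, w_task2)
p3 = forward(x_skill3, w_task3)
print(p2.shape, p3.shape)  # (5, 3) (4, 2)
```

Training both heads jointly updates `w_shared` from both corpora, which is what lets the data of similar skills be reused.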
• in this way, the technical solutions provided in the embodiments of the present application are more conducive to improving the accuracy of the intent recognition model corresponding to the learned speech skill.
• Step S106: The server learns according to the positive data and negative data corresponding to the first skill and the second base models. If it is determined according to the policy that the fastest generated intent recognition model is to be returned, step S107 is performed; if it is determined according to the policy that the best intent recognition model is to be returned, step S108 is performed.
  • the above strategy may be a strategy preset by the server, that is, which model is returned, or two models; or a strategy recommended by the server to the user, that is, a different model may be recommended to the user according to different situations. It is also possible that the server receives the return policy input by the user and determines to return the corresponding model. The embodiment of the present application does not limit the policy.
  • the server may adopt an integrated learning method for learning, for example, a stacking integrated learning method. During the server learning process, some intermediate models will be generated one after another.
  • FIG. 10 it is a schematic diagram of another process for learning an intent recognition model provided by an embodiment of the present application.
• the server inputs the positive data and negative data corresponding to the first skill into the stacking integrated learning framework, learns the intermediate model of the first layer, and records it as intermediate model 1. Then, the server inputs the features obtained from the intermediate model of the first layer into the stacking integrated learning framework, learns the intermediate model of the second layer, and so on, until the final intent recognition model is generated.
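The layer-by-layer stacking flow described above can be sketched as follows. The base "intermediate models" here are trivial threshold rules and the second layer is a simple average, purely for illustration; a real stacking framework would train each layer (typically on out-of-fold predictions of the layer below).

```python
import numpy as np

rng = np.random.default_rng(1)

# toy binary task: positive vs. negative data, 2 features per statement
x = rng.random((12, 2))
y = (x[:, 0] + x[:, 1] > 1.0).astype(int)

# layer 1: two simple "intermediate models", each thresholding one feature;
# each can already act as a (weak) classifier on its own
def base1(x): return (x[:, 0] > 0.5).astype(float)
def base2(x): return (x[:, 1] > 0.5).astype(float)

# features obtained from the first layer are fed to the second layer
meta_features = np.column_stack([base1(x), base2(x)])

# layer 2: a trivial meta model that averages the base predictions
final_pred = (meta_features.mean(axis=1) >= 0.5).astype(int)

accuracy = (final_pred == y).mean()
print(accuracy)
```

The fact that `base1` or `base2` alone already produces a usable prediction mirrors why an early intermediate model can be returned as the "fastest" intent recognition model.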
• Step S107: The server determines the fastest generated intent recognition model.
• the server successively generates a plurality of intermediate models, and these intermediate models can independently implement the function of the intent recognition model corresponding to the first skill. Therefore, based on the needs of the skill developer, the server can first return some intermediate models as the fastest generated models. In some embodiments, the server may determine and return the first generated intermediate model as the fastest intent recognition model. As shown in FIG. 10, the server may determine the first generated intermediate model 1 as the fastest intent recognition model. In other embodiments, the server may determine and return a model generated in a specified manner as the fastest intent recognition model. In this way, the skill developer can obtain the intent recognition model as quickly as possible to understand its function, performance, and the like.
• Step S108: The server determines the best intent recognition model.
  • the intent recognition model finally obtained is the best intent recognition model. This is because during the learning process of the server, model selection and parameter adjustment are performed according to the multiple intermediate models generated successively, so that the finally obtained intent recognition model has the highest accuracy rate.
• a grid search method or a Bayesian optimization method can be used for parameter tuning.
• the parameter tuning in the prior art is performed based only on the training data corresponding to the first skill input by the first-skill developer, so the accuracy of the resulting intent recognition model reflects only the accuracy of predictions made using data within the first skill.
• in the embodiment of the present application, when training the intent recognition model, the server uses not only the data within the first skill (such as the forward data of the first skill input by the first-skill developer), but also data outside the first skill (for example, the automatically generated negative data of the first skill; and, when the base model is expanded, the data of the second skill similar to the first skill, and so on).
• therefore, when the server selects the best intent recognition model, it needs to consider both the data within the first skill (for example, the forward data of the first skill) and the data outside the first skill (for example, the negative data of the first skill automatically generated by the server).
  • the first is the accuracy rate predicted by the data in the first skill, that is, the accuracy rate of the first skill data.
• when data within the first skill is used for prediction, the prediction is correct if it is positive (that is, the statement is recognized as an intent that invokes the first skill).
  • the second is the accuracy rate predicted by the data outside the first skill, that is, the accuracy rate of the data outside the first skill.
• when data outside the first skill is used for prediction, the prediction is correct if it is negative (that is, the statement is recognized as an intent that cannot invoke the first skill).
  • the false recall rate of the data outside the first skill may also be used to reflect the accuracy of prediction using the data outside the first skill.
• the lower the false recall rate of the data outside the first skill, the higher the accuracy of predictions using the data outside the first skill.
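The three quantities just defined (in-skill accuracy, out-of-skill accuracy, and false recall rate) can be computed as in the following sketch. The model here is a hypothetical keyword rule, used only to make the metrics concrete; the example data is invented.

```python
def evaluate(model, in_skill_data, out_skill_data):
    """In-skill accuracy (statements should be predicted positive),
    out-of-skill accuracy and false recall rate (statements should be
    predicted negative)."""
    acc_in = sum(model(s) for s in in_skill_data) / len(in_skill_data)
    false_recalls = sum(model(s) for s in out_skill_data)
    false_recall_rate = false_recalls / len(out_skill_data)
    acc_out = 1 - false_recall_rate
    return acc_in, acc_out, false_recall_rate

# hypothetical model: predicts positive if "order" appears in the statement
model = lambda s: "order" in s
acc_in, acc_out, frr = evaluate(
    model,
    in_skill_data=["order takeaway", "order lunch", "place an order"],
    out_skill_data=["book a flight", "order of the day"],
)
print(acc_in, acc_out, frr)  # 1.0 0.5 0.5
```

Note how "order of the day" is falsely recalled: tightening the model to cut such errors tends to lower in-skill accuracy, which is the trade-off described above.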
• in general, the accuracy rate on the first-skill data is inversely related to the accuracy rate on the data outside the first skill, and directly related to the false recall rate of the data outside the first skill. In other words, when the accuracy rate on the first-skill data is high, the false recall rate of the data outside the first skill also tends to be high. Therefore, the embodiment of the present application provides a method for evaluating the best intent recognition model.
  • the server may set a confidence level of the first skill, and the confidence level may be set by a user (for example, a developer of the first skill).
  • the parameter of the confidence level can reflect the relative requirements of the user on the two indexes of the accuracy of the data within the first skill and the false recall rate of the data outside the first skill. For example, the greater the confidence level, the lower the false recall rate that the user wants the data outside the first skill, that is, the higher the prediction accuracy of the data outside the first skill. Therefore, the following formula 1 can be used to evaluate the best intent recognition model, as follows:
  • score is the value used to evaluate the best intent recognition model. The higher the score, the higher the accuracy of the intent recognition model and the better it meets user requirements.
  • accuracyIn is the average accuracy of the data within the first skill.
  • accuracyOut is the average accuracy on data outside the first skill.
  • C is the confidence value set by the user, where C is greater than or equal to 0 and less than or equal to 1.
  • When the server calculates the average accuracy on data within the first skill and the average accuracy on data outside the first skill, it may use, for example, a K-fold cross-validation method.
  • For the specific K-fold cross-validation procedure, refer to the prior art; the details are not repeated here.
  • To prevent an unreasonably set confidence level from significantly degrading the accuracy of the trained intent recognition model (for example, an overly large confidence level would cause the accuracy on in-skill data to be ignored), the server can also set a parameter to control the confidence level set by the user. Then the following formula 2 can be used to evaluate the best intent recognition model: score = accuracyIn × (1 − C × P) + accuracyOut × C × P (Formula 2).
  • P is a parameter set by the server, where P is greater than or equal to 0 and less than or equal to 1.
  • The value of P controls the degree to which the user-set confidence level affects the score: the larger P is, the greater the influence of the user-set confidence level on the score; the smaller P is, the smaller that influence.
  • The server can share some common features among the intermediate models through the shared layer, to reduce the amount of computation and improve the efficiency of model training.
  • After the fastest or best intent recognition model is learned, it can be used to predict a new user utterance input by the user of the electronic device and determine the user's intent, so as to provide the user with the corresponding service.
  • In the human-machine dialogue system shown in FIG. 1, the learned fastest or best intent recognition model may be deployed on the server 300 or on the server 200.
  • When deployed on the server 300, the server 300 may store intent recognition models corresponding to different voice skills of multiple third-party applications (for example, third-party application 1 and third-party application 2).
  • When deployed on the server 200, the server 200 may store intent recognition models corresponding to the different voice skills of that third-party application.
  • In the human-machine dialogue system shown in FIG. 4, the learned fastest or best intent recognition model may be deployed on the server 200, and the server 200 may store intent recognition models corresponding to the different voice skills of third-party application 1.
  • the disclosed apparatus and method may be implemented in other ways.
  • The apparatus embodiments described above are merely illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation, there may be other division manners.
  • For example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed.
  • The displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.


Abstract

一种意图识别模型的学习方法、装置及设备,涉及通信技术领域,有利于提升人机对话系统中意图识别模型的准确性,提升人机对话系统执行任务的准确性,提升用户体验,该方法包括:服务器接收技能开发者输入的第一技能对应的正向数据(S101);服务器根据第一技能对应的正向数据,生成第一技能对应的负向数据(S102);服务器确定与第一技能相似的第二技能(S103);服务器获取各个第二技能对应的数据(S104);服务器根据第二技能对应的数据,以及服务器存储的第一基模型生成第二基模型(S105);服务器根据第一技能对应的正向数据和负向数据,以及第二基模型进行学习(S106),生成意图识别模型。

Description

一种意图识别模型的学习方法、装置及设备 技术领域
本申请涉及通信技术领域,尤其涉及一种意图识别模型的学习方法、装置及设备。
背景技术
人机对话系统,或称之为人机对话平台、聊天机器人(chatbot)等,是新一代的人机交互界面。具体的,人机对话系统按照涉及的领域,分为开放域(open-domain)的chatbot和面向具体任务(task-oriented)的chatbot。
其中,面向具体任务的chatbot可以实现为终端用户提供例如订餐、订票、打车等的服务的功能。例如:这些服务的提供商预先在服务器中输入一些功能A对应训练数据(例如:用户说法,也可称之为语料),服务器根据输入的训练数据,采用集成学习(Ensemble Learning)的方法,训练出功能A对应的模型。该功能A对应的模型可用于对终端用户输入的新的用户说法进行预测,以确定用户的意图,即服务器是否为该终端用户提供功能A对应的服务。
需要说明的是,在服务器的集成学习的过程中,服务器使用一些预先设定的基学习器(也可称之为基模型),对服务提供商输入的训练数据进行学习,并使用某种规则把各个基学习器的训练得到的模型进行整合,从而获得比单个基学习器更准确的模型。
由此可见,服务的提供商输入的训练数据的丰富性和准确性,以及预先设定的基模型的合理性,都直接影响到集成学习后得到的模型的准确度。如果输入的训练数据较少,或者输入的训练数据存在一些不准确,或者服务器预先设定的基模型出现某些不合理的情况时,都会严重影响到集成学习获得的模型的准确性,进而影响chatbot执行具体任务时的准确性,以及影响终端用户的体验。
发明内容
本申请提供的一种意图识别模型的学习方法、装置及设备,可以提升人机对话系统中意图识别模型的准确性,提升人机对话系统执行任务的准确性,提升用户体验。
第一方面,本申请实施例提供一种意图识别模型的学习方法,该方法包括:服务器接收技能开发者输入的第一技能中各个意图对应的正向数据;服务器根据第一技能中各个意图对应的正向数据生成第一技能中各个意图对应的负向数据;服务器获取第二技能对应的训练数据,第二技能为与第一技能相似的技能,第二技能的数量为至少一个;服务器根据第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型,第二基模型的数量为至少一个;服务器根据第一技能中各个意图对应的正向数据和第一技能中各个意图对应的负向数据,以及第二基模型进行学习,生成意图识别模型。
由上可见,在本申请实施例中,由于在训练意图识别模型时,引入了负向数据, 有利于减少仅有正向数据训练带来的误识别的情况,有利于提升学习得到的意图识别模型的准确性。并且,由于在训练意图识别模型时,还引入了其他相似语音技能的训练数据,学习了更多的基模型,丰富了基模型的类型,能够提高训练得到的意图识别模型的准确性。
一种可能的实现方式中,服务器根据第一技能中各个意图对应的正向数据生成第一技能中各个意图对应的负向数据包括:服务器针对第一技能中各个意图,分别抽取出第一技能中各个意图对应的关键词,其中,关键词为影响第一基模型权重的关键特征;将第一技能中不同意图对应的关键词进行组合,或者,将不同意图对应的关键词与非第一技能相关的词进行组合,组合后的词确定为不同意图对应的负向数据。
需要说明的是,上述关键词为最容易影响分类(识别意图)特征权重的显著特征。在现有技术中,造成误识别的原因有,其他类别(其他意图)的显著特征和本类(本意图)的显著特征高度重合。例如:意图1为打开app,意图2为关闭app。其中,意图1和意图2的“app”为频率高的显著特征,但不是各个分类的显著特征,但分类时仍采用该频率高的显著特征,则很容易造成误识别。为此,本申请实施例通过降低这些各个分类(意图)都含有的显著特征在分类时权重,来提高准确率。
一种可能的实现方式中,服务器根据第一技能中各个意图对应的正向数据生成第一技能中各个意图对应的负向数据还包括:服务器根据第一技能的分类,从服务器上存储的训练数据的全集中确定出第一集合,第一集合中的训练数据包含与第一技能中各个意图对应的负向数据;其中,训练数据包括正向数据和负向数据;从第一集合中采样预设数量的训练数据;采用人工标注和/或聚类算法,从采样得到的训练数据中,确定出与第一技能中各个意图对应的负向数据。
一种可能的实现方式中,服务器获取第二技能对应的数据包括:服务器根据第一技能的分类,和/或第一技能的正向数据,确定第二技能;通过共享层获取第二技能对应的训练数据。
由于在训练意图识别模型时,还引入了其他相似语音技能的训练数据,有利于解决第一技能开发者输入较少训练数据,而难以学习出较为准确的意图识别模型的问题。
一种可能的实现方式中,服务器根据第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型包括:服务器根据第二技能对应的训练数据,以及预先设置的第一基模型,采用多任务学习方法进行学习,生成第二基模型。
一种可能的实现方式中,服务器根据第一技能中各个意图对应的正向数据和第一技能中各个意图对应的负向数据,以及第二基模型进行学习,生成意图识别模型包括:服务器根据第一技能中各个意图对应的正向数据和第一技能中各个意图对应的负向数据,以及第二基模型,采用集成学习的方法进行学习,生成意图识别模型。
一种可能的实现方式中,服务器生成意图识别模型具体包括:服务器在生成意图识别模型的过程中,生成多个中间模型;服务器确定最先生成的中间模型为最快的意图识别模型;和/或,服务器对多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
本申请实施例提供了多种返回机制，例如：返回最快生成的意图识别模型，返回最佳的意图识别模型等。例如：返回最快生成的意图识别模型时，由于服务器300在学习意图识别模型的过程中会得到多个中间模型，这些中间模型也能够实现意图识别模型的功能，故可以将最先得到的中间模型确定为返回给语音技能开发者或用户的意图识别模型，以便语音技能开发者或用户能最快地获取到意图识别模型，以了解该意图识别模型的功能、性能等。又例如：返回最佳的意图识别模型时，服务器300在学习意图识别模型的过程中，可以对得到的多个中间模型的参数进行调整，以确定出准确性较高的模型为意图识别模型。能够满足用户的多种需求。
一种可能的实现方式中,服务器确定出最佳的意图识别模型包括:服务器根据第一技能中各个意图对应的正向数据计算出各个中间模型的第一准确率;服务器根据第一技能中各个意图对应的负向数据计算出各个中间模型的第二准确率;服务器根据第一准确率、第二准确率,以及技能开发者输入的权重,对多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
本申请实施例中,不仅考虑用该语音技能内的语料,例如:技能开发者的输入训练数据,测试出模型的准确性,还要考虑用服务器确定出一些该语音技能外的一些语料,例如自动生成的负向数据,测试出的模型的准确性。由于测试模型的准确性时,引入了更多的测试数据,测试结果更准确,那么根据更准确的测试结果来调整模型中的参数后,能够得到最佳的意图识别模型。也就是说,本申请实施例提供的技术方案有利于提高意图识别模型的准确性。
第二方面、本申请实施例提供一种服务器,包括:处理器,存储器以及通信接口;存储器用于存储计算机程序代码,计算机程序代码包括计算机指令,当处理器从存储器中读取计算机指令,以使得服务器执行如下操作:
通过通信接口,接收技能开发者输入的第一技能中各个意图对应的正向数据;根据第一技能中各个意图对应的正向数据生成第一技能中各个意图对应的负向数据;获取第二技能对应的训练数据,第二技能为与第一技能相似的技能,第二技能的数量为至少一个;根据第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型,第二基模型的数量为至少一个;根据第一技能中各个意图对应的正向数据和第一技能中各个意图对应的负向数据,以及第二基模型进行学习,生成意图识别模型。
一种可能的实现方式中,服务器根据第一技能中各个意图对应的正向数据生成第一技能中各个意图对应的负向数据具体包括:服务器针对第一技能中各个意图,分别抽取出第一技能中各个意图对应的关键词,其中,关键词为影响第一基模型权重的关键特征;将第一技能中不同意图对应的关键词进行组合,或者,将不同意图对应的关键词与非第一技能相关的词进行组合,组合后的词确定为不同意图对应的负向数据。
一种可能的实现方式中,服务器根据第一技能中各个意图对应的正向数据生成第一技能中各个意图对应的负向数据还具体包括:服务器根据第一技能的分类,从服务器上存储的训练数据的全集中确定出第一集合,第一集合中的训练数据包含与第一技能中各个意图对应的负向数据;其中,训练数据包括正向数据和负向数据;从第一集合中采样预设数量的训练数据;采用人工标注和/或聚类算法,从采样得到的训练数据 中,确定出与第一技能中各个意图对应的负向数据。
一种可能的实现方式中,服务器获取第二技能对应的数据包括:服务器根据第一技能的分类,和/或第一技能的正向数据,确定第二技能;通过共享层获取第二技能对应的训练数据。
一种可能的实现方式中,服务器根据第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型包括:服务器根据第二技能对应的训练数据,以及预先设置的第一基模型,采用多任务学习方法进行学习,生成第二基模型。
一种可能的实现方式中,服务器根据第一技能中各个意图对应的正向数据和第一技能中各个意图对应的负向数据,以及第二基模型进行学习,生成意图识别模型包括:服务器根据第一技能中各个意图对应的正向数据和第一技能中各个意图对应的负向数据,以及第二基模型,采用集成学习的方法进行学习,生成意图识别模型。
一种可能的实现方式中,服务器生成意图识别模型还具体包括:服务器在生成意图识别模型的过程中,生成多个中间模型;服务器确定最先生成的中间模型为最快的意图识别模型;和/或,服务器对多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
一种可能的实现方式中,服务器确定出最佳的意图识别模型包括:服务器根据第一技能中各个意图对应的正向数据计算出各个中间模型的第一准确率;服务器根据第一技能中各个意图对应的负向数据计算出各个中间模型的第二准确率;服务器根据第一准确率、第二准确率,以及技能开发者输入的权重,对多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
第三方面、一种计算机存储介质,包括计算机指令,当计算机指令在服务器上运行时,使得服务器执行如第一方面及其中任一种可能的实现方式中所述的方法。
第四方面、一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行如第一方面中及其中任一种可能的实现方式中所述的方法。
附图说明
图1为本申请实施例提供的一种人机对话系统的结构示意图一;
图2为现有技术中意图识别模型的学习方法的示意图;
图3为本申请实施例提供的一种意图识别模型使用场景的示意图;
图4为本申请实施例提供的一种人机对话系统的结构示意图二;
图5为本申请实施例提供的一种意图识别模型的学习方法的示意图一;
图6为本申请实施例提供的一种意图识别模型的学习方法的示意图二;
图7为本申请实施例提供的一种意图识别模型的学习方法的示意图三;
图8为本申请实施例提供的一种服务器的结构示意图;
图9为本申请实施例提供的一种意图识别模型的学习方法的流程示意图;
图10为本申请实施例提供的一种意图识别模型的学习方法的示意图四。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表 示A或B;本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
如图1所示,为本申请实施例提供的一种人机对话系统的组成示意图。该人机对话系统包括:一个或多个电子设备100、一个或多个服务器200以及一个或多个服务器300。电子设备100和服务器300之间建立通信连接。服务器300分别与电子设备100和服务器200建立通信连接。可选的,电子设备100还可以与服务器200之间建立通信连接。其中的通信连接可以采用电信网络(3G/4G/5G等通信网络)或者WIFI网络等建立连接,本申请实施例对此不做限定。
其中，电子设备100可以为手机、平板电脑、个人计算机(Personal Computer,PC)、个人数字助理(personal digital assistant,PDA)、智能手表、上网本、可穿戴电子设备、增强现实技术(Augmented Reality,AR)设备、虚拟现实(Virtual Reality,VR)设备、车载设备、智能汽车、智能音响、机器人等，本申请对该电子设备的具体形式不做特殊限制。
服务器200可以是第三方应用的服务器,用于提供第三方应用的服务。其中,第三方应用例如可以是美团、亚马逊、滴滴打车等应用。
服务器300可以是电子设备100的厂商的服务器,例如可以是电子设备100中语音助手的云服务器等,服务器300还可以是其他服务器,本申请实施例不做限定。
以第三方应用开发一项新的语音应用或功能(可称为语音技能)为例,对本申请实施例提供的技术方案的应用的人机对话场景进行说明。
其中,语音技能可以是指用户可以通过电子设备100,与第三方应用的服务器200之间采用对话式交互的方式,请求第三方应用提供该应用中一个或多个服务的功能。这个对话式交互过程是模拟用户实际生活中的交互场景,使得用户与电子设备的交互,就像与人交互一样自然。
在上述对话式交互过程中,用户说的每句话都对应着一个意图,是用户说这句话的目的。需要说明的是,每个语音技能都是由数个意图组成,服务器300通过将用户说的每句话来和语音技能中的意图进行匹配,来了解用户的需求,并提供相应的服务,例如提供订餐、订票、打车等服务。
上述根据用户说的话和语音技能中的意图进行匹配的过程,即为意图识别的过程,服务器300可以构建意图识别的模型,该意图识别模型可具体根据用户输入的用户说法,自动识别出该用户说法对应的意图。
如图2所示，为服务器300训练意图识别模型的过程示意图。具体的，语音技能的开发者，例如服务器200，需要先向服务器300输入一些新的语音技能对应的训练数据(可以包含用户说法与用户意图的对应关系)。服务器300可以根据这些训练数据，采用相应的学习方法，例如集成学习方法，得到该新的语音技能对应的意图识别模型。例如：将语音技能1对应的语料A，输入到模型学习框架1中进行学习，得到语音技能1对应的意图识别模型。其中，模型学习框架1例如可以采用集成学习的框架，模型学习框架中最小的学习单元为基模型，集成学习框架中包含有多个基模型，记为基模型1。这些基模型1之间是具有差异性的，例如：不同的基模型1可以是属于不同模型的，例如可以是支持向量机(support vector machines,SVM)和逻辑回归(logistic regression,LR)两种模型；也可以属于同一个模型，但使用不同的超参数(例如：使用不同超参数C值的SVM支持向量机模型)；还可以是属于同一个模型但使用不同数据的(例如同一个SVM模型使用不同的原始数据子集等)。
另外,关于本申请实施例中涉及到的集成学习方法的具体技术方案,可以参见中国国家知识产权局公告的申请号为CN 102521599A,名称为“一种基于集成学习的模式训练和识别方法”的专利申请。还可以参见中国国家知识产权局公告的申请号为CN107491531A,名称为“基于集成学习框架的中文网络评论情感分类方法”的专利申请,以及现有技术中集成学习方法的其他实现方式,本申请实施例不做限定。
在新的语音技能对应的意图识别模型训练完成之后,当电子设备100接收到用户输入的说法后,将用户输入的说法发送给服务器300,由该服务器300根据用户的说法,以及训练得到的意图识别模型进行预测,以确定出该用户的说法对应的用户的意图,进而确定出该用户提供服务的第三方应用以及第三方应用中具体的功能。
例如:如图3所示,为服务器300确定为用户提供服务第三方应用的过程示意图。假设服务器300中除了新学习出的语音技能1对应的意图识别模型(属于第三方应用1其中一个语音技能)外,还存储有第三方应用1的语音技能2对应的意图识别模型,第三方应用2的语音技能3对应的意图识别模型等。
具体的，用户通过电子设备100输入语音1，电子设备100可以将语音1转化成相应的文本后，即用户说法1，发送给服务器300，或者电子设备100可以直接将语音1发送给服务器300，由服务器300将语音1转化成文本(用户说法1)。服务器300将用户说法1分发到各个语音技能对应的意图识别模型中，由各语音技能对应的意图识别模型进行运算，确定用户说法1对应于语音技能1，也就是说，需要为用户提供第三方应用的语音技能1对应的服务。
在一些实施例中，服务器300在确定出为该用户提供服务的第三方应用后，可以将确定的第三方应用的信息返回给电子设备100，第三方应用的信息可以包括第三方应用的服务器的地址信息，第三方应用的信息还可以包括需要为用户提供的具体的服务的信息，例如：订餐或查询订单等。而后，电子设备100可以与第三方应用的服务器200建立通信连接，第三方应用的服务器200为该用户提供相应的服务。在另一些实施例中，服务器300可以与相应的第三方应用的服务器200建立通信连接，并将该用户的服务请求信息发送给第三方应用的服务器200，第三方应用的服务器200可以通过服务器300与电子设备100进行交互，为该用户提供相应的服务。其中，服务请求信息可以包括：用户请求的具体的服务的信息，例如：订餐或查询订单等。本申请实施例对第三方应用的服务器200为电子设备100的用户提供服务的具体方式不做限定。
需要说明的是,如图4所示,为本申请实施例提供的另一种人机对话系统的示意图。该人机对话系统包括:一个或多个电子设备100和一个或多个服务器200。其中,电子设备100和服务器200可参考图1中相关描述。
不同之处在于,上述的根据语音技能开发者输入新的语音技能对应的训练数据,学习出新的语音技能对应的意图识别模型的过程,也可以在服务器200上执行,或者电子设备100的运算功能能够支持该过程的运算时,也可以在电子设备100上执行,本申请实施例提供的方法对训练意图识别模型的执行主体不做限定。若由第三方应用的服务器200来执行训练意图识别的模型,那么,第三方应用的服务器200训练的是第三方应用自身的多个语音技能的意图识别模型。下文以服务器300执行训练意图识别模型为例进行说明的。
需要注意的是,服务器300在训练意图识别模型的过程中,当语音技能开发者不能提供足够多的训练数据,或者,语音技能开发者不能保证训练数据的准确性时,服务器300训练的目标模型的准确性无法保证,进而会影响服务器300识别用户意图的准确性,影响电子设备的用户体验。又由于语音技能的场景繁多,类型繁多,服务器300预先设置的基模型不可避免的会存在一些不合理的情况,例如:不适用某些场景或者类型的语音技能,这也会影响到服务器300训练目标模型的准确性。
需要说明的是,服务器300可以为多个第三方应用提供一个共用的人机对话平台,该人机对话平台的开发者或维护者,可以在该人机对话平台上预先设置一些基本的技能模板(包括一些预先设置的训练数据、学习时使用的基模型等),这些技能模板涵盖了部分的常用的使用场景。而后,第三方应用可以对这些基本的技能模型进行修改以实现自己的个性化需求,或者第三方应用根据自己的服务内容添加自定义的技能。
为此,本申请实施例提供的技术方案可运用于服务器300学习新的语音技能对应的意图识别模型的过程中,能够在现有技术的基础上,提高服务器300学习到的意图识别模型的准确性。
如图5所示,为本申请实施例提供的一种学习意图识别模型的方法的示意图。服务器300可以根据输入的新的语音技能对应的训练数据,自动生成一些其他的训练数据。然后,语音技能开发者输入的训练数据和自动生成的其他的训练数据,一同输入到服务器300中,由服务器300进行学习。由于在学习的过程中,引入了更多训练数据,能够提高服务器300学习后得到的意图识别模型的准确性。
例如:输入的语音技能1的训练数据有语料A,服务器根据语料A中的正向数据,生成负向数据,即语料B,再将语料A和语料B一同输入到模型学习框架中进行学习,得到语音技能1对应的意图识别模型。其中,正向数据为能够触发服务器300执行相应的操作的用户说法的数据。负向数据为与正向数据相似,但不是正向数据的一些数据,也就是服务器300不执行相应操作的用户说法的数据。
在本申请的另一些实施例中,服务器300可以学习到同一个第三方应用的不同的语音技能对应的意图识别模型,还可以学习到不同第三方应用的不同的语音技能对应的意图识别模型。也就是说,服务器300可以存储有同一个第三方应用的不同语音技能的意图识别模型,以及不同第三方应用的不同的语音技能对应的意图识别模型。考 虑到服务器300上可以存储各种主题各种类型的语音技能对应的意图识别模型,这些大量的语音技能之间会存在一些相似性,那么这些语音技能对应的语料也可以进行复用。因此,本申请实施例提供的技术方案还可以在训练某个语音技能对应的意图识别模型时,结合与该语音技能相似的其他语音技能的训练数据进行训练。例如:服务器300可以根据与该语音技能相似的其他语音技能的训练数据,对模型学习框架中的基模型进行扩充。其中,与该语音技能相似的其他语音技能可以是与该语音技能属于同一个第三方应用的,也可以是与该语音技能属于不同的第三方应用的,本申请实施例对相似的语音技能的类型和数量均不做限定。
例如:如图6所示,为本申请实施例提供的又一种学习意图识别模型的过程示意图。以服务器300确定了语音技能2和语音技能3为与语音技能1相似的其他语音技能为例进行说明。
服务器300可以将语音技能2对应的语料C输入到模型学习框架2中学习得到一个或多个基模型2,服务器300将语音技能3对应的语料D输入到模型学习框架2中,学习得到一个或多个基模型2。其中,模型学习框架2中包含服务器300预先存储的一个或多个基模型1。根据语料C和语料D,以及模型学习框架2得到的这多个基模型2,即为服务器300扩充后得到的基模型,可以作为模型学习框架3中的基模型。其中,模型学习框架2例如可以采用多任务学习的框架,模型学习框架3例如可以采用集成学习的框架。
服务器300可以将技能开发者输入的语音技能1对应的语料A,和自动生成的语料B一同输入到模型学习框架3中进行学习,得到语音技能1对应的意图识别模型。可选的,服务器300也可以直接将技能开发者输入的语音技能1对应的语料A输入到模型学习框架3中进行学习,得到语音技能1对应的意图识别模型。本申请实施例对此不做限定。
由此可见,由于在训练意图识别模型时,还引入了其他相似语音技能的训练数据,学习了更多的基模型,丰富了基模型的类型,能够提高训练得到的意图识别模型的准确性。
在本申请的又一些实施例中，如图7所示，为本申请实施例提供的又一种服务器300学习意图识别模型的过程示意图。具体的，在服务器300学习意图识别模型的过程中，会陆续得到多个中间模型，服务器300会对这些中间模型进行模型选择以及参数调整，以获得最后生成的准确率最高的意图识别模型。需要说明的是，这些中间模型都可以实现语音技能1对应的意图识别模型的功能。考虑到语音技能开发者不同的需求，本申请实施例提供了多种返回机制，例如：返回最快生成的意图识别模型，返回最佳的意图识别模型等。例如：返回最快生成的意图识别模型时，由于服务器300在学习意图识别模型的过程中会得到多个中间模型，这些中间模型也能够实现意图识别模型的功能，故可以将最先得到的中间模型确定为返回给语音技能开发者或用户的意图识别模型，以便语音技能开发者或用户能最快地获取到意图识别模型，以了解该意图识别模型的功能、性能等。又例如：返回最佳的意图识别模型时，服务器300在学习意图识别模型的过程中，可以对得到的多个中间模型的参数进行调整，以确定出准确性较高的模型为意图识别模型。
又由于本申请实施例提供的技术方案,在训练某个语音技能对应的意图识别模型时,结合与该语音技能相似的其他语音技能的训练数据,本申请实施例还提供了一种确定出最佳模型的标准,即不仅考虑用该语音技能内的语料,例如:技能开发者的输入训练数据,测试出模型的准确性,还要考虑用服务器确定出一些该语音技能外的一些语料,例如自动生成的负向数据,测试出的模型的准确性。由于测试模型的准确性时,引入了更多的测试数据,测试结果更准确,那么根据更准确的测试结果来调整模型中的参数后,能够得到最佳的意图识别模型。也就是说,本申请实施例提供的技术方案有利于提高意图识别模型的准确性。具体的实现过程将在下文中详细介绍。
如图8所示,为本申请实施例提供的一种服务器300的硬件结构示意图,服务器300包括至少一个处理器301、至少一个存储器302、至少一个通信接口303。可选的,服务器300还可以包括输出设备和输入设备,图中未示出。
处理器301、存储器302和通信接口303通过总线相连接。处理器301可以是一个通用中央处理器(Central Processing Unit,CPU)、微处理器、特定应用集成电路(Application-Specific Integrated Circuit,ASIC),或者一个或多个用于控制本申请方案程序执行的集成电路。处理器301也可以包括多个CPU,并且处理器301可以是一个单核(single-CPU)处理器或多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路或用于处理数据(例如计算机程序指令)的处理核。
在本申请实施例中,处理器301可具体用于根据语音技能开发者输入的新的语音技能的训练数据,自动生成新的语音技能的负向数据。处理器301还可具体用于确定与新的语音技能相似的其他语音技能,获取其他语音技能的数据,对模型学习框架中的基模型进行扩充。处理器301还可以根据扩充后的基模型和开发者输入的训练数据,以及处理器301自动生成的负向数据,进行学习,得到意图识别模型。处理器301还可以具体用于对学习过程中产生的多个中间模型进行调参,以便生成最佳的意图识别模型等。
存储器302可以是只读存储器(Read-Only Memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备、随机存取存储器(Random Access Memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器302可以是独立存在,通过总线与处理器301相连接。存储器302也可以和处理器301集成在一起。其中,存储器302用于存储执行本申请方案的应用程序代码,并由处理器301来控制执行。处理器301用于执行存储器302中存储的计算机程序代码,从而实现本申请实施例中所述数据传输的方法。
在本申请实施例中，存储器302可用于存储服务器300中学习框架中预先设置的基模型的数据，还可用于存储服务器300自动生成的各个语音技能对应的负向数据，处理器301在学习过程中生成的各个中间模型，以及训练或测试用的各种数据，以及学习到的各个语音技能对应的意图识别模型等。
通信接口303,可用于与其他设备或通信网络通信,如以太网,无线局域网(wireless local area networks,WLAN)等。
在本申请实施例中,通信接口303可具体用于与电子设备100进行通信,实现与电子设备的用户之间的交互。通信接口303还可以具体用于与第三方应用的服务器200进行通信,例如:服务器300可以接收第三方应用的服务200输入的新的语音技能对应的训练数据,或者,服务器300可以将确定的用户服务请求发送给第三方应用的服务器200,以便第三方应用的服务器200为电子设备的用户提供相应的服务等。
输出设备和处理器通信,可以以多种方式来显示信息。例如,输出设备可以是液晶显示器(Liquid Crystal Display,LCD),发光二级管(Light Emitting Diode,LED)显示设备,阴极射线管(Cathode Ray Tube,CRT)显示设备,或投影仪(projector)等。输入设备和处理器通信,可以以多种方式接收用户的输入。例如,输入设备可以是鼠标、键盘、触摸屏设备或传感设备等。
以下实施例中所涉及的技术方案均可以在具有上述硬件架构的服务器300中实现。
如图9所示,为本申请实施例提供一种意图识别模型的学习方法的流程示意图,具体包括:
S101、服务器接收技能开发者输入的第一技能对应的正向数据。
其中,技能开发者可以为第三方应用的开发者,也可以为电子设备的生产商,本申请实施例不做限定。
在本申请的一些实施例中,服务器为技能开发者提供一个技能开发的平台,技能开发者可以通过电信网络(3G/4G/5G等通信网络)或者WIFI网络等方式登录到该技能开发者的平台上,注册账户,并开始创建一项新的语音技能,即第一技能。技能开发者可以输入第一技能的名称、第一技能的分类等基本信息,然后输入第一技能包含的意图、以及各个意图对应的正向数据。其中,各个意图对应的正向数据包括:服务器响应于接收到正向数据后需要执行相应的操作,例如:触发相应的服务,或者向电子设备返回相应的对话等。也就是说,能够触发服务器执行相应的操作的用户说法的数据,称之为正向数据,还可以称之为正向语料、正例、正例数据,技能内数据等。
在本申请的另一些实施例中,技能开发者还可以在输入第一技能的各个意图对应的正向数据时,也可以输入一些各个意图对应的负向数据。其中,负向数据为与正向数据相似,但不是正向数据的这类数据。其中,负向数据还可以称之为负向语料、负例、负例数据,技能外数据等,本申请实施例不做限定。
例如:技能开发者为美团应用,美团应用开发的第一技能为订餐服务。美团应用可以输入订餐服务对应的正向数据例如有:订外卖,订餐等。美团应用也可以输入负向数据,例如:订机票等。
S102、服务器根据第一技能对应的正向数据,生成第一技能对应的负向数据。
其中，第一技能包括多个意图，每个意图可以对应于一个或多个正向数据。针对每个意图，服务器可以根据每个意图的正向数据生成相应的负向数据。这样，有利于提高后续训练出的意图识别模型的准确性。
这是因为,在现有技术中,输入服务器的训练数据只有一类(正向数据)时,服务器训练出来的意图识别模型在判别某个用户说法时,判别标准只有技能开发者设置的阈值,也就是说,某个说法的置信度低于该阈值时,就认为该说法不是第一技能内的意图。某个说法的置信度达到该阈值时,认为该说法时第一技能内的意图。实际场景中,由于不同语音技能的阈值取决于不同语音技能之间的相似度等因素,这个阈值很难设定。在本申请实施例中,引入其他类型的训练数据,例如:引入的负向数据,还可以进行类别判别,有利于避免一些误识别的情况,有利于提高训练得到的意图识别模型的准确性。
示例性的,本申请实施例提供了两种根据第一技能对应的正向数据生成第一技能对应的负向数据的方法,如下:
方法一、针对第一技能中的每个意图,服务器从与各个意图对应的正向数据中分别抽取出各个意图的关键词,将抽取出的关键词与其他无关词进行组合,构成各个意图对应的负向数据。
需要说明的是,上述关键词为最容易影响分类(识别意图)特征权重的显著特征。在现有技术中,造成误识别的原因有,其他类别(其他意图)的显著特征和本类(本意图)的显著特征高度重合。例如:意图1为打开app,意图2为关闭app。其中,意图1和意图2的“app”为频率高的显著特征,但不是各个分类的显著特征,但分类时仍采用该频率高的显著特征,则很容易造成误识别。为此,本申请实施例通过降低这些各个分类(意图)都含有的显著特征在分类时权重,来提高准确率。
其中,上述的抽取正向数据中的关键词可以是技能开发者手动进行标注,也可以是采用采样关键词的相关挖掘技术进行抽取。示例的,正向数据中的关键词也是正向数据中的一个特征,故可以采用特征选择方法进行抽取,例如基于L1正则化(regularization)的LR(logistic regression)方法。本申请实施例还可以采用包裹式(wrapper)特征选择方法,例如:LVW(Las Vegas Wrapper)、基于字典学习的稀疏表示方法,及简单的启发式算法(Heuristic Algorithm)等。本申请实施例对抽取关键词的具体方法不做限定。
示例性的,以基于L1正则化的LR方法为例,对抽取正向数据的关键词的过程进行简单介绍。假设正向数据对应的意图一共有K个分类(即用户输入的正向数据对应于K个意图,即一个意图可以理解为一个分类),分别构造K个“1vs the rest”的分类器,通过L1范数可以使得对于分类权重影响小的特征被忽略,从而保留对于分类有影响力的特征。鉴于针对每个正向数据,都分别转化为了2个分类,同时特征为0/1离散特征,则可通过权重符号以及绝对值大小,筛选排序获得显著特征,即是针对每个意图来说权重最大的关键词。
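作为示意，下面用一段纯Python代码给出"为每个意图筛选显著关键词"的一个极简草图。注意：这里仅以类间词频差异近似显著特征筛选，并非基于L1正则化的LR的完整实现；函数名、阈值与示例语料均为假设。

```python
from collections import Counter

def salient_keywords(corpus_by_intent, min_gap=0.3):
    """corpus_by_intent: {意图: [分词后的用户说法, ...]}
    返回每个意图的显著关键词：在本意图中出现频率高、而在其他意图中频率低的词。
    各意图共有的高频词(如下例中的"app")被排除，对应正文中
    "降低各分类(意图)都含有的显著特征在分类时的权重"的思路。"""
    freq = {}
    for intent, utterances in corpus_by_intent.items():
        counts = Counter(w for u in utterances for w in u)
        total = sum(counts.values())
        freq[intent] = {w: c / total for w, c in counts.items()}
    result = {}
    for intent, f in freq.items():
        others = [freq[i] for i in freq if i != intent]
        result[intent] = [
            w for w, p in f.items()
            if p - max((o.get(w, 0.0) for o in others), default=0.0) >= min_gap
        ]
    return result

corpus = {"打开应用": [["打开", "app"], ["打开", "应用"]],
          "关闭应用": [["关闭", "app"], ["关闭", "应用"]]}
print(salient_keywords(corpus))  # "app"、"应用"为共有特征，被排除
```

该草图对应正文中"打开app/关闭app"的例子：共有的"app"不会被选为任何一个意图的显著特征。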
其中,上述无关词可以是和抽取的关键词组合后,形成与原正向数据形式上相似的,但语义背离的词(或词语组合或句子等),或者与原正向数据语义不相关的词(或词语组合或句子等)。例如可以采用句子性相似度的方法,判断组合的词(或词语组 合或句子等)与原正向数据语义是否背离或者不相关。
示例性的,服务器可以将不同意图对应的关键词进行随机组合,生成负向数据。服务器还可以将每个意图中的关键词与第一技能不相关的一些例如实体词等进行随机组合,生成负向数据。本申请实施例不限定服务器对无关词的选择方法,以及将每个意图的关键词与其他无关词进行组合的方式均不做限定。
例如:某个技能的正向数据包含有“打开菜单”和“关闭菜单”,分别对应于两个意图,意图一和意图二。抽取正向数据的关键词后,可以得到意图一的关键词为“打开”,意图二的关键词为“关闭”。那么,将不同的意图的关键词进行组合得到的“打开关闭”,将每个意图的关键词与其他无关词进行组合得到的“打开抽屉”、“关闭微信”等就构成了该技能的负向数据。
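上述两种组合方式可以用如下纯Python草图表示（`make_negative_data` 为假设的函数名，关键词表与无关词表仅为示例）：

```python
import itertools

def make_negative_data(intent_keywords, unrelated_words):
    """intent_keywords: {意图: [关键词, ...]}；unrelated_words: 与本技能无关的词表。
    按正文所述两种方式构造候选负向语料：
    1) 不同意图的关键词互相组合，如"打开"+"关闭" -> "打开关闭"；
    2) 每个意图的关键词与无关词组合，如"打开"+"抽屉" -> "打开抽屉"。"""
    negatives = set()
    for (i1, ws1), (i2, ws2) in itertools.permutations(intent_keywords.items(), 2):
        for w1 in ws1:
            for w2 in ws2:
                negatives.add(w1 + w2)
    for ws in intent_keywords.values():
        for w in ws:
            for u in unrelated_words:
                negatives.add(w + u)
    return sorted(negatives)

neg = make_negative_data({"意图一": ["打开"], "意图二": ["关闭"]}, ["抽屉", "微信"])
print(neg)
```

实际使用时还需要按正文所述做句子相似度等过滤，确保组合结果确实与原正向数据语义背离。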
方法二、在本申请实施例中,服务器存储大量的语音技能的训练数据,包含有同一个第三方应用的不同语音技能的数据,不同第三方应用的不同语音技能的数据。这些海量的训练数据可以是属于不同领域的,不同分类的,构成了服务器上的一个训练数据的全集。针对第一技能,服务器可以根据第一技能的分类,从该服务器上存储的训练数据的全集中确定出第一集合,其中第一集合中的语料为可能与第一技能的正向数据相冲突的语料的集合。在确定第一集合时,可以采用基于类内分类器的方式、或者基于相似性检索的方式,例如:倒排索引、句子相似度计算等,确定出与正向数据相冲突的语料,或者确定出可疑语料。
需要说明的是,服务器在确定第一集合时,可以直接从训练数据的全集中确定出第一集合,也可以从训练数据的全集中逐步确定出第一集合,也就是说,服务器可以先从训练数据的全集中确定出一个集合A,其中,集合A小于训练数据的全集。再集合A中确定一个集合B,集合B小于集合A,依次类推,直到确定出第一集合为止。本申请实施例对服务器确定第一集合的具体方法不做限定。
其中,第一集合中的语料可以是与正向数据相似的语料,例如包含有与正向数据相同或相似的关键词,或者包含有与正向数据相同或相似的实体词(例如:名词)。例如:正向数据为“打开菜单”,假设该语料的关键词为“打开”。那么,第一集合中可以包含有例如:“打开收音机”、“开启菜单”、“隐藏菜单”等。
服务器可以从第一集合中采样出一定数量的语料，并进一步确定采样的语料是否为第一技能对应的负向数据。
其中,上述的一定数量(即采样的语料的数量)可以根据正向数据的数量进行确定。例如:采样的语料的数量可以确定为与正向数据的数量相同的数量,也可以是每个意图中对应的正向数据的最大语料数量,或者平均语料数量,或者为预设的数量等,本申请实施例对采样语料的数量不做具体限定。
其中,采样过程中可以采用随机采样的方式或者按照主题分布进行采样等方式,本申请实施例对采样的具体方式也不做限定。
服务器在确定采样的语料是否为第一技能的负向数据时，也可以理解为将采样的语料进行分类，若与正向数据的语义相同，则为正向数据，若不同，则为负向数据。确定采样语料是否为第一技能的负向数据，可以是采用人工进行定义的方式，也可以是采用聚类算法与人工进行迭代的方式确定。其中聚类算法可以是层次聚类(hierarchical clustering)或者基于主题模型(Topic Model)的文本聚类分类方法等。其中，聚类算法和人工进行迭代的方式例如可以是：对一些聚类算法无法识别的特征进行人工标注，例如：颜色的鲜艳程度等，对于一些聚类算法无法确定的可疑语料，由人工确认是否是负向数据。聚类算法还可以学习人工标注的规则，再利用该规则对其他采样的语料确定是否为负向数据等。
需要说明的是,服务器在根据正向数据生成负向数据时,可以采用上述方法中的任一种,或者使用两种方法都使用,或者使用其他的方法,本申请实施例不限定根据正向数据生成负向数据的方法。
S103、服务器确定与第一技能相似的第二技能,第二技能为至少一个。
需要说明的是，考虑到服务器上存储有大量的不同语音技能的数据，而这些大量的不同语音技能之间的数据存在相似性，服务器可以将不同语音技能之间的数据进行共享。具体的，服务器可以在征得各个语音技能开发者的同意后，把各个语音技能的数据共享出来。共享的方式例如可以是基于分类进行共享，例如：同一分类中的不同语音技能之间可以共享数据，不同分类中的不同语音技能之间不可以共享数据。在一些实施例中，服务器可以在可以共享数据的语音技能之间添加共享层，不同的语音技能可以通过共享层获取其他语音技能的数据。在另一些实施例中，服务器还可以内置一些数据，该数据可以被不同的语音技能使用，这样，在服务器学习新的语音技能对应的意图识别模型时，若语音技能开发者没有输入该语音技能对应的训练数据，或者输入的训练数据较少的情况下，服务器也可以通过共享其他语音技能的数据，以及服务器内置的数据，进行学习，以得到意图识别模型。
为此,在步骤S103和S104中,服务器可以根据例如第一技能的分类,或者第一技能的训练数据与其他技能训练数据的相似性等,确定服务器上其他与第一技能相似的第二技能,第二技能可以是第一技能的开发者开发的其他语音技能,也可以是其他语音技能开发者开发的其他语音技能,本申请实施例对第二技能的抽取方法以及抽取数量均不做限定。
S104、服务器获取各个第二技能对应的数据。
服务器例如可以通过共享层,获取各个第二技能对应的训练数据,可以包括正向数据和负向数据,具体的可以包括第二技能的开发者输入的正向数据和负向数据,也可以包括服务器在学习第二技能对应的意图识别模型或者其他模型的过程中,自动生成的负向数据等。
S105、服务器根据第二技能对应的数据,以及服务器存储的第一基模型生成第二基模型。
服务器学习各个技能对应的意图识别模型时，会使用预先设置的模型学习框架，包含使用的一个或多个第一基模型，以及各个第一基模型之间的组织关系(可以包括各个第一基模型之间的权重、层次关系)等。
在本申请实施例中，服务器将各个第二技能对应的语料，分别输入到第一基模型中，学习得到多个第二基模型。在一些实施例中，例如可以采用多任务学习方法等，其中多任务学习方法可以基于深度学习卷积神经网络(Convolutional Neural Networks,CNN)、双向长短期记忆网络(Long Short-Term Memory,LSTM)等模型实现。在另一些实施例中，还可以在多任务学习的过程中，加入对抗训练，可以进一步实现对共享数据的提纯，提升第二基模型的准确性，进而提升意图识别模型的准确性。
例如:如图6所示,以第一技能为语音技能1,输入的第一技能的训练数据为语料A为例,进行说明。假设服务器确定语音技能1相似的语音技能有语音技能2和语音技能3。其中,语音技能2对应的训练数据为语料C,语音技能3对应的训练数据为语料D。
服务器将语料C输入到模型学习框架2中进行学习,其中,模型学习框架2中包括一个或多个第一基模型,即基模型1。模型学习框架2例如可以是多任务学习框架,具体的,将语料C输入多任务学习框架中生成多个第二基模型,即多个基模型2。此时,服务器相当于面向语音技能2的任务学习。将语料D输入多任务学习框架中生成多个第二基模型。此时,服务器相当于面向语音技能3的任务学习。
也就是说,服务器在学习语音技能1对应的意图识别模型时,还引入了其他语音技能(语音技能2或语音技能3)的训练数据,并且对其他语音技能的训练数据进行多任务联合训练,生成了不同表示空间的第二基模型。再将不同表示空间的第二基模型输入集成学习框架中进行学习,最后得到意图识别模型。相比较于现有技术中,单纯的对训练数据进行特征变换或者替换第一基模型的方法,本申请实施例提供的技术方案,更有利于提高学习到的语音技能对应的意图识别模型的准确性。
S106、服务器根据第一技能对应的正向数据和负向数据,以及第二基模型进行学习。若根据策略确定返回最快的意图识别模型,则执行步骤S107,若根据策略确定返回最佳的意图识别模型,则执行步骤S108。
上述策略可以是服务器预先设置的策略,即返回哪一个模型或者返回两种模型;也可以是服务器向用户推荐的策略,即可以根据不同情况向用户推荐返回不同的模型。还可以是服务器接收用户输入的返回策略,确定返回相应的模型,本申请实施例对策略不做限定。
在本申请的一些实施例中,服务器可以采用集成学习方法进行学习,例如:stacking集成学习方法。在服务器学习的过程中,会陆续生成一些中间模型。
例如:如图10所示,为本申请实施例提供的又一种学习意图识别模型的过程示意图。当模型学习框架3采用stacking集成学习框架时,服务器将第一技能对应的正向数据和负向数据输入到stacking集成学习框架中,学习到第一层的中间模型,记为中间模型1。然后,服务器将第一层的中间模型得到的特征输入到stacking集成学习框架中,学习到第二层的中间模型,依次类推,直到最后生成的意图识别模型。
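上述逐层学习的过程可以用如下纯Python草图示意。其中基模型与元学习器均为玩具实现，接口为假设，仅展示"第一层中间模型的输出作为第二层特征"这一stacking结构：

```python
def train_stacking(samples, labels, base_learners, meta_learner):
    """samples: 特征向量列表；base_learners: 第一层基模型训练函数列表，
    每个函数接收(samples, labels)并返回预测函数 predict(x)->float。
    第一层各中间模型的输出拼成新特征，再交给第二层 meta_learner 训练。"""
    layer1 = [fit(samples, labels) for fit in base_learners]    # 第一层中间模型
    meta_features = [[m(x) for m in layer1] for x in samples]   # 第一层输出作为特征
    layer2 = meta_learner(meta_features, labels)                # 第二层(最终)模型
    return lambda x: layer2([m(x) for m in layer1])

# 玩具基模型：以某一维特征的均值做阈值判别(仅为演示stacking结构)
def threshold_learner(dim):
    def fit(samples, labels):
        thr = sum(x[dim] for x in samples) / len(samples)
        return lambda x: 1.0 if x[dim] > thr else 0.0
    return fit

# 玩具元学习器：对第一层输出做多数表决
def majority_meta(meta_features, labels):
    return lambda feats: 1.0 if sum(feats) >= len(feats) / 2 else 0.0

X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
y = [0, 1, 0, 1]
model = train_stacking(X, y, [threshold_learner(0), threshold_learner(1)], majority_meta)
print([model(x) for x in X])
```

真实的stacking实现中，第一层输出通常来自交叉验证的折外预测，这里为保持简短省略了该细节。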
S107、服务器确定最快的意图识别模型。
在步骤S106的学习过程中，服务器会陆续生成多个中间模型，这些中间模型都可以独立实现对第一技能对应的意图识别模型的功能。因此，基于技能开发者的需求，服务器可以先返回一些中间模型作为最快生成的模型。一些实施例中，服务器可以确定并返回最先生成的中间模型为最快的意图识别模型。如图10中所示，服务器可以确定最先生成的第一中间模型为最快的意图识别模型。另一些实施例中，服务器可以确定并返回采用指定方法生成的模型，为最快的意图识别模型。这样，技能开发者可以最快地获取到意图识别模型，以了解该意图识别模型的功能、性能等。
S108、服务器确定最佳的意图识别模型。
一般认为，服务器根据第一技能对应的正向数据和负向数据，以及第二基模型进行学习完成后，最后得到的意图识别模型即为最佳的意图识别模型。这是由于服务器在学习的过程中，会根据陆续生成的这多个中间模型进行模型的选择和参数的调整，以使最后得到的意图识别模型准确率最高。例如可以采用网格搜索(grid search)方法，或者贝叶斯优化(Bayesian optimization)方法等进行参数调优。
需要注意的是,现有技术中的参数调优,是基于第一技能开发者输入的第一技能对应的训练数据进行调优的,故最后得到的意图识别模型的准确率,也是仅反映用第一技能内的数据进行预测时的准确率。
然而,在本申请实施例中,服务器在训练意图识别模型时,除了使用第一技能内的数据(例如第一技能开发者输入的第一技能的正向数据)外,还使用了第一技能外的数据(例如:自动生成的第一技能的负向数据,以及在基模型扩充时,还使用了与第一技能相似的第二技能的数据等)。
因此,如图10所示,服务器在选择最佳的意图识别模型时,需要考虑第一技能内的数据(例如:第一技能的正向数据),以及第一技能外的数据(例如:服务自动生成的第一技能的负向数据)。服务器需要考虑的准确率有两部分。一是用第一技能内的数据预测的准确率,即第一技能数据的准确率。用第一技能内的数据预测时,预测为positive类(即能够识别为调用第一技能的意图)则预测正确。二是用第一技能外的数据预测的准确率,即第一技能外数据的准确率。用第一技能外的数据预测时,预测为negative类(即能够识别为不能调用第一技能的意图)则预测正确。
在一些实施例中，也可以用第一技能外数据的误召回率来反映用第一技能外的数据预测的准确率。第一技能外数据的误召回率越低，用第一技能外的数据预测的准确率就越高。可见，第一技能数据的准确率与第一技能外数据的准确率成反比，与第一技能外数据的误召回率成正比。也就是说，第一技能数据准确率高时，第一技能外数据的误召回率也高。那么，无法使得最后训练得到的意图识别模型在这两个方面都达到最优，因此，本申请实施例提供了一种评价最佳的意图识别模型的方法。
服务器可以设置一个第一技能的置信度,该置信度可以由用户(例如:第一技能的开发者)进行设置。该置信度的参数可以反映用户对第一技能内数据准确率和第一技能外数据的误召回率这两个指标的相对要求。例如:置信度越大,表示用户希望第一技能外数据的误召回率越低,即第一技能外的数据预测准确性越高。因此,可以用下面的公式1来评价最佳的意图识别模型,如下:
score=accuracyIn*(1-C)+accuracyOut*C(公式1)
其中，score为评价最佳的意图识别模型的分值，分值越高，表明意图识别模型的准确率越高，越符合用户的要求。accuracyIn为第一技能内数据的平均准确率。
accuracyOut为第一技能外数据的平均准确率。C为用户设置的置信度的数值，C大于等于零，且小于等于1。
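公式1的计算可以草拟如下（函数名与示例数值均为假设）：

```python
def score_formula1(accuracy_in, accuracy_out, confidence):
    """按公式1评估候选意图识别模型：
    accuracy_in  -- 第一技能内数据的平均准确率 accuracyIn
    accuracy_out -- 第一技能外数据的平均准确率 accuracyOut
    confidence   -- 用户设置的置信度 C，0 <= C <= 1"""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("置信度C必须在[0, 1]之间")
    return accuracy_in * (1.0 - confidence) + accuracy_out * confidence

# C越大，技能外数据的准确率(即低误召回率)在score中的权重越高
print(score_formula1(0.9, 0.6, 0.25))  # ≈ 0.825
```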
服务器在计算第一技能内数据的平均准确率和第一技能外数据的平均准确率时,例如可以采用K折交叉验证的方法。其中,K折交叉验证的具体方法可以参考现有技术,这里不再赘述。
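K折交叉验证计算平均准确率的过程可以草拟如下（`train_fn` 的接口与玩具"多数类"模型均为示意假设）：

```python
def kfold_average_accuracy(samples, labels, k, train_fn):
    """用K折交叉验证估计平均准确率：轮流取1折做验证、其余K-1折训练，
    对K次验证集准确率取平均。train_fn(X, y)返回预测函数。"""
    n = len(samples)
    # 折大小尽量均匀，前 n % k 折多分一个样本
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    accuracies, start = [], 0
    for size in fold_sizes:
        val = set(range(start, start + size))
        train_x = [samples[i] for i in range(n) if i not in val]
        train_y = [labels[i] for i in range(n) if i not in val]
        model = train_fn(train_x, train_y)
        correct = sum(1 for i in val if model(samples[i]) == labels[i])
        accuracies.append(correct / size)
        start += size
    return sum(accuracies) / k

# 玩具模型：永远预测训练集中的多数类
def majority_class(X, y):
    majority = max(set(y), key=y.count)
    return lambda x: majority

data = list(range(10))
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
print(kfold_average_accuracy(data, labels, 5, majority_class))
```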
在本申请的另一些实施例中，为了避免用户对置信度的设置不合理时，对训练得到的意图识别模型的准确率造成较大的影响，例如：用户设置的置信度过大时，会忽略技能内数据的准确率，服务器还可以设置一个参数，对用户设置的置信度进行控制。那么，可以采用下面的公式2来评价最佳的意图识别模型，如下：
score=accuracyIn*(1-C*P)+accuracyOut*C*P(公式2)
其中,score、accuracyIn、accuracyOut和C参数含义与公式1中相同,不再赘述。P为服务器设置的参数,P为大于等于零,且小于等于1。P值可以控制用户设置的置信度对score的影响程度的大小。P值越大,表明允许用户设置的置信度对score的影响也越大。P值越小,表明允许用户设置的置信度对score的影响也越小。
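结合公式2，"按score选出最佳中间模型"的逻辑可以示意如下（候选模型的准确率数值为假设）：

```python
def score_formula2(accuracy_in, accuracy_out, confidence, p):
    """公式2：服务器参数P(0 <= P <= 1)对用户置信度C进行衰减，
    避免不合理的C完全主导score。"""
    eff = confidence * p  # C*P
    return accuracy_in * (1.0 - eff) + accuracy_out * eff

def pick_best_model(candidates, confidence, p):
    """candidates: [(模型名, accuracyIn, accuracyOut), ...]，按score取最大者。"""
    return max(candidates, key=lambda c: score_formula2(c[1], c[2], confidence, p))

candidates = [("中间模型1", 0.95, 0.60), ("中间模型2", 0.90, 0.80)]
# C=0.8、P=1.0时更看重技能外准确率，选出技能外表现更好的中间模型2
print(pick_best_model(candidates, confidence=0.8, p=1.0))
```

当P取0时，用户置信度完全不影响score，选择退化为只比较技能内准确率。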
需要说明的是,服务器在执行步骤S107和S108的过程中,可以通过共享层,实现对中间模型的一些共同特征进行共享,以减少计算量,提升模型训练的效率。
而后,学习到最快或最佳的意图识别模型后,可用于对电子设备的用户输入的新的用户说法,进行预测,确定出该用户的意图,以便为该用户提供相应的服务。
例如:如图1所示的人机对话系统,可以将学习到的最快或最佳的意图识别模型部署在服务器300上,也可以部署在服务器200上。部署在服务器300时,服务器300可存储多个第三方应用(例如:第三方应用1和第三方应用2)的不同语音技能对应的意图识别模型。部署在服务器200时,则服务器200可存储该第三方应用的不同语音技能对应的意图识别模型。
又例如:如图4所示的人机对话系统,可以将学习到的最快或最佳的意图识别模型部署在服务器200上,服务器200可存储第三方应用1的不同语音技能对应的意图识别模型。
意图识别模型的应用可参考前文的描述,不再赘述。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是一个物理单元或多个物理单元，即可以位于一个地方，或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (16)

  1. 一种意图识别模型的学习方法,其特征在于,所述方法包括:
    服务器接收技能开发者输入的第一技能中各个意图对应的正向数据;
    所述服务器根据所述第一技能中各个意图对应的正向数据生成所述第一技能中各个意图对应的负向数据;
    所述服务器获取第二技能对应的训练数据,所述第二技能为与所述第一技能相似的技能,所述第二技能的数量为至少一个;
    所述服务器根据所述第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型,所述第二基模型的数量为至少一个;
    所述服务器根据所述第一技能中各个意图对应的正向数据和所述第一技能中各个意图对应的负向数据,以及所述第二基模型进行学习,生成意图识别模型。
  2. 根据权利要求1所述的意图识别模型的学习方法,其特征在于,所述服务器根据所述第一技能中各个意图对应的正向数据生成所述第一技能中各个意图对应的负向数据包括:
    所述服务器针对所述第一技能中各个意图,分别抽取出所述第一技能中各个意图对应的关键词,其中,所述关键词为影响所述第一基模型权重的关键特征;
    将所述第一技能中不同意图对应的关键词进行组合,或者,将不同意图对应的关键词与非所述第一技能相关的词进行组合,组合后的词确定为所述不同意图对应的负向数据。
  3. 根据权利要求1或2所述的意图识别模型的学习方法,其特征在于,所述服务器根据所述第一技能中各个意图对应的正向数据生成所述第一技能中各个意图对应的负向数据还包括:
    所述服务器根据所述第一技能的分类,从服务器上存储的训练数据的全集中确定出第一集合,所述第一集合中的训练数据包含与所述第一技能中各个意图对应的负向数据;其中,训练数据包括正向数据和负向数据;
    从所述第一集合中采样预设数量的训练数据;
    采用人工标注和/或聚类算法,从采样得到的训练数据中,确定出与所述第一技能中各个意图对应的负向数据。
  4. 根据权利要求1-3任一项所述的意图识别模型的学习方法,其特征在于,所述服务器获取第二技能对应的数据包括:
    所述服务器根据所述第一技能的分类,和/或所述第一技能的正向数据,确定所述第二技能;
    通过共享层获取所述第二技能对应的训练数据。
  5. 根据权利要求1-4任一项所述的意图识别模型的学习方法,其特征在于,所述服务器根据所述第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型包括:
    所述服务器根据所述第二技能对应的训练数据,以及预先设置的第一基模型,采用多任务学习方法进行学习,生成第二基模型。
  6. 根据权利要求1-5任一项所述的意图识别模型的学习方法,其特征在于,所述 服务器根据所述第一技能中各个意图对应的正向数据和所述第一技能中各个意图对应的负向数据,以及所述第二基模型进行学习,生成意图识别模型包括:
    所述服务器根据所述第一技能中各个意图对应的正向数据和所述第一技能中各个意图对应的负向数据,以及所述第二基模型,采用集成学习的方法进行学习,生成意图识别模型。
  7. 根据权利要求6所述的意图识别模型的学习方法,其特征在于,所述服务器生成意图识别模型具体包括:
    所述服务器在生成所述意图识别模型的过程中,生成多个中间模型;
    所述服务器确定最先生成的中间模型为最快的意图识别模型;和/或,所述服务器对所述多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
  8. 根据权利要求7所述的意图识别模型的学习方法,其特征在于,所述服务器确定出最佳的意图识别模型包括:
    所述服务器根据所述第一技能中各个意图对应的正向数据计算出各个中间模型的第一准确率;所述服务器根据所述第一技能中各个意图对应的负向数据计算出各个中间模型的第二准确率;
    所述服务器根据所述第一准确率、所述第二准确率,以及技能开发者输入的权重,对所述多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
  9. 一种服务器,其特征在于,包括:处理器,存储器以及通信接口;所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述处理器从所述存储器中读取所述计算机指令,以使得所述服务器执行如下操作:
    通过所述通信接口,接收技能开发者输入的第一技能中各个意图对应的正向数据;
    根据所述第一技能中各个意图对应的正向数据生成所述第一技能中各个意图对应的负向数据;
    获取第二技能对应的训练数据,所述第二技能为与所述第一技能相似的技能,所述第二技能的数量为至少一个;
    根据所述第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型,所述第二基模型的数量为至少一个;
    根据所述第一技能中各个意图对应的正向数据和所述第一技能中各个意图对应的负向数据,以及所述第二基模型进行学习,生成意图识别模型。
  10. 根据权利要求9所述的服务器,其特征在于,所述服务器根据所述第一技能中各个意图对应的正向数据生成所述第一技能中各个意图对应的负向数据具体包括:
    所述服务器针对所述第一技能中各个意图,分别抽取出所述第一技能中各个意图对应的关键词,其中,所述关键词为影响所述第一基模型权重的关键特征;
    将所述第一技能中不同意图对应的关键词进行组合,或者,将不同意图对应的关键词与非所述第一技能相关的词进行组合,组合后的词确定为所述不同意图对应的负向数据。
  11. 根据权利要求9或10所述的服务器,其特征在于,所述服务器根据所述第一技能中各个意图对应的正向数据生成所述第一技能中各个意图对应的负向数据还具体包括:
    所述服务器根据所述第一技能的分类,从服务器上存储的训练数据的全集中确定出第一集合,所述第一集合中的训练数据包含与所述第一技能中各个意图对应的负向数据;其中,训练数据包括正向数据和负向数据;
    从所述第一集合中采样预设数量的训练数据;
    采用人工标注和/或聚类算法,从采样得到的训练数据中,确定出与所述第一技能中各个意图对应的负向数据。
  12. 根据权利要求9-11任一项所述的服务器,其特征在于,所述服务器获取第二技能对应的数据包括:
    所述服务器根据所述第一技能的分类,和/或所述第一技能的正向数据,确定所述第二技能;
    通过共享层获取所述第二技能对应的训练数据。
  13. 根据权利要求9-12任一项所述的服务器,其特征在于,所述服务器根据所述第二技能对应的训练数据,以及预先设置的第一基模型进行学习,生成第二基模型包括:
    所述服务器根据所述第二技能对应的训练数据,以及预先设置的第一基模型,采用多任务学习方法进行学习,生成第二基模型。
  14. 根据权利要求9-13任一项所述的服务器,其特征在于,所述服务器根据所述第一技能中各个意图对应的正向数据和所述第一技能中各个意图对应的负向数据,以及所述第二基模型进行学习,生成意图识别模型包括:
    所述服务器根据所述第一技能中各个意图对应的正向数据和所述第一技能中各个意图对应的负向数据,以及所述第二基模型,采用集成学习的方法进行学习,生成意图识别模型。
  15. 根据权利要求14所述的服务器,其特征在于,服务器生成意图识别模型还具体包括:
    所述服务器在生成所述意图识别模型的过程中,生成多个中间模型;
    所述服务器确定最先生成的中间模型为最快的意图识别模型;和/或,所述服务器对所述多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
  16. 根据权利要求15所述的服务器,其特征在于,所述服务器确定出最佳的意图识别模型包括:
    所述服务器根据所述第一技能中各个意图对应的正向数据计算出各个中间模型的第一准确率;所述服务器根据所述第一技能中各个意图对应的负向数据计算出各个中间模型的第二准确率;
    所述服务器根据所述第一准确率、所述第二准确率,以及技能开发者输入的权重,对所述多个中间模型进行模型选择以及参数调整后,确定出最佳的意图识别模型。
PCT/CN2018/106468 2018-09-19 2018-09-19 一种意图识别模型的学习方法、装置及设备 Ceased WO2020056621A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP18934368.4A EP3848855A4 (en) 2018-09-19 2018-09-19 LEARNING METHOD AND DEVICE FOR AN INTENTIONAL DETECTION MODEL AND DEVICE
US17/277,455 US12079579B2 (en) 2018-09-19 2018-09-19 Intention identification model learning method, apparatus, and device
CN201880093483.0A CN112154465B (zh) 2018-09-19 2018-09-19 一种意图识别模型的学习方法、装置及设备
PCT/CN2018/106468 WO2020056621A1 (zh) 2018-09-19 2018-09-19 一种意图识别模型的学习方法、装置及设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/106468 WO2020056621A1 (zh) 2018-09-19 2018-09-19 一种意图识别模型的学习方法、装置及设备

Publications (1)

Publication Number Publication Date
WO2020056621A1 true WO2020056621A1 (zh) 2020-03-26

Family

ID=69888084

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106468 Ceased WO2020056621A1 (zh) 2018-09-19 2018-09-19 一种意图识别模型的学习方法、装置及设备

Country Status (4)

Country Link
US (1) US12079579B2 (zh)
EP (1) EP3848855A4 (zh)
CN (1) CN112154465B (zh)
WO (1) WO2020056621A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11657797B2 (en) * 2019-04-26 2023-05-23 Oracle International Corporation Routing for chatbots
JP2022531994A (ja) * 2019-05-02 2022-07-12 サピエントエックス インコーポレイテッド Generation and operation of artificial-intelligence-based conversation systems
US12099816B2 (en) * 2021-01-20 2024-09-24 Oracle International Corporation Multi-factor modelling for natural language processing
CN113515594B (zh) * 2021-04-28 2024-09-20 京东科技控股股份有限公司 Intent recognition method, intent recognition model training method, apparatus, and device
CN113344197A (zh) * 2021-06-02 2021-09-03 北京三快在线科技有限公司 Recognition model training method, service execution method, and apparatus
US12321428B2 (en) * 2021-07-08 2025-06-03 Nippon Telegraph And Telephone Corporation User authentication device, user authentication method, and user authentication computer program
CN113569581B (zh) * 2021-08-26 2023-10-17 中国联合网络通信集团有限公司 Intent recognition method, apparatus, device, and storage medium
US12482454B2 (en) * 2023-08-10 2025-11-25 Fifth Third Bank Methods and systems for training and deploying natural language understanding models


Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8682814B2 (en) * 2010-12-14 2014-03-25 Symantec Corporation User interface and workflow for performing machine learning
WO2013151546A1 (en) * 2012-04-05 2013-10-10 Thomson Licensing Contextually propagating semantic knowledge over large datasets
US9761220B2 (en) * 2015-05-13 2017-09-12 Microsoft Technology Licensing, Llc Language modeling based on spoken and unspeakable corpuses
CN108446286B (zh) * 2017-02-16 2023-04-25 阿里巴巴集团控股有限公司 Method, apparatus, and server for generating answers to natural language questions
US10395141B2 (en) * 2017-03-20 2019-08-27 Sap Se Weight initialization for machine learning models
US10839154B2 (en) * 2017-05-10 2020-11-17 Oracle International Corporation Enabling chatbots by detecting and supporting affective argumentation
US10943583B1 (en) * 2017-07-20 2021-03-09 Amazon Technologies, Inc. Creation of language models for speech recognition
CN107515857B (zh) * 2017-08-31 2020-08-18 科大讯飞股份有限公司 Customized-skill-based semantic understanding method and system
CN107644642B (zh) * 2017-09-20 2021-01-15 Oppo广东移动通信有限公司 Semantic recognition method and apparatus, storage medium, and electronic device
CN108172224B (zh) * 2017-12-19 2019-08-27 浙江大学 Machine-learning-based method for defending a voice assistant against silent command control
US11315570B2 (en) * 2018-05-02 2022-04-26 Facebook Technologies, Llc Machine learning-based speech-to-text transcription cloud intermediary
US10832003B2 (en) * 2018-08-26 2020-11-10 CloudMinds Technology, Inc. Method and system for intent classification
US20200082272A1 (en) * 2018-09-11 2020-03-12 International Business Machines Corporation Enhancing Data Privacy in Remote Deep Learning Services

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521599A (zh) 2011-09-30 2012-06-27 中国科学院计算技术研究所 Pattern training and recognition method based on ensemble learning
US20160188574A1 (en) * 2014-12-25 2016-06-30 Clarion Co., Ltd. Intention estimation equipment and intention estimation system
CN106407333A (zh) * 2016-09-05 2017-02-15 北京百度网讯科技有限公司 Artificial-intelligence-based spoken query recognition method and apparatus
CN106446951A (zh) * 2016-09-28 2017-02-22 中科院成都信息技术股份有限公司 Ensemble learner based on singular value selection
CN106528531A (zh) * 2016-10-31 2017-03-22 北京百度网讯科技有限公司 Artificial-intelligence-based intent analysis method and apparatus
CN107491531A (zh) 2017-08-18 2017-12-19 华南师范大学 Sentiment classification method for Chinese online comments based on an ensemble learning framework
CN108090520A (zh) * 2018-01-08 2018-05-29 北京中关村科金技术有限公司 Intent recognition model training method, system, apparatus, and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3848855A4

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210365639A1 (en) * 2020-05-20 2021-11-25 Beijing Baidu Netcom Science Technology Co., Ltd. Intent recognition optimization processing method, apparatus, and storage medium
US11972219B2 (en) * 2020-05-20 2024-04-30 Beijing Baidu Netcom Science Technology Co., Ltd. Intent recognition optimization processing method, apparatus, and storage medium
CN112650834A (zh) * 2020-12-25 2021-04-13 竹间智能科技(上海)有限公司 Intent model training method and apparatus
CN112650834B (zh) * 2020-12-25 2023-10-03 竹间智能科技(上海)有限公司 Intent model training method and apparatus
CN113407698A (zh) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Method and apparatus for training an intent recognition model and performing intent recognition
CN113792655A (zh) * 2021-09-14 2021-12-14 京东鲲鹏(江苏)科技有限公司 Intent recognition method and apparatus, electronic device, and computer-readable medium

Also Published As

Publication number Publication date
EP3848855A4 (en) 2021-09-22
CN112154465A (zh) 2020-12-29
US20210350084A1 (en) 2021-11-11
EP3848855A1 (en) 2021-07-14
US12079579B2 (en) 2024-09-03
CN112154465B (zh) 2025-02-21

Similar Documents

Publication Publication Date Title
WO2020056621A1 (zh) Intent recognition model learning method, apparatus, and device
US11694032B2 (en) Template-based intent classification for chatbots
US12002456B2 (en) Using semantic frames for intent classification
US20240169153A1 (en) Detecting unrelated utterances in a chatbot system
US11886821B2 (en) Method and system for inferring answers from knowledge graphs
JP7771196B2 (ja) Multi-feature balancing for natural language processors
JP2022547631A (ja) Stopword data augmentation for natural language processing
CN112579733B (zh) Rule matching method, rule matching apparatus, storage medium, and electronic device
JP2025166050A (ja) Enhanced logits for natural language processing
CN117094387B (zh) Big-data-based knowledge graph construction method and system
CN111930884B (zh) Method and device for determining a reply sentence, and human-machine dialogue system
KR20220040997A (ko) Electronic device and control method therefor
CN111538818A (zh) Data query method and apparatus, electronic device, and storage medium
WO2022131954A1 (ru) Dialogue control method and natural language understanding system in a virtual assistant platform
CN114546326A (zh) Virtual human sign language generation method and system
CN115023695A (zh) Updating training examples for artificial intelligence
US20230401385A1 (en) Hierarchical named entity recognition with multi-task setup
RU2818036C1 (ru) Method and system for controlling a dialogue agent in a user interaction channel
CN117236347B (zh) Interactive text translation method, interactive text display method, and related apparatus
CN111950688B (zh) Data processing method, apparatus, and apparatus for data processing
CN120805875A (zh) Workflow generation method and computing device
CN120428960A (zh) Low-code application development method and platform, electronic device, and storage medium
CN118734868A (zh) Translation method, and method and apparatus for optimizing a translation model
CN121808042A (zh) Method, apparatus, and electronic device for querying information from a knowledge database using a machine learning model
CN121030541A (zh) Training method for an application-level classification model, and related method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18934368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018934368

Country of ref document: EP

Effective date: 20210407

WWG Wipo information: grant in national office

Ref document number: 201880093483.0

Country of ref document: CN