WO2024088039A1 - Human-computer dialogue method, dialogue network model training method and device - Google Patents
- Publication number
- WO2024088039A1 (PCT/CN2023/123430)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- knowledge
- dialogue
- data
- sample
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
- G06F16/33295—Natural language query formulation in dialogue systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the present disclosure relates to the field of artificial intelligence technology, and in particular to a human-computer dialogue method, a dialogue network model training method and a device.
- the dialogue system is one of the important technologies in artificial intelligence.
- the dialogue system is a computer system that aims to assist humans and complete natural, coherent and fluent communication tasks with humans. For example, based on the dialogue system, human-computer dialogue can be completed.
- the dialogue system may collect user dialogue data initiated by the user, determine feedback dialogue data for the user dialogue data from a preset type knowledge base, and output the feedback dialogue data.
- the preset type knowledge base includes text knowledge and knowledge graphs.
- the embodiments of the present disclosure provide a human-computer dialogue method, a training method and a device for a dialogue network model.
- an embodiment of the present disclosure provides a human-computer dialogue method, the method comprising:
- second feedback dialogue data corresponding to the second input dialogue data is generated and output, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- the second semantic keyword is determined in combination with the current dialogue environment data, so that it can represent content in more dimensions, which improves its expressive richness.
- by introducing external knowledge, the second feedback dialogue data is determined from more types of knowledge, which avoids the ambiguity and semantic vagueness that a lack of key knowledge would cause in the second feedback dialogue data, thereby improving its reliability and effectiveness and improving the user's interactive experience.
- generating and outputting second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and the preset type knowledge base includes:
- the second feedback dialogue data is generated and outputted according to each acquired knowledge corresponding to the second semantic keyword.
- knowledge corresponding to the second semantic keyword is obtained from each type of knowledge (i.e., external knowledge, knowledge graph, and text knowledge), so that the acquired knowledge is richer and more comprehensive; second feedback dialogue data determined from this knowledge therefore has higher reliability and validity.
- the method further includes:
- the knowledge corresponding to the second semantic keyword in the at least one other type of knowledge includes the retrieved knowledge associated with the knowledge corresponding to the second semantic keyword obtained from the first type of knowledge; the first type of knowledge is any one of the external knowledge, the knowledge graph, and the text knowledge, and the at least one other type of knowledge is the type of knowledge among the external knowledge, the knowledge graph, and the text knowledge other than the first type of knowledge.
- related knowledge is acquired from different types of knowledge by means of "multi-hop retrieval", which can avoid omission of knowledge, realize cross-retrieval among different types of knowledge, and improve the diversity and sufficiency of knowledge corresponding to the second semantic keyword.
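The "multi-hop retrieval" idea can be sketched as below. This is only an illustration under stated assumptions: the three toy stores, the naive substring `search` helper, and the choice to reuse the words of first-hop hits as second-hop query terms are all hypothetical and are not taken from the disclosure, which does not specify how associated knowledge is retrieved.

```python
def search(store, term):
    # Naive case-insensitive substring match; a real system would use semantic retrieval.
    return [item for item in store if term.lower() in item.lower()]

def multi_hop_retrieve(keyword, stores, first_type):
    # Hop 1: retrieve from the first type of knowledge using the semantic keyword.
    first_hits = search(stores[first_type], keyword)
    results = {first_type: first_hits}
    # Collect the other words of the first-hop hits as follow-up query terms.
    terms = sorted({w for hit in first_hits for w in hit.split()
                    if w.lower() != keyword.lower()})
    # Hop 2: query every other type of knowledge with those terms.
    for name, store in stores.items():
        if name == first_type:
            continue
        hits = []
        for term in terms:
            for item in search(store, term):
                if item not in hits:
                    hits.append(item)
        results[name] = hits
    return results

# Toy data (hypothetical): one knowledge-graph fact, two text entries, two external entries.
stores = {
    "knowledge_graph": ["Inception directed_by Nolan"],
    "text_knowledge": ["Nolan was born in London", "Paris is in France"],
    "external_knowledge": ["Nolan interview 2023", "weather today"],
}
found = multi_hop_retrieve("Inception", stores, first_type="knowledge_graph")
```

Here the keyword only matches the knowledge graph directly, yet the second hop still surfaces the related entries in the other two stores, which is the cross-retrieval effect the passage describes.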
- generating and outputting the second feedback dialogue data according to each acquired knowledge corresponding to the second semantic keyword includes:
- the second feedback dialogue data is generated and outputted according to the respective corresponding target feature vectors.
- generating and outputting the second feedback dialogue data according to the respective corresponding target feature vectors includes:
- the second feedback dialogue data is generated and outputted according to the target fusion feature vector.
- the target feature vectors are fused so that the target fusion feature vector represents the features of every target feature vector; second feedback dialogue data determined from the target fusion feature vector therefore has higher reliability and validity.
- the target fusion feature vector is obtained by inputting the respective corresponding target feature vectors into a pre-trained cross-attention network model.
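A minimal sketch of how attention-based fusion of feature vectors might look is given below. It is an assumption-laden illustration, not the disclosed cross-attention network model: it uses single-head scaled dot-product attention with no learned projections, and the `query` vector (imagined here as a representation of the dialogue input) is hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_fuse(query, knowledge_vectors):
    # Scaled dot-product attention: the query attends over the knowledge vectors.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, vec)) / math.sqrt(d)
              for vec in knowledge_vectors]
    weights = softmax(scores)
    # The fused vector is the attention-weighted sum of the knowledge vectors.
    fused = [sum(w * vec[i] for w, vec in zip(weights, knowledge_vectors))
             for i in range(d)]
    return fused, weights

# Two toy target feature vectors; the query is aligned with the first one.
fused, weights = cross_attention_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The knowledge vector that aligns better with the query receives the larger weight, so the fused vector leans toward it while still carrying a contribution from the other vector.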
- generating and outputting the second feedback dialogue data according to each acquired knowledge corresponding to the second semantic keyword includes:
- the second feedback dialogue data is generated and outputted according to the knowledge after the redundancy removal process.
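One simple way to picture redundancy removal is near-duplicate filtering, sketched below. The disclosure does not say how redundancy is detected, so the token-set Jaccard measure, the `threshold` value of 0.8, and the sample items are assumptions chosen purely for illustration.

```python
def tokens(text):
    # Lowercased whitespace tokens as a set.
    return set(text.lower().split())

def jaccard(a, b):
    # Size of the intersection divided by size of the union.
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def remove_redundancy(knowledge_items, threshold=0.8):
    # Keep an item only if it is not too similar to any item already kept.
    kept = []
    for item in knowledge_items:
        if all(jaccard(tokens(item), tokens(k)) < threshold for k in kept):
            kept.append(item)
    return kept

items = [
    "Nolan directed Inception",
    "nolan directed inception",          # duplicate up to letter case
    "Inception was released in 2010",
]
deduplicated = remove_redundancy(items)
```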
- the method further comprises:
- generating and outputting second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and the preset type knowledge base includes: if the third classification result indicates that the external knowledge needs to be introduced, generating and outputting the second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and the preset type knowledge base.
- the third classification result is obtained by inputting the second historical conversation data, the current conversation environment data, and the second input conversation data into a pre-trained binary classification network model;
- the binary classification network model is a model that has learned, from a third sample data set, the ability to determine whether the external knowledge needs to be introduced;
- the third sample data set includes third sample input dialogue data, third sample historical dialogue data, and second sample dialogue environment data corresponding to the third sample input dialogue data.
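As a rough sketch of this gating step, the snippet below decides which knowledge sources to query for a dialogue turn. Everything in it is illustrative: `needs_external_knowledge` is a hypothetical keyword heuristic standing in for the pre-trained binary classification network model, and querying only the knowledge graph and text knowledge when the classifier answers "no" is an assumption, since the passage only describes the positive branch.

```python
def needs_external_knowledge(history, environment, user_input):
    # Hypothetical stand-in for the pre-trained binary classifier:
    # flag turns that mention time-sensitive topics.
    cues = ("today", "latest", "news", "weather")
    text = " ".join(history + [environment, user_input]).lower()
    return any(cue in text for cue in cues)

def select_knowledge_sources(history, environment, user_input,
                             classifier=needs_external_knowledge):
    # Assumed fallback: knowledge graph and text knowledge are always queried.
    sources = ["knowledge_graph", "text_knowledge"]
    # External knowledge is added only when the classifier says it is needed.
    if classifier(history, environment, user_input):
        sources.append("external_knowledge")
    return sources

with_external = select_knowledge_sources(["hi"], "living room, evening",
                                         "what is the weather like")
without_external = select_knowledge_sources(["hi"], "office",
                                            "who directed Inception")
```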
- the second semantic keyword is obtained by inputting the second historical conversation data, the current conversation environment data, and the second input conversation data into a pre-trained generative network model;
- the generative network model is a model that has learned, from a fourth sample data set, the ability to generate the third sample semantic keywords corresponding to the fourth sample data set, and the fourth sample data set includes fourth sample input dialogue data, fourth sample historical dialogue data, and third sample dialogue environment data corresponding to the fourth sample input dialogue data.
- the external knowledge includes network knowledge and multimodal knowledge, wherein the multimodal knowledge is knowledge including pictures and texts.
- the embodiments of the present disclosure also provide a method for training a dialogue network model, the method comprising:
- the second sample data set includes second sample input dialogue data input by a user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data;
- a dialogue network model is trained to have the ability to predict feedback dialogue data corresponding to the input dialogue data, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- the second sample semantic keywords are determined in combination with the first sample dialogue environment data, so that they can represent content in more dimensions, which improves their expressive richness.
- by introducing external knowledge into training, the dialogue network model is trained on more types of knowledge, which improves the reliability and effectiveness of the dialogue network model and improves the user's interactive experience.
- training, based on the second sample semantic keyword and the preset type knowledge base, a dialogue network model capable of predicting feedback dialogue data corresponding to the input dialogue data includes:
- the dialogue network model is trained based on the acquired knowledge corresponding to the second sample semantic keywords.
- knowledge corresponding to the second sample semantic keywords is obtained from various types of knowledge (i.e., external knowledge, knowledge graph, and text knowledge), so that the acquired knowledge is richer and more comprehensive.
- when the dialogue network model is trained based on this richer and more comprehensive knowledge, the dialogue network model has higher reliability and effectiveness.
- the method further includes:
- the knowledge corresponding to the second sample semantic keyword in the at least one other type of knowledge includes the retrieved knowledge associated with the knowledge corresponding to the second sample semantic keyword obtained from the first type of knowledge;
- the first type of knowledge is any one of the external knowledge, the knowledge graph, and the text knowledge, and the at least one other type of knowledge is the type of knowledge among the external knowledge, the knowledge graph, and the text knowledge other than the first type of knowledge.
- related knowledge is acquired from different types of knowledge by means of "multi-hop retrieval", which can avoid omission of knowledge, realize cross-retrieval among different types of knowledge, and improve the diversity and sufficiency of knowledge corresponding to the second sample semantic keywords.
- the training of the dialogue network model according to each acquired knowledge corresponding to the second sample semantic keyword includes:
- the dialogue network model is obtained by training according to the corresponding feature vectors.
- training the dialogue network model according to the corresponding feature vectors includes:
- the parameters of the fourth neural network model are adjusted according to the second predicted dialogue data fed back to the user and the second pre-labeled dialogue data fed back to the user to obtain the dialogue network model.
- each feature vector is fused so that the fused feature vector can represent the features corresponding to each feature vector, thereby improving the reliability and effectiveness of the dialogue network model trained based on the fused feature vector.
- the fused feature vector is obtained by inputting the respective corresponding feature vectors into a pre-trained cross-attention network model.
- the training of the dialogue network model according to each acquired knowledge corresponding to the second sample semantic keyword includes:
- de-redundancy processing is performed on each acquired knowledge corresponding to the second sample semantic keyword to obtain de-redundancy processed knowledge;
- the dialogue network model is trained according to the de-redundancy processed knowledge.
- the method further comprises:
- training, based on the second sample semantic keywords and the preset type knowledge base, the dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data includes: if the first classification result indicates that the external knowledge needs to be introduced, training the dialogue network model based on the second sample semantic keywords and the preset type knowledge base.
- the first classification result is obtained by inputting the second sample data set into a pre-trained binary classification network model;
- the binary classification network model is a model that has learned, from a third sample data set, the ability to determine whether the external knowledge needs to be introduced;
- the third sample data set includes third sample input dialogue data, third sample historical dialogue data, and second sample dialogue environment data corresponding to the third sample input dialogue data.
- the second sample semantic keyword is obtained by inputting the second sample data set into a pre-trained generative network model;
- the generative network model is a model that has learned, from a fourth sample data set, the ability to generate the third sample semantic keywords corresponding to the fourth sample data set, and the fourth sample data set includes fourth sample input dialogue data, fourth sample historical dialogue data, and third sample dialogue environment data corresponding to the fourth sample input dialogue data.
- the embodiments of the present disclosure further provide a human-computer dialogue device, the device comprising:
- a first acquisition unit used to acquire second historical dialogue data, current dialogue environment data, and second input dialogue data input by the user;
- a first determining unit configured to determine a second semantic keyword according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data;
- a first generating unit configured to generate second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and a preset type knowledge base, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge;
- an output unit configured to output the second feedback dialogue data.
- the embodiments of the present disclosure also provide a training device for a dialogue network model, the device comprising:
- a second acquisition unit configured to acquire a second sample data set, wherein the second sample data set includes second sample input dialogue data input by the user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data;
- a second generating unit configured to generate a second sample semantic keyword according to the second sample data set;
- a training unit configured to train, based on the second sample semantic keyword and a preset type knowledge base, a dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- the embodiments of the present disclosure further provide a computer storage medium, on which computer instructions are stored.
- when the computer instructions are executed by a processor, the human-computer dialogue method described in any of the above embodiments is executed, or the training method of the dialogue network model described in any of the above embodiments is executed.
- the embodiments of the present disclosure further provide an electronic device, including:
- the memory stores computer instructions that can be executed by the at least one processor, and when the computer instructions are executed by the at least one processor, the human-computer dialogue method described in any of the above embodiments is executed, or the training method of the dialogue network model described in any of the above embodiments is executed.
- the embodiments of the present disclosure further provide a computer program product, which, when executed on a processor, enables the human-computer dialogue method described in any of the above embodiments to be executed, or enables the training method of the dialogue network model described in any of the above embodiments to be executed.
- the embodiments of the present disclosure further provide a chip, including:
- an input interface configured to obtain the second historical dialogue data, the current dialogue environment data, and the second input dialogue data input by the user;
- a logic circuit configured to execute the human-computer dialogue method as described in any of the above embodiments, to obtain second feedback dialogue data corresponding to the second input dialogue data;
- an output interface configured to output the second feedback dialogue data.
- the embodiments of the present disclosure further provide a chip, including:
- an input interface configured to obtain a second sample data set, wherein the second sample data set includes second sample input dialogue data input by a user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data;
- a logic circuit configured to execute the method for training a dialogue network model as described in any of the above embodiments, to obtain a dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data;
- an output interface configured to output the dialogue network model.
- the embodiments of the present disclosure further provide a terminal device, including:
- a data collection device used to obtain the second historical dialogue data, the current dialogue environment data, and the second input dialogue data input by the user;
- a dialogue system used to determine a second semantic keyword based on the second historical dialogue data, the current dialogue environment data, and the second input dialogue data, and to generate and output second feedback dialogue data corresponding to the second input dialogue data based on the second semantic keyword and a preset type knowledge base, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- FIG. 1 is a schematic diagram of a method for training a dialogue network model according to an embodiment of the present disclosure.
- FIG. 2 is a schematic diagram of a human-computer dialogue method according to an embodiment of the present disclosure.
- FIG. 3 is a schematic diagram showing the principle of a method for training a dialogue network model according to an embodiment of the present disclosure.
- FIG. 4 is a schematic diagram of a method for training a dialogue network model according to another embodiment of the present disclosure.
- FIG. 5 is a schematic diagram of a human-computer dialogue method according to another embodiment of the present disclosure.
- FIG. 6 is a schematic diagram of a human-computer dialogue method according to another embodiment of the present disclosure.
- FIG. 7 is a schematic diagram showing the principle of a method for training a dialogue network model according to another embodiment of the present disclosure.
- FIG. 8 is a first schematic diagram of a scenario of a human-computer dialogue method according to an embodiment of the present disclosure.
- FIG. 9 is a second schematic diagram of a scenario of the human-computer dialogue method according to an embodiment of the present disclosure.
- FIG. 10 is a schematic diagram of a human-computer dialogue device according to an embodiment of the present disclosure.
- FIG. 11 is a schematic diagram of a human-computer dialogue device according to another embodiment of the present disclosure.
- FIG. 12 is a schematic diagram of a training device for a dialogue network model according to an embodiment of the present disclosure.
- FIG. 13 is a schematic diagram of a training device for a dialogue network model according to another embodiment of the present disclosure.
- FIG. 14 is a schematic diagram of an electronic device according to an embodiment of the present disclosure.
- dialogue systems can be divided into different types, such as Question Answering (QA), Task-Oriented, and Open-domain.
- the question-answering dialogue system can be applied to scenarios in which users' questions need to be answered, such as voice customer service for e-commerce stores or banks. Its implementation principle is as follows: in response to receiving dialogue data input by the user, where the dialogue data includes a question raised by the user, the system searches and matches in a pre-built knowledge base based on the dialogue data, obtains from the knowledge base a feedback message that answers the question, and outputs the feedback message.
- the knowledge base can be a structured knowledge base, an unstructured knowledge base, or a semi-structured knowledge base.
- a structured knowledge base can be understood as a knowledge base that manages knowledge in the form of relational tables;
- an unstructured knowledge base can be understood as a knowledge base that has no fixed schema;
- a semi-structured knowledge base can be understood as a knowledge base that has no relational schema but does have a basic fixed structure.
- a knowledge base contains knowledge, and the form in which that knowledge exists may differ between types of knowledge base.
- a structured knowledge base may include a knowledge graph or a table.
- An unstructured knowledge base may include free documents.
- a semi-structured knowledge base may include a question-answer pair (QA pair).
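The three knowledge-base forms can be pictured with the toy data structures below. The sample facts, field names, and lookup helpers are hypothetical and serve only to contrast how each form is stored and queried.

```python
# Structured: a knowledge graph stored as (head, relation, tail) triples.
knowledge_graph = [
    ("Beijing", "capital_of", "China"),
    ("Paris", "capital_of", "France"),
]

# Unstructured: free documents with no fixed schema.
free_documents = ["Beijing is the capital of China and hosts many museums."]

# Semi-structured: question-answer pairs with a basic fixed structure.
qa_pairs = [{"question": "What is the capital of France?", "answer": "Paris"}]

def query_graph(relation, tail):
    # Exact match on relation and tail; returns matching head entities.
    return [h for h, r, t in knowledge_graph if r == relation and t == tail]

def query_documents(term):
    # Case-insensitive substring search over free text.
    return [doc for doc in free_documents if term.lower() in doc.lower()]

def query_qa(question):
    # Exact (case-insensitive) question lookup; None if no pair matches.
    for pair in qa_pairs:
        if pair["question"].lower() == question.lower():
            return pair["answer"]
    return None
```

The structured form supports exact field-level queries, the unstructured form only supports text search, and the QA pairs sit in between: each record has fixed fields but the field values are free text.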
- the task-based dialogue system can be applied to scenarios where tasks corresponding to the dialogue data input by the user are completed, such as ticket booking, navigation, etc. Accordingly, the implementation principle of the task-based dialogue system is: in response to receiving the dialogue data input by the user, determining the user intent corresponding to the dialogue data, and executing the task corresponding to the user intent.
- the technical architecture of the task-based dialogue system includes two types: pipeline and end-to-end.
- the task-based dialogue system with a pipeline architecture mainly includes six modules: automatic speech recognition (ASR), natural language understanding (NLU, i.e., intent understanding), dialogue state tracking (DST), policy learning (PL), natural language generation (NLG), and text-to-speech (TTS) speech generation. DST and PL are collectively referred to as the dialogue manager (DM).
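The six-module pipeline can be sketched as a chain of functions, one per module. All of the function bodies below are hypothetical stubs (keyword matching, fixed templates, a dictionary payload in place of audio); only the ordering of the modules and the grouping of DST and PL into the dialogue manager follow the description above.

```python
def asr(audio):
    # Stub: pretend the audio payload already carries its transcript.
    return audio["transcript"]

def nlu(text):
    # Stub intent understanding via keyword matching.
    intent = "book_ticket" if "ticket" in text.lower() else "unknown"
    return {"intent": intent, "text": text}

def dst(state, understanding):
    # Dialogue state tracking: merge the latest understanding into the state.
    new_state = dict(state)
    new_state["intent"] = understanding["intent"]
    new_state["turns"] = state.get("turns", 0) + 1
    return new_state

def policy(state):
    # Policy stub: choose the next system action from the tracked state.
    return "ask_destination" if state["intent"] == "book_ticket" else "ask_clarification"

def nlg(action):
    # Template-based natural language generation.
    templates = {
        "ask_destination": "Where would you like to go?",
        "ask_clarification": "Could you rephrase that?",
    }
    return templates[action]

def tts(text):
    # Stub: return a tagged payload instead of synthesized audio.
    return {"speech_for": text}

def run_turn(audio, state):
    text = asr(audio)
    understanding = nlu(text)
    state = dst(state, understanding)  # DST and PL together form the DM
    action = policy(state)
    reply = nlg(action)
    return tts(reply), state

output, new_state = run_turn({"transcript": "I want a train ticket"}, {})
```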
- the task-based dialogue system with end-to-end technical architecture refers to an end-to-end task-based dialogue system composed of deep learning neural networks.
- the open-domain dialogue system can be implemented in two ways: as a retrieval-based system or as a generative system.
- the retrieval-based open-domain dialogue system is similar to the question-answering dialogue system and is not described again here.
- the generative open domain dialogue system can map the dialogue data input by the user into a reply sequence for replying to the dialogue data based on a pre-trained sequence-to-sequence model, determine the feedback message for answering the question raised by the user based on the reply sequence, and output the feedback message.
- the dialogue system can be implemented based on a neural network model.
- in the training phase, the neural network model can be trained so that it learns the ability to reply to the dialogue data input by the user, thereby obtaining a dialogue network model for replying to the dialogue data input by the user.
- the input of the dialogue network model is the dialogue data input by the user, and the output is a feedback message for replying to the dialogue data.
- the training principle of the dialogue network model is described below by way of example with reference to FIG. 1:
- S101 Acquire a first sample data set, where the first sample data set includes first sample input dialogue data input by a user and first sample historical dialogue data.
- the training of the neural network model can be performed in the cloud or locally, which is not limited in this embodiment.
- if the dialogue network model is trained in the cloud, the execution subject can be a server or server cluster deployed in the cloud. If the dialogue network model is trained locally, the execution subject can be any one of a server, server cluster, processor, and chip deployed locally.
- This embodiment does not limit the amount of data in the first sample data set, which can be determined based on demand, historical records, and experiments. For example, for a scenario with relatively high accuracy requirements, the amount of data in the first sample data set can be relatively large; conversely, for a scenario with relatively low accuracy requirements, the amount of data in the first sample data set can be relatively small.
- the first sample input dialogue data may be understood as dialogue data (such as audio) input by the user to the dialogue system in the historical dialogue between the user and the dialogue system.
- the first sample historical dialogue data may be understood as data (such as audio) in which the dialogue system responds to dialogue data input by the user to the dialogue system in the historical dialogue between the user and the dialogue system.
- S102 Generate a first sample semantic keyword corresponding to the first sample data set.
- the first sample semantic keyword can be understood as the user's intention in communicating with the dialogue system, obtained by interpreting the first sample input dialogue data through natural language understanding (such as the intent understanding of the task-based dialogue system in the above example).
- the first sample semantic keyword may be a word or a sentence, which is not limited in this embodiment.
- the knowledge base may include text knowledge and/or knowledge graph.
- if the knowledge base includes a knowledge graph, this step can be understood as: based on the semantic similarity matching method, retrieving knowledge with a high semantic similarity to the first sample semantic keyword from the knowledge graph.
- if the knowledge base includes text knowledge, this step can be understood as: based on the semantic similarity matching method, retrieving knowledge with a high semantic similarity to the first sample semantic keyword from the text knowledge.
- if the knowledge base includes both a knowledge graph and text knowledge, this step can be understood as: based on the semantic similarity matching method, retrieving knowledge with a high semantic similarity to the first sample semantic keyword from the knowledge graph, and likewise retrieving knowledge with a high semantic similarity to the first sample semantic keyword from the text knowledge.
- the method for calculating semantic similarity may include: vectorizing the first sample semantic keyword and each knowledge in the knowledge base respectively to obtain a first vector corresponding to the first sample semantic keyword and a second vector corresponding to each knowledge in the knowledge base, and respectively calculating the cosine distance between the first vector and each second vector to obtain the first semantic similarity corresponding to the first sample semantic keyword and each knowledge in the knowledge base.
- a first similarity threshold can be set. If the first semantic similarity between a certain knowledge in the knowledge base and the first sample semantic keyword reaches (i.e. is greater than or equal to) the first similarity threshold, then the knowledge in the knowledge base is knowledge that is semantically similar to the first sample semantic keyword; conversely, if the first semantic similarity between a certain knowledge in the knowledge base and the first sample semantic keyword does not reach (i.e. is less than) the first similarity threshold, then the knowledge in the knowledge base is not knowledge that is semantically similar to the first sample semantic keyword.
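The vectorize, compare, and threshold procedure can be sketched as follows. This is a minimal illustration rather than the disclosure's implementation: the bag-of-words `vectorize` function, the toy knowledge entries, and the threshold value 0.5 are assumptions, and the score computed is the cosine similarity (higher means more similar), which is the reading consistent with the "greater than or equal to the threshold" rule.

```python
import math
from collections import Counter

def vectorize(text):
    # Hypothetical stand-in for a real encoder: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Dot product over shared terms divided by the product of the norms.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_similar(keyword, knowledge_base, threshold):
    # First vector: the semantic keyword. Second vectors: each knowledge entry.
    first_vector = vectorize(keyword)
    hits = []
    for entry in knowledge_base:
        score = cosine_similarity(first_vector, vectorize(entry))
        # Keep the entry only if its similarity reaches the threshold.
        if score >= threshold:
            hits.append((entry, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

knowledge_base = [
    "train ticket booking rules",
    "bank card loss reporting",
    "ticket refund rules",
]
results = retrieve_similar("ticket booking", knowledge_base, threshold=0.5)
```

With these toy entries only the first one shares both keyword terms and clears the threshold; the refund entry shares a single term and falls below it.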
- S104 Input the first sample data set and the acquired knowledge that is semantically similar to the first sample semantic keyword into the first neural network model, train the first neural network model, and obtain a dialogue network model.
- the first neural network model can be understood as a neural network model initialized before training, and the dialogue network model can be understood as a neural network model obtained after training that can be used to predict dialogue data to reply to the user.
- This embodiment does not limit the type, framework, and parameters of the first neural network model.
- the input of the first neural network model includes three dimensions of content, namely: the first sample input dialogue data, the first sample historical dialogue data, and the acquired knowledge that is semantically similar to the first sample semantic keyword.
- S104 may include: inputting the first sample input dialogue data, the first sample historical dialogue data, and the acquired knowledge semantically similar to the first sample semantic keyword into the first neural network model to obtain first predicted dialogue data for feedback to the user; calculating a first loss value between the first predicted dialogue data and the first pre-labeled dialogue data for feedback to the user; and adjusting the parameters of the first neural network model based on the first loss value. These operations are repeated until the first number of iterations reaches a first preset number threshold, or the first loss value is less than or equal to the first preset loss threshold, thereby obtaining the dialogue network model.
- the first preset number threshold and the first preset loss threshold can be determined based on demand, historical records, and experiments, and this embodiment does not limit this.
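- The two stopping conditions (iteration count reached, or loss at or below a threshold) can be illustrated with a deliberately tiny example. The sketch below is an assumption-laden stand-in: it fits a single weight `w` in `y = w * x` by gradient descent instead of training a neural network, and the function name `train` and all default values are hypothetical.

```python
def train(samples, max_iterations=1000, loss_threshold=1e-4, learning_rate=0.1):
    """Fit y = w * x by gradient descent; stop on either criterion."""
    w = 0.0
    loss = float("inf")
    iterations = 0
    # Continue while neither stopping condition has been met.
    while iterations < max_iterations and loss > loss_threshold:
        # Loss value: mean squared error between predictions and labels.
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        # Gradient of the loss with respect to the single parameter w.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        # Adjust the parameter based on the loss.
        w -= learning_rate * grad
        iterations += 1
    return w, loss, iterations

# Labeled toy samples generated by y = 2 * x.
w, loss, iterations = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

- Here the loss threshold is reached long before the iteration limit; lowering `max_iterations` makes the iteration limit the condition that ends training instead.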
- S201 Acquire first input dialogue data input by the user.
- the dialogue network model can be trained in the cloud or locally. Accordingly, the application of the dialogue network model can be implemented in the cloud or locally. However, considering the real-time requirements of the dialogue and the requirements for efficient feedback, the application of the dialogue network model is usually implemented locally.
- for ease of distinction, the execution subject that obtains the dialogue network model through training can be called a training device, and the execution subject that applies the dialogue network model can be called an application device.
- the training device and the application device can be the same device or different devices, which is not limited in this embodiment.
- the training device can establish a communication link with the application device, and after the training device obtains the dialogue network model through training, the training device can send the dialogue network model to the application device based on the communication link. Accordingly, the application device can implement a dialogue with the user based on the dialogue network model.
- the dialogue system can be applied to different scenarios, and for different application scenarios, the application device may be different devices.
- the application device may be a computer of the e-commerce store with a dialogue network model deployed.
- the application device may be an intelligent robot with a dialogue network model deployed.
- the application device may be a vehicle-mounted terminal with a dialogue network model deployed.
- the application device can also be a mobile terminal or a smart speaker, etc., which are not listed here one by one.
- the application device may include a sound pickup component to obtain the first input dialogue data input by the user through the sound pickup component.
- the sound pickup component may be an audio collector such as a microphone for collecting audio.
- S202 Generate a first semantic keyword according to the first input dialogue data and the acquired first historical dialogue data.
- the first historical conversation data is the data with which the conversation system responded to conversation data that the user input before the first input conversation data was received; it may be the same data as the first sample historical conversation data, or it may be different data.
- the task-based dialogue system includes an intention understanding module. Accordingly, as shown in Figure 3, the task-based dialogue system obtains the first input dialogue data and the first historical dialogue data, and generates the first semantic keyword through the intention understanding module.
- S203 Based on the first semantic keyword, a semantic search is performed in a pre-built knowledge base to obtain knowledge with semantics similar to the first semantic keyword.
- the knowledge base may include text knowledge and/or knowledge graph.
- a semantic search is performed in the textual knowledge and/or knowledge graph based on the first semantic keyword to obtain knowledge with semantically similar meanings to the first semantic keyword.
- S204 Input the first input dialogue data, the first historical dialogue data, and the acquired knowledge that is semantically similar to the first semantic keyword into the dialogue network model trained based on the method described in S101-S104 above, to obtain first feedback dialogue data fed back to the user.
- the first input dialogue data, the first historical dialogue data, and the acquired knowledge semantically similar to the first semantic keyword are input into the dialogue network model to obtain first feedback dialogue data fed back to the user.
- the dialogue network model can determine the first feedback dialogue data in response to the first input dialogue data input by the user based on the input.
- S205 Outputting first feedback dialogue data fed back to the user.
- the application device may include an output component to output the first feedback dialogue data fed back to the user through the output component, wherein the output component may be a speaker.
- sensitive information in the first feedback dialogue data may be filtered through public opinion control to obtain filtered first feedback dialogue data, and the filtered first feedback dialogue data may be output to the user for feedback.
- the dialogue system may include a public opinion control module. The output of the dialogue network model (i.e., the first feedback dialogue data) may be input into the public opinion control module, which filters the first feedback dialogue data, so that the dialogue system replies with the filtered first feedback dialogue data.
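- The filtering step can be pictured as a post-processing function applied to the model's reply. The sketch below is only one possible reading: it assumes that filtering means masking terms from a configured list, and the function name `filter_sensitive`, the term list, and the mask character are all hypothetical.

```python
import re

def filter_sensitive(reply, sensitive_terms, mask="*"):
    """Replace every occurrence of a sensitive term with mask characters."""
    filtered = reply
    for term in sensitive_terms:
        # re.escape keeps punctuation in a term from being read as regex syntax.
        pattern = re.compile(re.escape(term), flags=re.IGNORECASE)
        filtered = pattern.sub(lambda m: mask * len(m.group(0)), filtered)
    return filtered

# The reply plays the role of the feedback dialogue data produced by the model.
filtered_reply = filter_sensitive("Your password is hunter2.", ["hunter2"])
```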
- on the one hand, text knowledge and knowledge graphs may lack knowledge such as hot events and hot topics because they are not updated in time; on the other hand, their knowledge representation is often sparse and limited, and it is difficult for them to represent knowledge at different granularities, so the knowledge they represent is relatively incomplete; furthermore, it is difficult to fully express and understand knowledge in a single dimension. For example, limited historical conversation data and text knowledge alone may lack key information, leading to ambiguity and semantic fuzziness.
- through creative work, the inventors of the present disclosure arrived at the inventive concept of the present disclosure: responding to the input dialogue data input by the user by combining multiple different types of knowledge and, on this basis, by further applying a "multi-hop supplement" method.
- Multi-hop supplementation refers to searching, within one type of knowledge, for knowledge related to knowledge obtained from another type of knowledge.
- Multimodal knowledge refers to knowledge obtained by combining information of at least two modalities. For example, if multimedia information includes information represented by pictures, information represented by text, and information represented by audio, then the multimedia information can be multimodal knowledge, and the multimodal knowledge is knowledge obtained by combining information represented by pictures, information represented by text, and information represented by audio.
- Network knowledge can be understood as online knowledge obtained from the Internet, such as online knowledge obtained from wikis based on the Internet, online knowledge obtained from encyclopedias based on the Internet, and so on. They are not listed one by one here.
- S401 Acquire a second sample data set, where the second sample data set includes second sample input dialogue data input by a user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data.
- the execution subject of this embodiment can be a training device, and the training device can be deployed in the cloud or locally, etc., which will not be listed here one by one.
- the first sample dialogue environment data can be understood as data used to describe objects in a dialogue scene, such as data describing the location of a dialogue scene, data describing the status of a user in a dialogue scene, data describing the emotions of a user in a dialogue scene, and so on.
- S402 Input the second sample data set into a pre-trained binary classification network model to obtain a first classification result of whether external knowledge needs to be introduced.
- external knowledge refers to other knowledge besides knowledge graphs and text knowledge, such as image knowledge, multimodal knowledge, network knowledge, etc.
- the binary classification network model is trained based on the third sample data set and is used to determine whether external knowledge needs to be introduced into the neural network model.
- the principle of training the binary classification network model may refer to the following steps:
- Step 1: obtain a third sample data set, wherein the third sample data set includes third sample input dialogue data, third sample historical dialogue data, and second sample dialogue environment data corresponding to the third sample input dialogue data.
- Step 2: input the third sample data set into the binary classification network model and output the second classification result, wherein the second classification result is used to indicate whether external knowledge needs to be introduced.
- Step 3: calculate the second loss value between the second classification result and the pre-labeled classification result, and adjust the parameters of the binary classification network model based on the second loss value, and so on, until the second iteration number reaches the second preset number threshold, or the second loss value is less than or equal to the second preset loss threshold, thereby obtaining a trained binary classification network model.
- the second preset number threshold and the second preset loss threshold can be determined based on demand, historical records, and experiments, and this embodiment does not limit this.
- the binary classification network model may be a model obtained through knowledge distillation, such as TinyBERT.
- the principle of generating the second sample semantic keyword can refer to the above embodiments, such as the embodiment described in S102.
- the generative network model may be pre-trained to determine the second sample semantic keyword based on the generative network model and the second sample data set.
- the principle of training the generative network model may refer to the following steps:
- Step 1: obtain a fourth sample data set, wherein the fourth sample data set includes fourth sample input dialogue data, fourth sample historical dialogue data, and third sample dialogue environment data corresponding to the fourth sample input dialogue data.
- Step 2: input the fourth sample data set into the second neural network model and output the third sample semantic keyword.
- Step 3: calculate the third loss value between the third sample semantic keyword and the pre-labeled sample semantic keyword, and adjust the parameters of the second neural network model based on the third loss value, and so on, until the third number of iterations reaches the third preset number threshold, or the third loss value is less than or equal to the third preset loss threshold, thereby obtaining a trained generative network model.
- the third preset number threshold and the third preset loss threshold can be determined based on demand, historical records, and experiments, and this embodiment does not limit this. This embodiment does not limit the type, structure, and parameters of the second neural network model.
- S403 may be implemented as follows: if the first classification result indicates that external knowledge needs to be introduced, the second sample data set is input into the generative network model, and the second sample semantic keyword is output.
- S404 Acquire knowledge corresponding to the second sample semantic keyword from a plurality of different types of knowledge respectively.
- knowledge corresponding to the second sample semantic keyword is obtained from text knowledge, graph knowledge, and external knowledge respectively.
- the various different types of knowledge include: text knowledge, knowledge graph, image knowledge, multimodal knowledge, and network knowledge. Therefore, this step can be understood as: obtaining the knowledge corresponding to the second sample semantic keyword from the text knowledge, and for the sake of distinction, the acquired knowledge can be called the first knowledge; obtaining the knowledge corresponding to the second sample semantic keyword from the knowledge graph, and for the sake of distinction, the acquired knowledge can be called the second knowledge; obtaining the knowledge corresponding to the second sample semantic keyword from the multimodal knowledge, and for the sake of distinction, the acquired knowledge can be called the third knowledge; obtaining the knowledge corresponding to the second sample semantic keyword from the network knowledge, and for the sake of distinction, the acquired knowledge can be called the fourth knowledge; obtaining the knowledge corresponding to the second sample semantic keyword from the image knowledge, and for the sake of distinction, the acquired knowledge can be called the fifth knowledge.
- Different methods may be used to acquire knowledge corresponding to the second sample semantic keyword from different types of knowledge.
- a sentence retrieval method or a sparse retrieval method (such as BM25, an algorithm used to evaluate relevance), etc., can be used to obtain knowledge corresponding to the second sample semantic keyword from network knowledge.
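- For readers unfamiliar with BM25, the following is a minimal sketch of Okapi BM25 scoring over tokenized documents. It is an illustration under assumptions, not the retrieval component of the disclosure: the function name `bm25_scores`, the whitespace tokenization, the toy documents, and the parameter values `k1=1.5` and `b=0.75` are all chosen here for demonstration, and the IDF uses the common non-negative variant `ln((N - n + 0.5) / (n + 0.5) + 1)`.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue  # a term absent from the document contributes nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores

# Toy stand-ins for pieces of network knowledge.
documents = [
    "pandas are black and white bears".split(),
    "keys open locks".split(),
    "pandas eat bamboo".split(),
]
scores = bm25_scores(["pandas", "bamboo"], documents)
best = max(range(len(documents)), key=scores.__getitem__)
```

- The third document matches both query terms and is the shortest match, so it receives the highest score; the second document matches neither term and scores zero.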
- the "encoding + similarity calculation" method can be used to obtain knowledge corresponding to the second sample semantic keyword from text knowledge and knowledge graphs.
- a graph network (Natural Graph) corresponding to the knowledge graph is constructed, and each piece of knowledge in the knowledge graph is encoded based on the graph network to obtain a third vector corresponding to each piece of knowledge in the knowledge graph.
- the second sample semantic keywords are encoded to obtain a fourth vector, and the second semantic similarity between the fourth vector and each third vector is calculated respectively.
- the knowledge corresponding to the third vector whose second semantic similarity is greater than a preset second similarity threshold is determined as the knowledge corresponding to the second sample semantic keyword.
- a semantic matching network model may also be pre-trained to obtain knowledge corresponding to the second sample semantic keyword from text knowledge and knowledge graph based on the semantic matching network model.
- the principle of training the semantic matching network model may refer to the following steps:
- Step 1: obtain a fifth sample data set, wherein the fifth sample data set includes at least one fifth vector and a plurality of sixth vectors.
- Step 2: input the at least one fifth vector and the plurality of sixth vectors into the third neural network model, and output a matching result, where the matching result is a sixth vector that semantically matches the at least one fifth vector.
- Step 3: calculate the fourth loss value between the matching result and the pre-calibrated matching result, and adjust the parameters of the third neural network model based on the fourth loss value, and so on, until the fourth iteration number reaches the fourth preset number threshold, or the fourth loss value is less than or equal to the fourth preset loss threshold, thereby obtaining a trained semantic matching network model.
- the fourth preset number threshold and the fourth preset loss threshold can be determined based on demand, historical records, and experiments, and this embodiment does not limit this.
- This embodiment does not limit the type, structure, and parameters of the third neural network model.
- the third neural network model can be a language representation model (Bidirectional Encoder Representation from Transformers, BERT).
- the fourth vector and each third vector may be respectively input into the semantic matching network model, and a third vector that semantically matches the fourth vector may be output.
- the method of the dual-tower structure model can be used to obtain knowledge corresponding to the second sample semantic keyword from the image knowledge and multimodal knowledge.
- the dual-tower structure model includes an image encoder and a text encoder, and the dual-tower structure model is trained by calculating the loss based on the image encoder and the text encoder respectively.
- the specific implementation principle can be found in the relevant technology, which will not be repeated here.
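- The text does not specify the loss used to train the dual-tower structure model. One common choice in related technology is a symmetric contrastive loss, computed once from the image side and once from the text side; the sketch below shows that choice purely as an assumption. The function names, the temperature value, and the toy embeddings (which stand in for the outputs of the image encoder and the text encoder) are hypothetical.

```python
import math

def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def _log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]

def symmetric_contrastive_loss(image_vecs, text_vecs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy; pair i matches pair i."""
    images = [_normalize(v) for v in image_vecs]
    texts = [_normalize(v) for v in text_vecs]
    n = len(images)
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    image_to_text = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return (image_to_text + text_to_image) / 2.0

# Correctly paired embeddings versus deliberately mismatched ones.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

- The loss is small when each image embedding is closest to its own text embedding and large when the pairs are mismatched, which is what drives the two encoders toward a shared space.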
- a supplementary search may be performed in other types of knowledge based on the knowledge to obtain supplementary search knowledge.
- knowledge related to the first knowledge can be obtained from the knowledge graph, and this knowledge can be called supplementary retrieval knowledge.
- knowledge related to the first knowledge can be obtained from the knowledge graph and text knowledge respectively.
- the supplementary search knowledge may be obtained from one other type of knowledge, or may be obtained from multiple other types of knowledge, which is not limited in this embodiment.
- the “related” in “knowledge related to the first knowledge” can be understood as having a strong correlation with the first knowledge.
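- The supplementary search can be sketched as a two-hop lookup: first retrieve knowledge for the keyword from one type of knowledge, then use each retrieved piece as a query against the other types. In the sketch below, token overlap is a simple stand-in for "strong correlation", and the function names, the `min_overlap` parameter, and the toy knowledge sources are all hypothetical.

```python
def tokens(text):
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())

def search(query, entries, min_overlap=1):
    """Return entries sharing at least min_overlap tokens with the query."""
    query_tokens = tokens(query)
    return [e for e in entries if len(query_tokens & tokens(e)) >= min_overlap]

def multi_hop_search(keyword, sources, first_type, min_overlap=2):
    """First hop: keyword -> first_type. Second hop: each hit -> every other type."""
    first_hop = search(keyword, sources[first_type])
    supplementary = []
    for knowledge in first_hop:
        for other_type, entries in sources.items():
            if other_type == first_type:
                continue
            for hit in search(knowledge, entries, min_overlap):
                if hit not in supplementary:
                    supplementary.append(hit)
    return first_hop, supplementary

# Toy knowledge sources keyed by knowledge type.
sources = {
    "text": ["pandas eat bamboo in mountain forests"],
    "graph": ["bamboo grows in mountain forests", "keys open locks"],
}
first_hop, supplementary = multi_hop_search("pandas", sources, first_type="text")
```

- The graph entry about bamboo never mentions the keyword itself; it is reached only through the text knowledge retrieved in the first hop.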
- the knowledge corresponding to the second sample semantic keyword is obtained from various different types of knowledge.
- the same knowledge may exist in different types of knowledge.
- the acquired knowledge corresponding to the second sample semantic keyword can be de-duplicated, for example by removing duplicate knowledge, the same knowledge in different expressions, and highly similar knowledge (the implementation principle can be found in the above example and will not be repeated here).
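- Redundancy removal can be sketched as keeping a piece of knowledge only if it is not too similar to anything already kept. The sketch below uses Jaccard similarity over tokens as a lightweight stand-in for the semantic similarity described earlier; the function names and the threshold value are assumptions.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)

def remove_redundant(knowledge_items, similarity_threshold=0.8):
    """Keep an item only if it is below the threshold against every kept item."""
    kept = []
    for item in knowledge_items:
        if all(jaccard(item, existing) < similarity_threshold for existing in kept):
            kept.append(item)
    return kept

items = [
    "Pandas eat bamboo",
    "pandas eat bamboo",        # duplicate up to letter case
    "pandas eat fresh bamboo",  # near-duplicate (Jaccard 0.75)
    "keys open locks",
]
unique_items = remove_redundant(items, similarity_threshold=0.7)
```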
- S405 Encode the acquired knowledge corresponding to the second sample semantic keyword respectively to obtain the corresponding feature vectors.
- the first knowledge is encoded to obtain a feature vector corresponding to the first knowledge, which can be called the first feature vector.
- the second knowledge is encoded to obtain a feature vector corresponding to the second knowledge, which can be called the second feature vector.
- the third knowledge is encoded to obtain a feature vector corresponding to the third knowledge, which can be called the third feature vector.
- the fourth knowledge is encoded to obtain a feature vector corresponding to the fourth knowledge, which can be called the fourth feature vector.
- the fifth knowledge is encoded to obtain a feature vector corresponding to the fifth knowledge, which can be called the fifth feature vector.
- S406 Input each obtained feature vector into a pre-trained cross-attention network model to fuse each feature vector based on the cross-attention network model to obtain a fused feature vector.
- the first feature vector, the second feature vector, the third feature vector, the fourth feature vector, and the fifth feature vector are respectively input into a pre-trained cross-attention network model to output a fused feature vector.
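- The structure of the cross-attention network model is not specified here. As a rough intuition only, the sketch below lets each feature vector act as a query that attends over the other feature vectors with scaled dot-product attention, and then averages the attended results into one fused vector. It omits the learned query, key, and value projections that a trained model would have, and the function names and toy vectors are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def fuse(feature_vectors):
    """Let each vector attend to the others, then average the results (needs >= 2 vectors)."""
    attended = []
    for i, query in enumerate(feature_vectors):
        others = feature_vectors[:i] + feature_vectors[i + 1:]
        attended.append(attend(query, others, others))
    dim = len(feature_vectors[0])
    return [sum(vec[d] for vec in attended) / len(attended) for d in range(dim)]

# Five toy feature vectors, one per knowledge type.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
fused = fuse(features)
```

- The fused vector has the same dimensionality as the inputs, so it can be passed to a downstream model regardless of how many knowledge types contributed.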
- Step 1: obtain a sixth sample data set, wherein the sixth sample data set includes a sample first feature vector, a sample second feature vector, a sample third feature vector, a sample fourth feature vector, and a sample fifth feature vector.
- the first feature vector of the sample is obtained by encoding the knowledge corresponding to the annotated semantic keywords obtained from text knowledge;
- the second feature vector of the sample is obtained by encoding the knowledge corresponding to the annotated semantic keywords obtained from knowledge graph knowledge;
- the third feature vector of the sample is obtained by encoding the knowledge corresponding to the annotated semantic keywords obtained from multimodal knowledge;
- the fourth feature vector of the sample is obtained by encoding the knowledge corresponding to the annotated semantic keywords obtained from network knowledge;
- the fifth feature vector of the sample is obtained by encoding the knowledge corresponding to the annotated semantic keywords obtained from image knowledge.
- Step 2: input the sixth sample data set into the cross-attention network model and output the predicted fused feature vector.
- Step 3: calculate the fifth loss value between the predicted fused feature vector and the pre-labeled fused feature vector, and adjust the parameters of the cross-attention network model based on the fifth loss value, and so on, until the fifth iteration number reaches the fifth preset number threshold, or the fifth loss value is less than or equal to the fifth preset loss threshold, thereby obtaining a trained cross-attention network model.
- the fifth preset number threshold and the fifth preset loss threshold can be determined based on demand, historical records, and experiments, and this embodiment does not limit this.
- S407 Input the fused feature vector to the fourth neural network, and output the second predicted dialogue data fed back to the user.
- S408 Calculate the sixth loss value between the second predicted dialogue data fed back to the user and the second pre-labeled dialogue data fed back to the user, and adjust the parameters of the fourth neural network model based on the sixth loss value, and so on, until the sixth iteration number reaches the sixth preset number threshold, or the sixth loss value is less than or equal to the sixth preset loss threshold, thereby obtaining a trained dialogue network model.
- the sixth preset number threshold and the sixth preset loss threshold can be determined based on demand, historical records, and experiments, and this embodiment does not limit this. This embodiment does not limit the type, structure, and parameters of the fourth neural network model.
- S501 Acquire second historical dialogue data, current dialogue environment data, and second input dialogue data input by the user.
- the dialogue network model can be trained in the cloud or locally. Accordingly, the application of the dialogue network model can be implemented in the cloud or locally. However, considering the real-time requirements of the dialogue and the requirements for efficient feedback, the application of the dialogue network model is usually implemented locally.
- for ease of distinction, the execution subject that obtains the dialogue network model through training can be called a training device, and the execution subject that applies the dialogue network model can be called an application device.
- the training device and the application device can be the same device or different devices, which is not limited in this embodiment.
- the training device can establish a communication link with the application device, and after the training device obtains the dialogue network model through training, the training device can send the dialogue network model to the application device based on the communication link. Accordingly, the application device can implement a dialogue with the user based on the dialogue network model.
- the dialogue system can be applied to different scenarios, and for different application scenarios, the application device may be different devices.
- the application device may be a computer of the e-commerce store with a dialogue network model deployed.
- the application device may be an intelligent robot with a dialogue network model deployed.
- the application device may be a vehicle-mounted terminal with a dialogue network model deployed.
- the application device can also be a mobile terminal or a smart speaker, etc., which are not listed here one by one.
- the application device may include a sound pickup component to obtain the second input dialogue data input by the user through the sound pickup component.
- the sound pickup component may be an audio collector such as a microphone for collecting audio.
- the current conversation environment data can be obtained based on sensors set in the application device.
- the sensors in different application scenarios may be different.
- for example, in a conversation scenario between an indoor user and a smart speaker, the sensor may be an image acquisition device; for another example, in a conversation scenario between an outdoor user and a vehicle-mounted terminal, the sensor may be a speed sensor, etc., which are not listed here one by one.
- S502 Determine a second semantic keyword according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data.
- the second semantic keyword is determined by combining three dimensions of data (i.e., the second historical conversation data, the current conversation environment data, and the second input conversation data), so that the second semantic keyword reflects current environment characteristics, current user conversation characteristics, and historical conversation characteristics, and therefore has higher reliability.
- S503 Generate and output second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and a preset type knowledge base, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- external knowledge is introduced so that the second feedback dialogue data is generated and output in combination with the external knowledge. This reduces the cases in which the application device cannot understand the user, and addresses the problems that text knowledge and graph knowledge are slow to update, difficult to cover comprehensively and to maintain, and sparse and limited, and that text knowledge alone can hardly express comprehensive knowledge, thereby improving the effectiveness and reliability of human-computer dialogue.
- S601 Acquire the second input dialogue data, the second historical dialogue data, and the current dialogue environment data input by the user.
- S602 The second input dialogue data, the second historical dialogue data, and the current dialogue environment data input by the user are input into the binary classification network model trained based on the method described in the above example to obtain a third classification result of whether external knowledge needs to be referenced.
- the task-based dialogue system includes a binary classification network model, whose input is the second input dialogue data, the second historical dialogue data, and the current dialogue environment data, and whose output is a third classification result.
- the task-based dialogue system also includes a generation network model, whose input is the second input dialogue data, the second historical dialogue data, and the current dialogue environment data, and whose output is the second semantic keyword.
- S604 Acquire knowledge corresponding to the second semantic keyword from a plurality of different types of knowledge respectively.
- the different types of knowledge include: text knowledge, knowledge graph, picture knowledge, multimodal knowledge, and network knowledge.
- picture knowledge may include both image content and text content
- picture knowledge and multimodal knowledge are therefore collectively referred to as "multimodal knowledge such as pictures".
- the knowledge corresponding to the second semantic keyword can be obtained from text knowledge, knowledge graphs, multimodal knowledge such as pictures, and network knowledge.
- more knowledge corresponding to the second semantic keyword can be obtained by using the "multi-hop search" method shown in Figure 7.
- Cross-validation refers to verifying the knowledge corresponding to the second semantic keyword acquired from at least two different types of knowledge to determine whether there is redundant knowledge so as to remove the redundant knowledge.
- S605 Encode the acquired knowledge corresponding to the second semantic keyword respectively to obtain the corresponding target feature vectors.
- S606 Input each target feature vector obtained into the cross-attention network model trained based on the method described in the above example, so as to fuse each target feature vector based on the cross-attention network model to obtain a target fused feature vector.
- the task-based dialogue system also includes a cross-attention network model, the input of the cross-attention network model is the result of encoding the acquired knowledge corresponding to the second semantic keyword, and the output is a target fusion feature vector.
- S607 Input the target fused feature vector to the dialogue network model trained based on the example described in S401-S408 to obtain second feedback dialogue data fed back to the user.
- the task-based dialogue system also includes a dialogue network model, whose input is the output of the cross-attention network model and whose output is the second feedback dialogue data.
- sensitive information in the second feedback dialogue data can be filtered through public opinion control to obtain filtered second feedback dialogue data, so that the second feedback dialogue data after filtering is fed back to the user.
- the task-based dialogue system also includes a public opinion control module.
- the input of the public opinion control module is the output of the dialogue network model (i.e., the second feedback dialogue data), and the public opinion control module filters the second feedback dialogue data, so that the dialogue system replies with the filtered second feedback dialogue data.
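- The way the modules of the task-based dialogue system feed into one another can be summarized as a function that chains them. The sketch below is schematic only: every module is passed in as a plain callable, the function and parameter names are hypothetical, and passing the classification result to the retrieval step is an assumption made here about how that result is consumed.

```python
def respond(user_input, history, environment, *, needs_external, generate_keyword,
            retrieve, encode, fuse, dialogue_model, filter_reply):
    """Chain the modules: classify, generate keyword, retrieve, encode, fuse, reply, filter."""
    use_external = needs_external(user_input, history, environment)  # S602
    keyword = generate_keyword(user_input, history, environment)     # keyword generation
    knowledge = retrieve(keyword, use_external)                      # S604
    vectors = [encode(item) for item in knowledge]                   # S605
    fused = fuse(vectors)                                            # S606
    reply = dialogue_model(fused)                                    # S607
    return filter_reply(reply)                                       # public opinion control

# Stub modules standing in for the trained models (all hypothetical).
reply = respond(
    "where is my key", [], {"location": "indoors"},
    needs_external=lambda u, h, e: True,
    generate_keyword=lambda u, h, e: u.split()[-1],
    retrieve=lambda kw, ext: [f"text:{kw}", f"graph:{kw}"] + ([f"web:{kw}"] if ext else []),
    encode=lambda item: [float(len(item))],
    fuse=lambda vecs: [sum(v[0] for v in vecs) / len(vecs)],
    dialogue_model=lambda fused: f"reply based on {fused[0]:.1f}",
    filter_reply=lambda text: text,
)
```

- Keeping each module behind a callable makes it easy to swap a stub for a trained model without changing the order of the steps.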
- the dialogue system (such as a voice assistant, etc.) in the mobile terminal can determine that the second semantic keyword of the second input dialogue data is “panda”, and then the knowledge corresponding to “panda” can be obtained from network knowledge, such as obtaining knowledge about pandas from Wikipedia as shown in FIG. 8, “Pandas are distributed in a certain city...”, and picture knowledge of pandas can also be obtained as shown in FIG. 8.
- the second feedback dialogue data output by the dialogue system in the mobile terminal may be “black and white, so cute” as shown in FIG. 8 .
- the dialogue system (such as a voice assistant, etc.) in the mobile terminal can determine that the second semantic keyword of the second input dialogue data is “key”, and then the knowledge corresponding to “key” can be obtained from network knowledge, such as obtaining knowledge about the key from Wikipedia as shown in FIG. 9, “The key is to open the lock...”, or obtaining picture knowledge of the key as shown in FIG. 9.
- the second feedback dialogue data output by the dialogue system in the mobile terminal may be “It should be placed on the bookshelf” as shown in FIG. 9 .
- FIG. 8 and FIG. 9 are only used to exemplify the knowledge corresponding to the second semantic keyword that may be obtained, and should not be understood as a limitation on obtaining the knowledge corresponding to the second semantic keyword.
- the first neural network model and the second neural network model can be neural network models of the same type; for another example, the third sample data set and the fourth sample data set can use the same data set; for another example, the first preset number threshold can be greater than the second number threshold, or less than the second number threshold, or equal to the second number threshold, etc., which will not be listed here one by one.
- the present disclosure further provides a human-machine dialogue device.
- FIG. 10 is a schematic diagram of a human-machine dialogue device according to an embodiment of the present disclosure. As shown in FIG. 10, the human-machine dialogue device 1000 includes:
- the first acquisition unit 1001 is used to acquire the second historical dialogue data, the current dialogue environment data, and the second input dialogue data input by the user.
- the first determining unit 1002 is configured to determine a second semantic keyword according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data.
- the first generating unit 1003 is used to generate second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and a preset type knowledge base, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- the output unit 1004 is configured to output the second feedback dialogue data.
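The four units above can be read as a simple pipeline: acquire the inputs, determine a keyword, look it up in the three knowledge sources, and produce a reply. The sketch below illustrates that flow under loud assumptions: `DialogueInput`, `KnowledgeBase`, `determine_keyword`, and `generate_feedback` are hypothetical names, the keyword step is a plain vocabulary match standing in for the generative network model, and the generation step simply joins retrieved snippets rather than running a dialogue network model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogueInput:
    history: List[str]           # second historical dialogue data
    environment: Dict[str, str]  # current dialogue environment data
    utterance: str               # second input dialogue data


@dataclass
class KnowledgeBase:
    # Each store maps a keyword to knowledge snippets (hypothetical layout).
    knowledge_graph: Dict[str, List[str]] = field(default_factory=dict)
    text_knowledge: Dict[str, List[str]] = field(default_factory=dict)
    external_knowledge: Dict[str, List[str]] = field(default_factory=dict)

    def lookup(self, keyword: str) -> Dict[str, List[str]]:
        """Fetch knowledge for the keyword from each of the three sources."""
        return {
            "knowledge_graph": self.knowledge_graph.get(keyword, []),
            "text_knowledge": self.text_knowledge.get(keyword, []),
            "external_knowledge": self.external_knowledge.get(keyword, []),
        }


def determine_keyword(data: DialogueInput, vocabulary: List[str]) -> Optional[str]:
    """Toy stand-in for the first determining unit: return the first known
    keyword found in the utterance, then the environment, then the history."""
    sources = [data.utterance, *data.environment.values(), *reversed(data.history)]
    for text in sources:
        for word in vocabulary:
            if word in text.lower():
                return word
    return None


def generate_feedback(data: DialogueInput, kb: KnowledgeBase,
                      vocabulary: List[str]) -> str:
    """Toy stand-in for the first generating unit: join the retrieved knowledge."""
    keyword = determine_keyword(data, vocabulary)
    if keyword is None:
        return ""
    snippets = [s for group in kb.lookup(keyword).values() for s in group]
    return " ".join(snippets)
```

Searching the utterance first and falling back to the environment and history reflects the point that the keyword may not appear in the user's latest input at all; the ordering itself is an illustrative choice.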
- FIG. 11 is a schematic diagram of a human-machine dialogue device according to another embodiment of the present disclosure. As shown in FIG. 11 , the human-machine dialogue device 1100 includes:
- the first acquisition unit 1101 is used to acquire the second historical dialogue data, the current dialogue environment data, and the second input dialogue data input by the user.
- the first determining unit 1102 is configured to determine a second semantic keyword according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data.
- the second semantic keyword is obtained by inputting the second historical conversation data, the current conversation environment data, and the second input conversation data into a pre-trained generative network model;
- the generative network model is a model that has learned, based on a fourth sample data set, the ability to produce the third sample semantic keywords corresponding to the fourth sample data set, and the fourth sample data set includes fourth sample input dialogue data, fourth sample historical dialogue data, and third sample dialogue environment data corresponding to the fourth sample input dialogue data.
- the second determining unit 1103 is used to determine, according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data, a third classification result indicating whether the external knowledge needs to be introduced.
- the first generating unit 1104 is used to generate second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and a preset type knowledge base, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- the first generating unit 1104 is configured to, if the third classification result indicates that the external knowledge needs to be introduced, generate the second feedback dialogue data corresponding to the second input dialogue data according to the second semantic keyword and the preset type knowledge base.
- the third classification result is obtained by inputting the second historical conversation data, the current conversation environment data, and the second input conversation data into a pre-trained binary classification network model;
- the binary classification model is a model that has learned, based on a third sample data set, the ability to determine whether the external knowledge needs to be introduced;
- the third sample data set includes third sample input dialogue data, third sample historical dialogue data, and second sample dialogue environment data corresponding to the third sample input dialogue data.
- the first generating unit 1104 includes:
- the first acquisition subunit 11041 is used to acquire the knowledge corresponding to the second semantic keyword from the external knowledge, the knowledge graph, and the text knowledge respectively.
- the generating subunit 11042 is used to generate the second feedback dialogue data according to each acquired knowledge corresponding to the second semantic keyword.
- the first generating unit 1104 further includes:
- the first retrieval subunit 11043 is used to search in at least one other type of knowledge based on the knowledge corresponding to the second semantic keyword obtained from the first type of knowledge, to obtain knowledge associated with the knowledge corresponding to the second semantic keyword obtained from the first type of knowledge.
- the knowledge corresponding to the second semantic keyword in the at least one other type of knowledge includes the retrieved knowledge associated with the knowledge corresponding to the second semantic keyword obtained from the first type of knowledge;
- the first type of knowledge is any one of the external knowledge, the knowledge graph, and the text knowledge, and the at least one other type of knowledge is the type of knowledge among the external knowledge, the knowledge graph, and the text knowledge other than the first type of knowledge.
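The cross-type retrieval described above takes what was found in one knowledge type and uses it to search the others. A minimal sketch follows, assuming that "associated" can be approximated by word overlap; the function name `retrieve_associated`, the `min_shared_words` threshold, and the overlap criterion are all illustrative assumptions, since the disclosure does not define how association is measured.

```python
from typing import Dict, List


def retrieve_associated(
    first_type_knowledge: List[str],
    other_sources: Dict[str, List[str]],
    min_shared_words: int = 2,
) -> Dict[str, List[str]]:
    """For each other knowledge source, keep the entries that share at least
    `min_shared_words` lowercase words with the first-type knowledge."""
    anchor_words = {
        word
        for snippet in first_type_knowledge
        for word in snippet.lower().split()
    }
    associated: Dict[str, List[str]] = {}
    for source_name, entries in other_sources.items():
        associated[source_name] = [
            entry
            for entry in entries
            if len(anchor_words & set(entry.lower().split())) >= min_shared_words
        ]
    return associated
```

A production system would more plausibly use embedding similarity or entity linking; word overlap is used here only because it keeps the example self-contained.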
- the generating subunit 11042 includes:
- the first encoding module is used to encode each acquired knowledge corresponding to the second semantic keyword to obtain the corresponding target feature vector.
- the first generating module is used to generate the second feedback dialogue data according to the respective corresponding target feature vectors.
- the first generating module includes:
- the first fusion submodule is used to perform fusion processing on the respective corresponding target feature vectors to obtain a target fused feature vector.
- the generating submodule is used to generate the second feedback dialogue data according to the target fused feature vector.
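The fusion step above combines one feature vector per knowledge source into a single vector. The disclosure names a pre-trained cross-attention network model for this; the sketch below is a much simpler stand-in, a single-query scaled dot-product attention with no learned projections, intended only to show the shape of the computation. The names `fuse` and `softmax`, and the idea of using an encoded dialogue context as the query, are assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(query: Vector, knowledge_vectors: List[Vector]) -> Vector:
    """Attention-weighted sum of the knowledge vectors, scored by scaled
    dot product against the query (assumed to encode the dialogue context)."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, vec)) / scale
        for vec in knowledge_vectors
    ]
    weights = softmax(scores)
    return [
        sum(w * vec[i] for w, vec in zip(weights, knowledge_vectors))
        for i in range(len(query))
    ]
```

Knowledge vectors that align more closely with the query receive larger weights, so the fused vector leans toward the most relevant source while still blending in the others.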
- the generating subunit 11042 further includes:
- the first processing module is used to perform redundancy removal processing on each acquired knowledge corresponding to the second semantic keyword to obtain knowledge after redundancy removal processing.
- the second generating module is used to generate the second feedback dialogue data according to the knowledge after the redundancy removal processing.
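Redundancy removal matters here because the same fact may be retrieved from more than one knowledge source. As a minimal sketch, assuming redundancy means textual duplication after case and whitespace normalization (the disclosure does not define the criterion), the hypothetical helper `remove_redundancy` keeps the first occurrence of each snippet:

```python
from typing import Iterable, List


def remove_redundancy(snippets: Iterable[str]) -> List[str]:
    """Drop snippets whose normalized text was already seen,
    keeping first occurrences in their original order."""
    seen = set()
    unique: List[str] = []
    for snippet in snippets:
        # Normalize case and collapse runs of whitespace before comparing.
        key = " ".join(snippet.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique
```

Near-duplicates that differ in wording would need a similarity measure rather than exact matching; that extension is outside what this sketch covers.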
- the output unit 1105 is configured to output the second feedback dialogue data.
- FIG. 12 is a schematic diagram of a training device for a dialogue network model according to an embodiment of the present disclosure.
- the training device 1200 for the dialogue network model includes:
- the second acquisition unit 1201 is used to acquire a second sample data set, where the second sample data set includes second sample input dialogue data input by the user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data.
- the second generating unit 1202 is configured to generate a second sample semantic keyword according to the second sample data set.
- the training unit 1203 is used to train a dialogue network model capable of predicting feedback dialogue data corresponding to the input dialogue data based on the second sample semantic keywords and a preset type knowledge base, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- FIG. 13 is a schematic diagram of a training device for a dialogue network model according to another embodiment of the present disclosure.
- the training device 1300 for the dialogue network model includes:
- the second acquisition unit 1301 is used to acquire a second sample data set, where the second sample data set includes second sample input dialogue data input by the user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data.
- the second generating unit 1302 is configured to generate a second sample semantic keyword according to the second sample data set.
- the third determining unit 1303 is used to determine, according to the second sample data set, a first classification result indicating whether the external knowledge needs to be introduced.
- the training unit 1304 is used to train, according to the second sample semantic keywords and a preset type knowledge base, a dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
- the first classification result is obtained by inputting the second sample data set into a pre-trained binary classification network model;
- the binary classification model is a model that has learned, based on a third sample data set, the ability to determine whether the external knowledge needs to be introduced;
- the third sample data set includes third sample input dialogue data, third sample historical dialogue data, and second sample dialogue environment data corresponding to the third sample input dialogue data.
- the second sample semantic keywords are obtained by inputting the second sample data set into a pre-trained generative network model
- the generative network model is a model that has learned, based on a fourth sample data set, the ability to produce the third sample semantic keywords corresponding to the fourth sample data set, and the fourth sample data set includes fourth sample input dialogue data, fourth sample historical dialogue data, and third sample dialogue environment data corresponding to the fourth sample input dialogue data.
- the training unit 1304 is used to train the dialogue network model according to the second sample semantic keywords and a preset type knowledge base if the first classification result is characterized as needing to introduce the external knowledge.
- the training unit 1304 includes:
- the second acquisition subunit 13041 is used to acquire the knowledge corresponding to the second sample semantic keyword from the external knowledge, the knowledge graph, and the text knowledge respectively.
- the training subunit 13042 is used to train the dialogue network model according to the acquired knowledge corresponding to the second sample semantic keywords.
- the training subunit 13042 includes:
- the second encoding module is used to encode each acquired knowledge corresponding to the second sample semantic keyword to obtain the corresponding feature vector.
- the first training module is used to train the dialogue network model according to the respective corresponding feature vectors.
- the first training module includes:
- a second fusion submodule, used to fuse the respective corresponding feature vectors to obtain a fused feature vector;
- an input submodule, used to input the fused feature vector into a fourth neural network model and output second predicted dialogue data fed back to the user;
- an adjustment submodule, used to adjust the parameters of the fourth neural network model according to the second predicted dialogue data fed back to the user and second pre-labeled dialogue data fed back to the user, to obtain the dialogue network model.
- the fused feature vector is obtained by inputting the respective corresponding feature vectors into a pre-trained cross-attention network model.
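The training procedure above follows the usual supervised pattern: feed the fused feature vector to a network, compare the prediction with the pre-labeled reply, and adjust the parameters. The sketch below shows that pattern on a toy linear model trained by stochastic gradient descent on squared error. It is a deliberate simplification: the real fourth neural network model outputs dialogue data rather than a scalar, and the names `predict` and `train`, the loss, and the hyperparameters are all assumptions.

```python
from typing import List, Tuple


def predict(weights: List[float], fused: List[float]) -> float:
    """Linear stand-in for the network: dot product of weights and input."""
    return sum(w * x for w, x in zip(weights, fused))


def train(
    samples: List[Tuple[List[float], float]],
    weights: List[float],
    learning_rate: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Adjust the weights by stochastic gradient descent on squared error.

    Each sample pairs a fused feature vector with a pre-labeled target."""
    weights = list(weights)  # leave the caller's list untouched
    for _ in range(epochs):
        for fused, target in samples:
            error = predict(weights, fused) - target
            # Gradient of 0.5 * error**2 with respect to each weight is error * x.
            weights = [w - learning_rate * error * x for w, x in zip(weights, fused)]
    return weights
```

The loop structure (predict, measure error against the label, update) is the part that carries over to a real dialogue model; the loss and optimizer would differ.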
- the training unit 1304 further includes:
- the second retrieval subunit 13043 is used to search in at least one other type of knowledge based on the knowledge corresponding to the second sample semantic keyword obtained from the first type of knowledge, and obtain knowledge associated with the knowledge corresponding to the second sample semantic keyword obtained from the first type of knowledge.
- the knowledge corresponding to the second sample semantic keyword in the at least one other type of knowledge includes the retrieved knowledge associated with the knowledge corresponding to the second sample semantic keyword obtained from the first type of knowledge;
- the first type of knowledge is any one of the external knowledge, the knowledge graph, and the text knowledge, and the at least one other type of knowledge is the type of knowledge among the external knowledge, the knowledge graph, and the text knowledge other than the first type of knowledge.
- the training subunit 13042 further includes:
- a second processing module is used to perform redundancy removal processing on each acquired knowledge corresponding to the second sample semantic keyword to obtain knowledge after redundancy removal processing;
- the second training module is used to train the dialogue network model according to the knowledge after the redundancy removal processing.
- the electronic device 1400 may include one or more of the following components: a processing component 1401, a memory 1402, a power component 1403, a multimedia component 1404, an audio component 1405, an input/output (I/O) interface 1406, a sensor component 1407, and a communication component 1408.
- the processing component 1401 generally controls the overall operation of the electronic device 1400, such as operations associated with video playback, display, phone calls, data communications, camera operations, and recording operations.
- the processing component 1401 may include one or more processors 14011 to execute instructions, so as to complete all or part of the steps of the above method.
- the processing component 1401 may include one or more modules to facilitate the interaction between the processing component 1401 and other components.
- the processing component 1401 may include a multimedia module to facilitate the interaction between the multimedia component 1404 and the processing component 1401.
- the memory 1402 is configured to store various types of data to support operations on the electronic device 1400. Examples of such data include instructions for any application or method operating on the electronic device 1400, as well as videos, contact data, phone book data, messages, pictures, etc.
- the memory 1402 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
- the power supply component 1403 provides power to various components of the electronic device 1400.
- the power supply component 1403 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the electronic device 1400.
- the multimedia component 1404 includes a screen that provides an output interface between the electronic device 1400 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundaries of the touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
- the multimedia component 1404 includes a front camera and/or a rear camera.
- the front camera and/or the rear camera may receive external multimedia data.
- each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
- the audio component 1405 is configured to output and/or input audio data.
- the audio component 1405 includes a microphone (MIC), and when the electronic device 1400 is in an operating mode, such as a call mode, a recording mode, and a speech recognition mode, the microphone is configured to receive external audio data.
- the received audio data can be further stored in the memory 1402 or sent via the communication component 1408.
- the audio component 1405 also includes a speaker for outputting audio data, such as outputting the audio gain and original audio data determined by the method described in the above embodiment.
- I/O interface 1406 provides an interface between processing component 1401 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include but are not limited to: home button, volume button, start button, and lock button.
- the sensor assembly 1407 includes one or more sensors for providing various aspects of status assessment for the electronic device 1400.
- the sensor assembly 1407 can detect the open/closed state of the electronic device 1400, the relative positioning of components, such as the display and keypad of the electronic device 1400, and the sensor assembly 1407 can also detect the position change of the electronic device 1400 or a component of the electronic device 1400, the presence or absence of user contact with the electronic device 1400, the orientation or acceleration/deceleration of the electronic device 1400, and the temperature change of the electronic device 1400.
- the sensor assembly 1407 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
- the sensor assembly 1407 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor assembly 1407 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- the communication component 1408 is configured to facilitate wired or wireless communication between the electronic device 1400 and other devices.
- the electronic device 1400 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
- the communication component 1408 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
- the communication component 1408 also includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- the electronic device 1400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above methods.
- a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 1402 including instructions, and the instructions can be executed by a processor 14011 of an electronic device 1400 to perform the above method.
- the non-transitory computer-readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
- the embodiments of the present disclosure further provide a computer storage medium, on which computer instructions are stored.
- when the computer instructions are executed by a processor, the human-computer interaction method described in any of the above embodiments is executed; or, the training method of the dialogue network model described in any of the above embodiments is executed.
- the embodiments of the present disclosure further provide a computer program product, which, when executed on a processor, enables the human-computer interaction method described in any of the above embodiments to be executed; or, enables the training method of the dialogue network model described in any of the above embodiments to be executed.
- the embodiments of the present disclosure further provide a chip, including:
- an input interface, used to obtain the second historical dialogue data, the current dialogue environment data, and the second input dialogue data input by the user;
- a logic circuit, configured to execute the human-computer interaction method as described in any of the above embodiments, to obtain second feedback dialogue data corresponding to the second input dialogue data;
- an output interface, used to output the second feedback dialogue data.
- the embodiments of the present disclosure further provide a chip, including:
- an input interface, used to obtain a second sample data set, wherein the second sample data set includes second sample input dialogue data input by a user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data;
- a logic circuit, configured to execute the method for training a dialogue network model as described in any of the above embodiments, to obtain a dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data;
- an output interface, used to output the dialogue network model.
- the embodiments of the present disclosure further provide a terminal device, including:
- a data collection device used to obtain the second historical dialogue data, the current dialogue environment data, and the second input dialogue data input by the user;
- a dialogue system used to determine a second semantic keyword based on the second historical dialogue data, the current dialogue environment data, and the second input dialogue data, and to generate and output second feedback dialogue data corresponding to the second input dialogue data based on the second semantic keyword and a preset type knowledge base, wherein the preset type knowledge base includes: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and text knowledge.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Claims (27)
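- A human-machine dialogue method, characterized in that the method comprises: acquiring second historical dialogue data, current dialogue environment data, and second input dialogue data input by a user; determining a second semantic keyword according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data; and generating and outputting, according to the second semantic keyword and a preset type knowledge base, second feedback dialogue data corresponding to the second input dialogue data, wherein the preset type knowledge base comprises: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and the text knowledge.
- The method according to claim 1, characterized in that the generating and outputting, according to the second semantic keyword and the preset type knowledge base, the second feedback dialogue data corresponding to the second input dialogue data comprises: acquiring knowledge corresponding to the second semantic keyword from the external knowledge, the knowledge graph, and the text knowledge respectively; and generating and outputting the second feedback dialogue data according to each piece of acquired knowledge corresponding to the second semantic keyword.
- The method according to claim 2, characterized in that, after the acquiring knowledge corresponding to the second semantic keyword from the external knowledge, the knowledge graph, and the text knowledge respectively, the method further comprises: searching in at least one other type of knowledge according to the knowledge corresponding to the second semantic keyword acquired from a first type of knowledge, to obtain knowledge associated with the knowledge corresponding to the second semantic keyword acquired from the first type of knowledge; wherein the knowledge corresponding to the second semantic keyword in the at least one other type of knowledge comprises the retrieved knowledge associated with the knowledge corresponding to the second semantic keyword acquired from the first type of knowledge; and the first type of knowledge is any one of the external knowledge, the knowledge graph, and the text knowledge, and the at least one other type of knowledge is the type of knowledge, among the external knowledge, the knowledge graph, and the text knowledge, other than the first type of knowledge.
- The method according to claim 2, characterized in that the generating and outputting the second feedback dialogue data according to each piece of acquired knowledge corresponding to the second semantic keyword comprises: encoding each piece of acquired knowledge corresponding to the second semantic keyword separately to obtain respective corresponding target feature vectors; and generating and outputting the second feedback dialogue data according to the respective corresponding target feature vectors.
- The method according to claim 4, characterized in that the generating and outputting the second feedback dialogue data according to the respective corresponding target feature vectors comprises: performing fusion processing on the respective corresponding target feature vectors to obtain a target fused feature vector; and generating and outputting the second feedback dialogue data according to the target fused feature vector.
- The method according to claim 5, characterized in that the target fused feature vector is obtained by inputting the respective corresponding target feature vectors into a pre-trained cross-attention network model.
- The method according to any one of claims 2-6, characterized in that the generating and outputting the second feedback dialogue data according to each piece of acquired knowledge corresponding to the second semantic keyword comprises: performing redundancy removal processing on each piece of acquired knowledge corresponding to the second semantic keyword to obtain knowledge after redundancy removal processing; and generating and outputting the second feedback dialogue data according to the knowledge after redundancy removal processing.
- The method according to any one of claims 1-6, characterized in that the method further comprises: determining, according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data, a third classification result indicating whether the external knowledge needs to be introduced; and the generating and outputting, according to the second semantic keyword and the preset type knowledge base, the second feedback dialogue data corresponding to the second input dialogue data comprises: if the third classification result indicates that the external knowledge needs to be introduced, generating and outputting, according to the second semantic keyword and the preset type knowledge base, the second feedback dialogue data corresponding to the second input dialogue data.
- The method according to claim 8, characterized in that the third classification result is obtained by inputting the second historical dialogue data, the current dialogue environment data, and the second input dialogue data into a pre-trained binary classification network model; wherein the binary classification model is a model that has learned, based on a third sample data set, the ability to determine whether the external knowledge needs to be introduced, and the third sample data set comprises third sample input dialogue data, third sample historical dialogue data, and second sample dialogue environment data corresponding to the third sample input dialogue data.
- The method according to any one of claims 1-9, characterized in that the second semantic keyword is obtained by inputting the second historical dialogue data, the current dialogue environment data, and the second input dialogue data into a pre-trained generative network model; wherein the generative network model is a model that has learned, based on a fourth sample data set, the ability to produce third sample semantic keywords corresponding to the fourth sample data set, and the fourth sample data set comprises fourth sample input dialogue data, fourth sample historical dialogue data, and third sample dialogue environment data corresponding to the fourth sample input dialogue data.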
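- A training method for a dialogue network model, characterized in that the method comprises: acquiring a second sample data set, wherein the second sample data set comprises second sample input dialogue data input by a user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data; generating a second sample semantic keyword according to the second sample data set; and training, according to the second sample semantic keyword and a preset type knowledge base, a dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data, wherein the preset type knowledge base comprises: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and the text knowledge.
- The method according to claim 11, characterized in that the training, according to the second sample semantic keyword and the preset type knowledge base, the dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data comprises: acquiring knowledge corresponding to the second sample semantic keyword from the external knowledge, the knowledge graph, and the text knowledge respectively; and training the dialogue network model according to each piece of acquired knowledge corresponding to the second sample semantic keyword.
- The method according to claim 12, characterized in that, after the acquiring knowledge corresponding to the second sample semantic keyword from the external knowledge, the knowledge graph, and the text knowledge respectively, the method further comprises: searching in at least one other type of knowledge according to the knowledge corresponding to the second sample semantic keyword acquired from a first type of knowledge, to obtain knowledge associated with the knowledge corresponding to the second sample semantic keyword acquired from the first type of knowledge; wherein the knowledge corresponding to the second sample semantic keyword in the at least one other type of knowledge comprises the retrieved knowledge associated with the knowledge corresponding to the second sample semantic keyword acquired from the first type of knowledge; and the first type of knowledge is any one of the external knowledge, the knowledge graph, and the text knowledge, and the at least one other type of knowledge is the type of knowledge, among the external knowledge, the knowledge graph, and the text knowledge, other than the first type of knowledge.
- The method according to claim 12, characterized in that the training the dialogue network model according to each piece of acquired knowledge corresponding to the second sample semantic keyword comprises: encoding each piece of acquired knowledge corresponding to the second sample semantic keyword separately to obtain respective corresponding feature vectors; and training the dialogue network model according to the respective corresponding feature vectors.
- The method according to claim 14, characterized in that the training the dialogue network model according to the respective corresponding feature vectors comprises: performing fusion processing on the respective corresponding feature vectors to obtain a fused feature vector; inputting the fused feature vector into a fourth neural network, and outputting second predicted dialogue data fed back to the user; and adjusting parameters of the fourth neural network model according to the second predicted dialogue data fed back to the user and second pre-labeled dialogue data fed back to the user, to obtain the dialogue network model.
- The method according to claim 15, characterized in that the fused feature vector is obtained by inputting the respective corresponding feature vectors into a pre-trained cross-attention network model.
- The method according to any one of claims 12-16, characterized in that the training the dialogue network model according to each piece of acquired knowledge corresponding to the second sample semantic keyword comprises: performing redundancy removal processing on each piece of acquired knowledge corresponding to the second sample semantic keyword to obtain knowledge after redundancy removal processing; and training the dialogue network model according to the knowledge after redundancy removal processing.
- The method according to any one of claims 11-16, characterized in that the method further comprises: determining, according to the second sample data set, a first classification result indicating whether the external knowledge needs to be introduced; and the training, according to the second sample semantic keyword and the preset type knowledge base, the dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data comprises: if the first classification result indicates that the external knowledge needs to be introduced, training the dialogue network model according to the second sample semantic keyword and the preset type knowledge base.
- The method according to claim 18, characterized in that the first classification result is obtained by inputting the second sample data set into a pre-trained binary classification network model; wherein the binary classification model is a model that has learned, based on a third sample data set, the ability to determine whether the external knowledge needs to be introduced, and the third sample data set comprises third sample input dialogue data, third sample historical dialogue data, and second sample dialogue environment data corresponding to the third sample input dialogue data.
- The method according to any one of claims 11-19, characterized in that the second sample semantic keyword is obtained by inputting the second sample data set into a pre-trained generative network model; wherein the generative network model is a model that has learned, based on a fourth sample data set, the ability to produce third sample semantic keywords corresponding to the fourth sample data set, and the fourth sample data set comprises fourth sample input dialogue data, fourth sample historical dialogue data, and third sample dialogue environment data corresponding to the fourth sample input dialogue data.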
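- A human-machine dialogue apparatus, characterized in that the apparatus comprises: a first acquisition unit, used to acquire second historical dialogue data, current dialogue environment data, and second input dialogue data input by a user; a first determining unit, used to determine a second semantic keyword according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data; a first generating unit, used to generate, according to the second semantic keyword and a preset type knowledge base, second feedback dialogue data corresponding to the second input dialogue data, wherein the preset type knowledge base comprises: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and the text knowledge; and an output unit, used to output the second feedback dialogue data.
- A training apparatus for a dialogue network model, characterized in that the apparatus comprises: a second acquisition unit, used to acquire a second sample data set, wherein the second sample data set comprises second sample input dialogue data input by a user, second sample historical dialogue data, and first sample dialogue environment data corresponding to the second sample input dialogue data; a second generating unit, used to generate a second sample semantic keyword according to the second sample data set; and a training unit, used to train, according to the second sample semantic keyword and a preset type knowledge base, a dialogue network model capable of predicting feedback dialogue data corresponding to input dialogue data, wherein the preset type knowledge base comprises: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and the text knowledge.
- A computer storage medium, characterized in that computer instructions are stored on the computer storage medium, and when the computer instructions are run by a processor, the method according to any one of claims 1 to 10 is executed; or, the method according to any one of claims 11 to 20 is executed.
- An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores computer instructions executable by the at least one processor, and the computer instructions are executed by the at least one processor, so that the method according to any one of claims 1 to 10 is executed; or, so that the method according to any one of claims 11 to 20 is executed.
- A computer program product, characterized in that, when the computer program product runs on a processor, the method according to any one of claims 1 to 10 is executed; or, the method according to any one of claims 11 to 20 is executed.
- A chip, characterized by comprising: an input interface, used to acquire second historical dialogue data, current dialogue environment data, and second input dialogue data input by a user; a logic circuit, used to execute the method according to any one of claims 1 to 10, to obtain second feedback dialogue data corresponding to the second input dialogue data; and an output interface, used to output the second feedback dialogue data.
- A terminal device, characterized by comprising: a data collection apparatus, used to acquire second historical dialogue data, current dialogue environment data, and second input dialogue data input by a user; and a dialogue system, used to determine a second semantic keyword according to the second historical dialogue data, the current dialogue environment data, and the second input dialogue data, and to generate and output, according to the second semantic keyword and a preset type knowledge base, second feedback dialogue data corresponding to the second input dialogue data, wherein the preset type knowledge base comprises: a knowledge graph, text knowledge, and external knowledge other than the knowledge graph and the text knowledge.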
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23881618.5A EP4571530A4 (en) | 2022-10-28 | 2023-10-08 | MAN-MACHINE DIALOGUE METHOD, DIALOGUE NETWORK MODEL LEARNING METHOD, AND DEVICE |
| US19/190,648 US20250252267A1 (en) | 2022-10-28 | 2025-04-27 | Human-computer dialogue method, dialogue network model training method, and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211335469.1A CN117992579A (zh) | 2022-10-28 | 2022-10-28 | 人机对话方法、对话网络模型的训练方法及装置 |
| CN202211335469.1 | 2022-10-28 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/190,648 Continuation US20250252267A1 (en) | 2022-10-28 | 2025-04-27 | Human-computer dialogue method, dialogue network model training method, and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024088039A1 (zh) | 2024-05-02 |
Family
ID=90830001
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/123430 Ceased WO2024088039A1 (zh) | 2022-10-28 | 2023-10-08 | 人机对话方法、对话网络模型的训练方法及装置 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250252267A1 (zh) |
| EP (1) | EP4571530A4 (zh) |
| CN (1) | CN117992579A (zh) |
| WO (1) | WO2024088039A1 (zh) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118246460A (zh) * | 2024-05-29 | 2024-06-25 | 北京中关村科金技术有限公司 | 面向智能客服的ai对话语义识别方法及系统 |
| CN118427333A (zh) * | 2024-07-04 | 2024-08-02 | 国网湖北省电力有限公司信息通信公司 | 一种智能语音客服实现方法和装置 |
| CN119646145A (zh) * | 2024-11-22 | 2025-03-18 | 清华大学 | 对话分析方法、装置、眼镜、设备及可读存储介质 |
| CN120408414A (zh) * | 2025-06-23 | 2025-08-01 | 科大讯飞股份有限公司 | 安全回复生成方法及相关装置、设备和存储介质 |
| CN120471160A (zh) * | 2025-07-15 | 2025-08-12 | 珠海汇金科技股份有限公司 | 基于数字人交互数据分析的政策知识图谱构建方法及系统 |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116843795B (zh) * | 2023-07-03 | 2025-02-07 | 北京百度网讯科技有限公司 | 图像生成方法及装置、电子设备和存储介质 |
| CN118093830B (zh) * | 2024-04-03 | 2025-11-04 | 北京世纪好未来教育科技有限公司 | 基于大语言模型的问题答案生成方法、装置、设备及介质 |
| CN119417117B (zh) * | 2024-10-17 | 2025-11-04 | 鹏城实验室 | 异质机器人的控制方法、装置、电子设备以及存储介质 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120005148A1 (en) * | 2010-06-30 | 2012-01-05 | Microsoft Corporation | Integrating specialized knowledge sources into a general search service |
| CN106919655A (zh) * | 2017-01-24 | 2017-07-04 | 网易(杭州)网络有限公司 | 一种答案提供方法和装置 |
| CN107016046A (zh) * | 2017-02-20 | 2017-08-04 | 北京光年无限科技有限公司 | 基于视觉场景化的智能机器人对话方法及系统 |
| CN109582798A (zh) * | 2017-09-29 | 2019-04-05 | 阿里巴巴集团控股有限公司 | 自动问答方法、系统及设备 |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019011824A1 (en) * | 2017-07-11 | 2019-01-17 | Koninklijke Philips N.V. | MULTIMODAL DIALOGUE AGENT |
2022
- 2022-10-28 CN CN202211335469.1A patent/CN117992579A/zh active Pending

2023
- 2023-10-08 EP EP23881618.5A patent/EP4571530A4/en active Pending
- 2023-10-08 WO PCT/CN2023/123430 patent/WO2024088039A1/zh not_active Ceased

2025
- 2025-04-27 US US19/190,648 patent/US20250252267A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4571530A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4571530A1 (en) | 2025-06-18 |
| EP4571530A4 (en) | 2025-11-12 |
| US20250252267A1 (en) | 2025-08-07 |
| CN117992579A (zh) | 2024-05-07 |
Similar Documents
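| Publication | Publication Date | Title |
|---|---|---|
| WO2024088039A1 (zh) | | Human-machine dialogue method, and dialogue network model training method and apparatus |
| CN105701254B (zh) | | Information processing method and apparatus, and apparatus for information processing |
| CN111368541B (zh) | | Named entity recognition method and apparatus |
| CN113705210B (zh) | | Article outline generation method and apparatus, and apparatus for generating an article outline |
| CN110781305A (zh) | | Text classification method and apparatus based on a classification model, and model training method |
| CN110020010A (zh) | | Data processing method, apparatus, and electronic device |
| CN117520498A (zh) | | Virtual digital human-based interaction processing method, system, terminal, device, and medium |
| CN113140138A (zh) | | Interactive teaching method, apparatus, storage medium, and electronic device |
| CN116127062A (zh) | | Training method for a pre-trained language model, and text sentiment classification method and apparatus |
| CN114242047B (zh) | | Speech processing method, apparatus, electronic device, and storage medium |
| CN105469104A (zh) | | Method, apparatus, and server for calculating text information similarity |
| CN115879440A (zh) | | Natural language processing and model training methods, apparatus, device, and storage medium |
| CN115086279A (zh) | | Multimedia content push method, apparatus, electronic device, and storage medium |
| CN119961427A (zh) | | Knowledge retrieval and recall method and apparatus based on a retrieval-augmented generation knowledge base |
| CN116955526A (zh) | | Information generation method, apparatus, electronic device, and readable storage medium |
| CN116975016A (zh) | | Data processing method, apparatus, device, and readable storage medium |
| US11314793B2 (en) | | Query processing |
| CN103995844B (zh) | | Information search method and apparatus |
| CN111125305A (zh) | | Hot topic determination method, apparatus, storage medium, and electronic device |
| WO2025092911A1 (zh) | | Video feature extraction method, video generation method, apparatus, medium, and device |
| CN115409200B (zh) | | Database operation method, apparatus, and medium |
| CN118277587B (zh) | | Knowledge graph completion method, apparatus, electronic device, and storage medium |
| CN114443814A (zh) | | Reply information output method, apparatus, device, and storage medium |
| CN119783679B (zh) | | Method, apparatus, device, and medium for generating text data based on user requirements |
| CN116340490B (zh) | | Profile construction method based on human-machine dialogue, electronic device, and storage medium |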
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23881618; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023881618; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023881618; Country of ref document: EP; Effective date: 20250311 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | WIPO information: published in national office | Ref document number: 2023881618; Country of ref document: EP |