CN114817455A - Model construction method, device, equipment and medium - Google Patents

Publication number
CN114817455A
Authority
CN
China
Prior art keywords
clustering
model
corpora
corpus
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210229151.9A
Other languages
Chinese (zh)
Other versions
CN114817455B (en)
Inventor
赵高枫
文俊杰
李金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Bank Co Ltd
Original Assignee
China Merchants Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Bank Co Ltd
Priority to CN202210229151.9A
Publication of CN114817455A
Application granted
Publication of CN114817455B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/231Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a model construction method, device, equipment and medium. The method obtains a training corpus for constructing a model; clusters the training corpus based on a pre-trained clustering model to obtain a clustering result, where the clustering result includes clustering labels and the clustering corpora corresponding to those labels; performs model training and prediction based on the clustering labels and the corresponding clustering corpora; and determines a target intent recognition model according to the training and prediction results. By generating the target intent recognition model automatically, the method reduces the time spent on becoming familiar with business points and on data labeling, speeds up the sorting of business points and the labeling of business corpora, improves the efficiency of building the target intent recognition model, and reduces labor costs.

Figure 202210229151

Description

Model construction method, device, equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a model construction method, a model construction device, model construction equipment and a model construction medium.
Background
With the continuous development of artificial intelligence technology, dialogue systems are being applied more and more widely. In an unmanned customer service system, for example, intention recognition is an important component. A common approach to intention recognition is text classification: user intentions are divided into several categories, and a corresponding response scheme is provided for each category.
When a dialogue system is built, text classification is usually the simplest and most effective means of building a user intention recognition model. Existing text classification models for intention recognition are built mainly around the services that the dialogue system has to handle. During this process, operators usually need to understand the service points that the dialogue system has to handle, sort out the relevant service point information, and then label a large number of corpora provided by the business side. After labeling, the categories are adjusted with a category adjustment tool, and if the corpora are insufficient they are expanded with tools, finally yielding the text classification model for recognizing user intention.
Disclosure of Invention
The main object of the invention is to provide a model construction method, device, equipment and medium, aiming to reduce the labor cost of manual sorting and labeling and to improve the efficiency of building classification models.
In order to achieve the above object, the present invention provides a model construction method, including the steps of:
obtaining a training corpus for constructing a model;
clustering the training corpora based on a pre-trained clustering model to obtain corresponding clustering results, wherein the clustering results comprise clustering labels and clustering corpora corresponding to the clustering labels;
and performing model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining a target intention recognition model according to the model training and prediction result.
Preferably, the step of clustering the training corpora based on a pre-trained clustering model to obtain corresponding clustering results includes:
shuffling the order of the training corpora and dividing them to obtain a clustering sample corpus;
and clustering the clustering sample corpus based on the hierarchical agglomerative clustering (HAC) algorithm to obtain clustering labels and the clustering corpora corresponding to the clustering labels.
Preferably, the step of clustering the clustering sample corpus based on the HAC algorithm to obtain clustering labels and the clustering corpora corresponding to the clustering labels includes:
classifying the clustering sample corpus and grouping sample corpora of the same kind into one cluster, to obtain the clustering labels corresponding to the different kinds of clusters;
determining the clustering corpora corresponding to the different clustering labels based on the clustering labels corresponding to the clusters;
if the number of corpora in a cluster is greater than a preset threshold N1, taking the cluster id as the clustering label and the corpora in the cluster as the corresponding clustering corpora;
if the number of corpora in a cluster is not greater than the preset threshold N1, using other as the clustering label for the corpora in the cluster, the corpora in the cluster being the corresponding clustering corpora.
Preferably, the step of performing model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining the target intention recognition model according to the model training and prediction result includes:
dividing the clustering corpora in the clustering result into training corpora and prediction corpora according to the clustering labels;
performing model training based on the training corpora to obtain a trained initial classification model;
inputting the prediction corpora into the trained initial classification model for prediction to obtain prediction score values;
determining a precision, recall and F1 (PRF) value of the clustering result based on the clustering labels and the prediction score values in the clustering result;
determining a corresponding target intention recognition model based on the PRF value.
Preferably, the step of determining the corresponding target intention recognition model based on the PRF value includes determining whether the clustering result is reasonable based on the PRF value:
judging whether the PRF value reaches a preset threshold value;
if the PRF value reaches a preset threshold value, the clustering result is reasonable, and the initial classification model is output as a target intention identification model;
if the PRF value does not reach a preset threshold value, the clustering result is unreasonable, and the clustering result is classified and adjusted to obtain a classified and adjusted clustering result;
taking the clustering result after the classification adjustment as the current clustering result;
dividing the clustering corpora in the current clustering result into training corpora and prediction corpora according to the clustering labels;
performing model training based on the training corpora to obtain a trained initial classification model;
inputting the prediction corpora into the trained initial classification model for prediction to obtain prediction score values;
determining a precision, recall and F1 (PRF) value of the clustering result based on the clustering labels and the prediction score values in the clustering result;
and repeating the above until the PRF value reaches the preset threshold, at which point the clustering result is reasonable and the initial classification model is output as the target intention recognition model.
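The iterative procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `evaluate` and `adjust` are hypothetical callables standing in for the train-and-predict step and the classification adjustment step, a single scalar stands in for the PRF value, and `max_rounds` is an added safeguard that the text does not mention.

```python
def build_intent_model(clustering, evaluate, adjust,
                       prf_threshold=0.9, max_rounds=10):
    """Repeat train/predict and adjustment until the PRF value is high enough.

    evaluate(clustering) -> (model, prf_value)   # hypothetical callable
    adjust(clustering)   -> adjusted clustering  # hypothetical callable
    """
    for _ in range(max_rounds):
        model, prf_value = evaluate(clustering)
        if prf_value >= prf_threshold:
            # The clustering result is considered reasonable: output the model.
            return model, clustering
        # Otherwise adjust the clustering result and use it as the current one.
        clustering = adjust(clustering)
    raise RuntimeError("PRF threshold not reached within max_rounds")
```

Passing the two steps in as callables keeps the loop independent of the classifier and of the adjustment rule that is chosen.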
Preferably, the clustering result comprises other clustering corpora, whose clustering label is other, and non-other clustering corpora, whose clustering label is not other,
and the step of performing classification adjustment on the clustering result to obtain the adjusted clustering result comprises:
adjusting the other clustering corpora and the non-other clustering corpora to obtain the clustering corpora corresponding to the adjusted clustering labels;
calculating the confusion degree of the clustering corpora corresponding to the adjusted clustering labels;
and when the confusion degree is greater than a preset threshold T2, merging the non-other clustering corpora from before and after the adjustment as the non-other clustering corpora of the current clustering result, and taking the remaining corpora, labeled other, as the other clustering corpora.
Preferably, the step of adjusting the other clustering corpora and the non-other clustering corpora to obtain the clustering corpora corresponding to the adjusted clustering labels includes:
acquiring the prediction score values of the non-other clustering corpora;
if the prediction score value of a non-other clustering corpus is lower than a preset threshold T1, changing its clustering label to other;
and when the number of other clustering corpora, whose clustering label is other, exceeds a preset threshold N2, acquiring the adjusted other clustering corpora and the adjusted non-other clustering corpora.
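A minimal Python sketch of the relabeling rule above, assuming one prediction score per corpus item. The function name, the default threshold values and the returned flag are illustrative assumptions; how the N2 condition is acted on downstream is not detailed in this passage, so the sketch only reports whether it holds.

```python
def relabel_low_confidence(labels, scores, t1=0.5, n2=100, other_label="other"):
    """Move low-scoring non-other items to 'other' and check the N2 condition.

    labels: clustering label of each corpus item
    scores: prediction score value of each corpus item
    Returns (adjusted_labels, other_exceeds_n2).
    """
    adjusted = [
        other_label if label != other_label and score < t1 else label
        for label, score in zip(labels, scores)
    ]
    other_count = sum(label == other_label for label in adjusted)
    return adjusted, other_count > n2
```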
Preferably, the step of obtaining the training corpus for constructing the model includes:
acquiring an original corpus from a service end;
preprocessing the original corpus to obtain a training corpus for model construction;
the preprocessing comprises one or more of: removing stop words, converting full-width characters to half-width, removing emoticons, removing greetings and meaningless questions, unifying punctuation marks, and removing uncommon punctuation marks.
Further, to achieve the above object, the present invention also provides a model building apparatus including:
the acquisition module is used for acquiring a training corpus for constructing a model;
the clustering module is used for clustering the training corpora based on a pre-trained clustering model to obtain corresponding clustering results, wherein the clustering results comprise clustering labels and clustering corpora corresponding to the clustering labels;
and the determining module is used for carrying out model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining a target intention recognition model according to the model training and prediction result.
Preferably, the obtaining module is further configured to:
acquiring an original corpus from a service end;
preprocessing the original corpus to obtain a training corpus for model construction;
the preprocessing comprises one or more of: removing stop words, converting full-width characters to half-width, removing emoticons, removing greetings and meaningless questions, unifying punctuation marks, and removing uncommon punctuation marks.
Preferably, the clustering module is further configured to:
shuffling the order of the training corpora and dividing them to obtain a clustering sample corpus;
and clustering the clustering sample corpus based on the hierarchical agglomerative clustering (HAC) algorithm to obtain clustering labels and the clustering corpora corresponding to the clustering labels.
Preferably, the clustering module is further configured to:
classifying the clustering sample corpus and grouping sample corpora of the same kind into one cluster, to obtain the clustering labels corresponding to the different kinds of clusters;
determining the clustering corpora corresponding to the different clustering labels based on the clustering labels corresponding to the clusters;
if the number of corpora in a cluster is greater than a preset threshold N1, taking the cluster id as the clustering label and the corpora in the cluster as the corresponding clustering corpora;
if the number of corpora in a cluster is not greater than the preset threshold N1, using other as the clustering label for the corpora in the cluster, the corpora in the cluster being the corresponding clustering corpora.
Preferably, the determining module is further configured to:
dividing the clustering corpora in the clustering result into training corpora and prediction corpora according to the clustering labels;
performing model training based on the training corpora to obtain a trained initial classification model;
inputting the prediction corpora into the trained initial classification model for prediction to obtain prediction score values;
determining a precision, recall and F1 (PRF) value of the clustering result based on the clustering labels and the prediction score values in the clustering result;
determining a corresponding target intention recognition model based on the PRF value.
Preferably, the determining module is further configured to:
judging whether the PRF value reaches a preset threshold;
if the PRF value reaches the preset threshold, the clustering result is reasonable, and the initial classification model is output as the target intention recognition model;
if the PRF value does not reach the preset threshold, the clustering result is unreasonable, and classification adjustment is performed on the clustering result to obtain the adjusted clustering result;
taking the adjusted clustering result as the current clustering result, and executing the steps:
dividing the clustering corpora in the adjusted clustering result into training corpora and prediction corpora according to the clustering labels;
performing model training based on the training corpora to obtain a trained initial classification model;
inputting the prediction corpora into the trained initial classification model for prediction to obtain prediction score values;
determining a precision, recall and F1 (PRF) value of the clustering result based on the clustering labels and the prediction score values in the clustering result;
and repeating the above until the PRF value reaches the preset threshold, at which point the clustering result is reasonable and the initial classification model is output as the target intention recognition model.
Preferably, the determining module is further configured to:
adjusting the other clustering corpora and the non-other clustering corpora to obtain the clustering corpora corresponding to the adjusted clustering labels;
calculating the confusion degree of the clustering corpora corresponding to the adjusted clustering labels;
and when the confusion degree is greater than a preset threshold T2, merging the non-other clustering corpora from before and after the adjustment as the non-other clustering corpora of the current clustering result, and taking the remaining corpora, labeled other, as the other clustering corpora.
Preferably, the determining module is further configured to:
acquiring the prediction score values of the non-other clustering corpora;
if the prediction score value of a non-other clustering corpus is lower than a preset threshold T1, changing its clustering label to other;
and when the number of other clustering corpora, whose clustering label is other, exceeds a preset threshold N2, acquiring the adjusted other clustering corpora and the adjusted non-other clustering corpora.
Further, to achieve the above object, the present invention also provides a model building apparatus including: a memory, a processor and a model building program stored on the memory and executable on the processor, the model building program when executed by the processor implementing the steps of the model building method as described above.
Further, to achieve the above object, the present invention also provides a medium which is a computer-readable storage medium having stored thereon a model construction program which, when executed by a processor, implements the steps of the model construction method as described above.
The model building method, the device, the equipment and the medium provided by the invention are characterized in that training corpora for building the model are obtained; clustering the training corpora based on a pre-trained clustering model to obtain corresponding clustering results, wherein the clustering results comprise clustering labels and clustering corpora corresponding to the clustering labels; and performing model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining a target intention recognition model according to the model training and prediction result.
The advantage of this method for automatically generating the target intention recognition model is as follows. The training corpora used for building the intention recognition model are clustered to obtain the clustering result corresponding to the training corpora, which comprises the clustering labels into which the corpora are grouped and the clustering corpora corresponding to each label. Model training and prediction are performed on this clustering result to obtain the PRF value corresponding to the clustering result, and the target intention recognition model is determined according to the PRF value. This reduces the time invested in understanding service points and in data labeling, improves the efficiency of sorting out service points and labeling service corpora, and reduces labor cost.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the model construction method of the present invention;
FIG. 3 is a schematic flow chart of a first embodiment of the model construction method according to the present invention;
FIG. 4 is a schematic flow chart of a second embodiment of the model construction method of the present invention;
FIG. 5 is a schematic sub-flowchart of step S22 in the second embodiment of the model construction method according to the present invention;
FIG. 6 is a schematic flow chart of a third embodiment of the model construction method of the present invention;
FIG. 7 is a schematic flow chart of a fourth embodiment of the model construction method of the present invention;
FIG. 8 is a schematic sub-flow chart of step B3 in the fourth embodiment of the model construction method according to the present invention;
FIG. 9 is a functional block diagram of a model building apparatus according to a first embodiment of the model building method of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The device of the embodiment of the invention can be a mobile terminal or a server device.
As shown in fig. 1, the apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 1 is not intended to be limiting of the apparatus and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a model building program.
The operating system is a program for managing and controlling the model building equipment and software resources, and supports the operation of the network communication module, the user interface module, the model building program and other programs or software; the network communication module is used for managing and controlling the network interface 1004; the user interface module is used to manage and control the user interface 1003.
In the model building apparatus shown in fig. 1, the model building apparatus calls a model building program stored in a memory 1005 by a processor 1001 and performs operations in various embodiments of the model building method described below.
Based on the hardware structure, the embodiment of the model construction method is provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the model building method of the present invention, and the method includes:
step S10, obtaining a training corpus for constructing a model;
acquiring an original corpus from a service end; preprocessing the original corpus to obtain a training corpus for model construction; the preprocessing comprises one or more of: removing stop words, converting full-width characters to half-width, removing emoticons, removing greetings and meaningless questions, unifying punctuation marks, and removing uncommon punctuation marks.
In a specific embodiment, dialogue corpora from a service scenario are collected for model construction: a large number of user chat records from an intelligent customer service system are used as the original corpus for training the target model. The original corpus is normalized by one or more of removing stop words, converting full-width characters to half-width, removing emoticons, removing greetings and meaningless questions, unifying punctuation marks, and removing uncommon punctuation marks. Normalizing the whole original corpus from the client in this way reduces noise in the original corpus, improves its purity, and yields a training corpus that can be used for model construction.
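The preprocessing operations listed above can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the stop-word and greeting lists, the emoji ranges, the set of punctuation that is kept, and the whitespace tokenization (Chinese text would need a word segmenter instead).

```python
import re

# Hypothetical word lists; a real system would load its own.
STOP_WORDS = {"um", "uh", "please"}
GREETINGS = {"hello", "hi"}

# A partial, illustrative set of emoji and pictograph ranges.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Punctuation that full-width conversion does not cover.
PUNCT_MAP = str.maketrans({"。": ".", "、": ","})
# Everything except word characters, whitespace and the kept punctuation.
UNCOMMON_PUNCT_RE = re.compile(r"[^\w\s,.?!;:]")


def to_half_width(text: str) -> str:
    """Convert full-width characters (U+FF01..U+FF5E, U+3000) to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:   # full-width ASCII variants
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)


def preprocess(text: str) -> str:
    text = to_half_width(text)
    text = EMOJI_RE.sub("", text)            # remove emoticons
    text = text.translate(PUNCT_MAP)         # unify punctuation marks
    text = UNCOMMON_PUNCT_RE.sub("", text)   # remove uncommon punctuation
    dropped = STOP_WORDS | GREETINGS
    tokens = [t for t in text.lower().split()
              if t.strip(",.?!;:") not in dropped]
    return " ".join(tokens)
```

For example, `preprocess("um please help~~ me#")` returns `"help me"`.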
Step S20, clustering the training corpora based on a pre-trained clustering model to obtain corresponding clustering results, wherein the clustering results comprise clustering labels and clustering corpora corresponding to the clustering labels;
the principle of clustering the training corpora to obtain the corresponding clustering result is as follows: the clustering process can use morphological operators to cluster and merge adjacent regions of similar classes. After clustering and merging, clusters of different categories are generated. Each cluster is a set of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters.
Among existing clustering algorithms, most are able to handle noisy data, while some are very sensitive to noisy data; an accurate clustering result can be obtained after the data are clustered.
In a specific embodiment, the corpus data collected from the service end are preprocessed to obtain the preprocessed training corpus, which is input into the pre-trained clustering model. The order of the preprocessed corpus is shuffled, and part of the training corpus is extracted for clustering. The clustering algorithm may be the hierarchical agglomerative clustering (HAC) algorithm, which produces clusters of different categories, each with its corresponding corpora, giving a clustering result that comprises clustering labels and the clustering corpora corresponding to those labels. In the clustering result, if the number of corpora in a cluster is greater than a preset threshold N1 (for example, 50), the cluster id is used as the clustering label; the corpora in the remaining clusters use other as the clustering label.
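The size-threshold rule described above can be sketched in Python as follows. The function name and the string used for the other label are illustrative assumptions; the default of 50 for N1 is the example value given in the text.

```python
from collections import Counter


def label_clusters(cluster_ids, n1=50, other_label="other"):
    """Keep the cluster id as the label only for clusters with more than n1 items.

    cluster_ids: the cluster id assigned to each corpus item by the clustering step.
    Returns one clustering label per corpus item.
    """
    sizes = Counter(cluster_ids)
    return [cid if sizes[cid] > n1 else other_label for cid in cluster_ids]
```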
And step S30, performing model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining a target intention recognition model according to the model training and prediction result.
In the prior art, a data set suitable for model training is obtained through manual sorting and labeling, and an initial pre-trained model is then trained on that data set by deep learning to obtain the target classification model. In the model construction method of this embodiment, the training corpora of the service end are obtained directly and clustered to obtain a clustering result comprising clustering labels and the clustering corpora corresponding to those labels; training and prediction are performed on the clustering result, and the target intention recognition model is determined according to the model training and prediction results.
In a specific embodiment, the clustering result, comprising the clustering labels and their corresponding clustering corpora, is divided evenly into 5 parts by clustering label. For example, if there are 200 corpora with the clustering label (cluster id) label1, each part contains 40 corpora labeled label1 after the division. Each time, 4 parts are used to train the pre-trained classification model and the remaining part is predicted, so that prediction score values are obtained for all 5 parts. From the clustering labels and the corresponding prediction score values, the PRF value of the clustering result is obtained, namely the Precision, Recall and F1 values. Whether the clustering result is reasonable is evaluated according to the PRF value; if it is reasonable, the classification model trained on the training data is output as the corresponding target intention recognition model.
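The 5-part split and the PRF computation can be sketched in Python as below. This is an illustration under stated assumptions: `train_and_predict` is a hypothetical callable standing in for the classification model, which here returns a (label, score) pair per held-out item, and the PRF value is computed per label from the predicted labels.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Return k lists of indices, spreading each label's items evenly over them."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validate(texts, labels, train_and_predict, k=5):
    """Train on k-1 parts, predict the remaining part, for every part in turn."""
    predictions = [None] * len(texts)
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in held]
        preds = train_and_predict(
            [texts[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [texts[i] for i in held_out],
        )
        for i, pred in zip(held_out, preds):
            predictions[i] = pred  # (predicted_label, score)
    return predictions


def prf(y_true, y_pred, label):
    """Precision, recall and F1 of one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

With 200 items labeled label1, `stratified_folds` places 40 of them in each of the 5 parts, matching the example in the text.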
In this embodiment, the preprocessed training corpora are clustered to obtain clustering labels for the different categories and the clustering corpora corresponding to those labels; model training and prediction are performed on the clustering labels and corresponding clustering corpora as the clustering result, and the target intention recognition model is determined according to the model training and prediction results. Building the intention recognition model automatically greatly reduces the labor cost of manual labeling and relieves the workload of operators. In addition, the automatic model building method can help operators understand the business and discover and sort out the different business types involved, which improves the efficiency of building the intention recognition model.
Further, a second embodiment of the model construction method of the present invention is proposed based on the first embodiment of the model construction method of the present invention.
The second embodiment of the model construction method differs from the first embodiment in that it refines step S20, in which the training corpora are clustered based on the pre-trained clustering model to obtain the corresponding clustering result. Referring to fig. 4, the step specifically includes:
Step S21, shuffling the order of the training corpora and dividing them to obtain clustering sample corpora;
In a specific embodiment, the preprocessed training corpora are input into the pre-trained clustering model, their order is shuffled, and a part of them is extracted for clustering. Specifically, part of the training corpora is clustered first, and the remaining training corpora are then added step by step to obtain the clustering result for the training corpora. Compared with clustering all training corpora directly, extracting part of the training corpora for clustering yields a better final classification effect.
In an embodiment, the preprocessed training corpora are shuffled and a part of them is extracted as the clustering sample corpora; for example, 20% of the total training corpora may be extracted as the clustering sample corpora for clustering.
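The shuffling and sampling of step S21 can be sketched as follows. This is a minimal illustration only: the function name `split_clustering_sample`, the `seed` parameter and the toy corpora are assumptions that do not appear in the original text, and the 20% ratio simply mirrors the example above.

```python
import random

def split_clustering_sample(corpora, sample_ratio=0.2, seed=0):
    """Shuffle the corpora and split off a clustering sample.

    Returns (sample, remainder). sample_ratio=0.2 mirrors the 20% example;
    the seed is an assumption added only to make the sketch reproducible.
    """
    shuffled = list(corpora)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * sample_ratio)
    return shuffled[:cut], shuffled[cut:]

# Toy corpora standing in for preprocessed training corpora.
corpora = [f"utterance {i}" for i in range(100)]
sample, remainder = split_clustering_sample(corpora)
```

The remainder is returned rather than discarded because the text states that the remaining training corpora are added later.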
Step S22, clustering the clustering sample corpora based on a hierarchical clustering algorithm (HAC) to obtain clustering labels and the clustered corpora corresponding to the clustering labels.
There are many ways of clustering, including partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, the transitive closure method, the Boolean matrix method, the direct clustering method, correlation analysis clustering methods, statistics-based clustering methods, and so on. Within each method there are also various clustering algorithms, each of which can produce a corresponding clustering result.
In a specific embodiment, the clustering sample corpora are clustered based on the hierarchical clustering algorithm HAC, which assigns them to clusters of different categories. Each cluster is called a clustering label, and the corpora belonging to each cluster are the clustered corpora corresponding to that clustering label. The clustering result obtained after the clustering process therefore includes the clustering labels and the clustered corpora corresponding to the clustering labels.
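As a rough illustration of hierarchical clustering, the sketch below implements a small agglomerative procedure in pure Python. The text does not specify the text representation, the distance measure, the linkage criterion or the stopping rule, so the 2-D vectors, Euclidean distance, average linkage and `distance_threshold` used here are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist

def hac(vectors, distance_threshold):
    """Average-linkage agglomerative clustering (illustrative sketch).

    Starts with one cluster per vector and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    distance_threshold. Returns one cluster id per input vector.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), d = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > distance_threshold:
            break
        clusters[x].extend(clusters[y])   # merge cluster y into cluster x
        del clusters[y]                   # y > x, so index x stays valid

    labels = [None] * len(vectors)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels

# Toy 2-D vectors standing in for sentence features: two tight groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
labels = hac(points, distance_threshold=1.0)
```

Each returned cluster id plays the role of a clustering label, and the vectors sharing an id are the clustered corpora for that label.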
Referring to fig. 5, step S22 specifically includes:
Step A1, classifying the clustering sample corpora, and placing clustering sample corpora of the same category into one cluster to obtain the clustering labels corresponding to the clusters of different categories;
In a specific embodiment, the clustering sample corpora are classified, corpora of the same category are grouped into one cluster, and each cluster yields a clustering label together with the clustered corpora corresponding to that label.
Step A2, determining the clustered corpora corresponding to the different clustering labels based on the clustering labels corresponding to the clusters;
In a specific embodiment, the clustering sample corpora are classified based on the hierarchical clustering algorithm HAC to obtain a cluster for each category. Each cluster is called a clustering label, and the corpora in each cluster are used as the clustered corpora corresponding to that clustering label.
Step A3, if the number of corpora in a cluster is greater than a preset threshold N1, the corpora in the cluster use the cluster as their clustering label, and the corpora in the cluster are the clustered corpora corresponding to that clustering label;
Step A4, if the number of corpora in a cluster is not greater than the preset threshold N1, the corpora in the cluster use other as their clustering label, and the corpora in the cluster are the clustered corpora corresponding to that clustering label.
In a specific embodiment, the clustering sample corpora may be divided into five categories, label1, label2, label3, label4 and label5, each with its corresponding clustered corpora. If the clusters in the clustering result whose number of corpora is greater than the preset threshold of 50 are label1, label2, label3 and label4, then label1, label2, label3 and label4 are kept as the clustering labels of their clustered corpora, and the corpora of the remaining cluster, label5, use the label other.
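The thresholding of steps A3 and A4 can be sketched as below. The function name `apply_other_label` and the cluster sizes in the toy data are assumptions made for illustration; only the threshold of 50 and the label names come from the example above.

```python
from collections import Counter

def apply_other_label(cluster_ids, n1, other="other"):
    """Keep a cluster id as the label only if its cluster has more than n1 corpora.

    Corpora in clusters with n1 or fewer members receive the label `other`.
    """
    sizes = Counter(cluster_ids)
    return [cid if sizes[cid] > n1 else other for cid in cluster_ids]

# Hypothetical cluster sizes: label1 and label2 exceed 50, label5 does not.
cluster_ids = ["label1"] * 60 + ["label2"] * 55 + ["label5"] * 10
labels = apply_other_label(cluster_ids, n1=50)
```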
In this embodiment, the training corpora are shuffled and divided to obtain the clustering sample corpora used for clustering, while the remaining non-sample corpora are retained. The clustering sample corpora are clustered based on the hierarchical clustering algorithm HAC to obtain the clustering result, which includes the clustering labels and the clustered corpora corresponding to the clustering labels.
Further, a third embodiment of the model construction method of the present invention is proposed based on the first and second embodiments of the model construction method of the present invention.
The third embodiment of the model construction method differs from the first and second embodiments in that it refines step S30, in which model training and prediction are performed based on the clustering labels in the clustering result and the corresponding clustered corpora, and the target intention recognition model is determined according to the model training and prediction results. Referring to fig. 6, the step specifically includes:
Step S31, dividing the clustered corpora in the clustering result into training corpora and prediction corpora according to the clustering labels;
The clustered corpora in the clustering result are divided evenly into n parts by clustering label. Each time, m of the n parts are used to train the initial model to obtain a corresponding classification model, and the remaining parts are predicted, so that prediction score values are obtained for all clustered corpora.
In a specific embodiment with the clustering labels label1 and label2, the clustered corpora in the clustering result are divided evenly into 5 parts by clustering label. If 200 clustered corpora have the label label1, each part contains 40 corpora labeled label1; 4 parts are used for training each time and the remaining part is predicted, so that prediction score values are obtained for all clustered corpora labeled label1. Likewise, if 400 clustered corpora have the label label2, each part contains 80 corpora labeled label2; 4 parts are used for training each time and the remaining part is predicted, so that prediction score values are obtained for all clustered corpora labeled label2.
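The per-label division described above resembles a stratified split, and can be sketched as follows. The names `stratified_parts` and `train_predict_rounds`, the round-robin assignment and the toy texts are assumptions; the counts (200 and 400 corpora, 5 parts) mirror the example.

```python
from collections import defaultdict

def stratified_parts(labeled_corpora, n_parts=5):
    """Split (text, label) pairs into n_parts so each label is spread evenly."""
    by_label = defaultdict(list)
    for text, label in labeled_corpora:
        by_label[label].append((text, label))
    parts = [[] for _ in range(n_parts)]
    for items in by_label.values():
        for index, item in enumerate(items):
            parts[index % n_parts].append(item)   # round-robin within one label
    return parts

def train_predict_rounds(parts):
    """Yield (training corpora, prediction corpora), holding out one part per round."""
    for held_out in range(len(parts)):
        training = [item for k, part in enumerate(parts) if k != held_out
                    for item in part]
        yield training, parts[held_out]

data = ([(f"a{i}", "label1") for i in range(200)]
        + [(f"b{i}", "label2") for i in range(400)])
parts = stratified_parts(data)
rounds = list(train_predict_rounds(parts))
```

Because every part is held out exactly once, each clustered corpus receives a prediction from a model that was not trained on it.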
Step S32, performing model training based on the training corpora to obtain a trained initial classification model;
Step S33, inputting the prediction corpora into the trained initial classification model for prediction to obtain prediction score values;
In a specific embodiment, the training corpora among the divided clustered corpora are used to train the pre-trained classification model to obtain a corresponding classification model. The prediction corpora among the divided clustered corpora are input into the classification model to obtain the corresponding classification result, and the classification result is tested to obtain the prediction score value of the classification result for the classification model.
Step S34, determining the precision, recall and F1 (PRF) value of the clustering result based on the clustering labels in the clustering result and the prediction score values;
From the clustering labels in the clustering result and the prediction score values of the classification results of the classification model, the Precision, Recall and F1 values of the clustering result, namely the PRF value, are obtained. Whether the classification model is reasonable is evaluated according to the PRF value, which in turn indicates whether the training data of the classification model is reasonable and hence whether the clustering result is reasonable.
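A minimal sketch of computing a PRF value from clustering labels and predicted labels is shown below. The text does not say how per-label values are aggregated, so the macro average used here is an assumption, as are the function name `prf` and the toy labels.

```python
def prf(cluster_labels, predicted_labels):
    """Return macro-averaged (precision, recall, f1) over all labels.

    The clustering labels are treated as the reference labels. Macro
    averaging is an assumption; the source does not specify it.
    """
    pairs = list(zip(cluster_labels, predicted_labels))
    labels = sorted(set(cluster_labels) | set(predicted_labels))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for c, p in pairs if c == label and p == label)
        fp = sum(1 for c, p in pairs if c != label and p == label)
        fn = sum(1 for c, p in pairs if c == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

cluster_labels   = ["label1", "label1", "label1", "label2", "label2", "label2"]
predicted_labels = ["label1", "label1", "label2", "label2", "label2", "label2"]
precision, recall, f1 = prf(cluster_labels, predicted_labels)
```

In this toy case one label1 corpus is predicted as label2, which lowers the recall of label1 and the precision of label2.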
Step S35, determining the corresponding target intention recognition model based on the PRF value.
In a specific embodiment, the target intention recognition model to be output is determined according to the PRF value of the clustering result. Specifically, the rules for judging the PRF value are as follows:
if the PRF value reaches a preset threshold, the clustering result is reasonable, and the pre-trained model is trained with all corpora in the clustering result to obtain the corresponding classification model;
if the PRF value does not reach the preset threshold, the clustering result is unreasonable, and the clustering labels of the clustering result and the clustered corpora corresponding to them need to be classified and adjusted. Model training and prediction are then performed again on the adjusted clustering result, until the PRF value of the clustering result reaches the preset threshold and the corresponding classification model is output.
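The two rules above amount to an evaluate-and-adjust loop, which might be sketched as follows. The callbacks `evaluate` and `adjust` are hypothetical stand-ins for the training/prediction and classification-adjustment steps, and `max_rounds` is a safety limit added for this sketch that is not part of the described method.

```python
def build_until_reasonable(clustering_result, evaluate, adjust, threshold,
                           max_rounds=10):
    """Repeat evaluate/adjust until the PRF value reaches the threshold.

    evaluate(result) -> float  : hypothetical callback returning the PRF value
    adjust(result)   -> result : hypothetical callback doing classification adjustment
    Returns (final result, final score, number of adjustments performed).
    """
    for round_index in range(max_rounds):
        score = evaluate(clustering_result)
        if score >= threshold:
            return clustering_result, score, round_index
        clustering_result = adjust(clustering_result)
    raise RuntimeError("PRF value did not reach the threshold within max_rounds")

# Toy stand-ins: the "result" is a number that each adjustment improves.
result, score, rounds = build_until_reasonable(
    0.70, evaluate=lambda r: r, adjust=lambda r: r + 0.1, threshold=0.85)
```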
In this embodiment, training and prediction are performed on the clustering result to obtain the classification output of the classification model and the prediction score values for the clustering result. The PRF value of the clustering result is obtained from the classification output and the prediction score values, whether the clustering result is reasonable is determined by whether the PRF value reaches a preset threshold, and the corresponding target intention recognition model is finally determined. This improves the accuracy of automatically creating the target intention recognition model and provides a fault-tolerance mechanism: the classification result of the classification model is checked through prediction, which improves the accuracy of that classification result.
Further, a fourth embodiment of the model construction method of the present invention is proposed based on the first, second, and third embodiments of the model construction method of the present invention.
The fourth embodiment of the model construction method differs from the first, second and third embodiments in that it refines step S35, determining the corresponding target intention recognition model based on the PRF value. Referring to fig. 7, the step specifically includes:
Step B1, judging whether the PRF value reaches a preset threshold;
Step B2, if the PRF value reaches the preset threshold, the clustering result is reasonable, and the initial classification model is output as the target intention recognition model;
In a specific embodiment, if the PRF value of the classification result output by the classification model reaches the preset threshold, it is determined whether all clustered corpora have been used in training the pre-trained model. If they have, the classification model is output directly as the target intention recognition model; if not, the remaining corpora are added to further train the classification model until all corpora have been used in training, and the classification model is then output as the target intention recognition model.
Step B3, if the PRF value does not reach the preset threshold, the clustering result is unreasonable, and the clustering result is classified and adjusted to obtain a classified and adjusted clustering result;
Referring to fig. 8, step B3 specifically includes:
The clustering result includes other clustered corpora, whose clustering label is other, and non-other clustered corpora, whose clustering label is not other. The step of classifying and adjusting the clustering result to obtain the classified and adjusted clustering result includes:
Step b1, adjusting the other clustered corpora and the non-other clustered corpora to obtain the clustered corpora corresponding to the adjusted clustering labels;
Step b2, calculating the confusion degree of the clustered corpora corresponding to the adjusted clustering labels;
Step b3, when the confusion degree is greater than a preset threshold T2, merging the non-other clustered corpora from before and after the classification adjustment as the non-other clustered corpora of the current clustering result, and taking the remaining corpora, labeled other, as the other clustered corpora.
In a specific embodiment, the classification adjustment of the clustering result includes three parts:
First, the labels of the untrustworthy part of the non-other clustered corpora are changed to other. The clustering labels and prediction score values of all clustered corpora have been obtained through the preceding steps. Clustered corpora whose clustering label is non-other but whose prediction is inconsistent with that label, with a prediction score below a certain threshold (for example, 0.3), are moved into the clustered corpora labeled other. Changing the labels in this way combines the advantages of classification and clustering and places the untrustworthy part of the non-other clustered corpora into other.
Then, it is judged whether the number of corpora with the clustering label other exceeds a threshold of 10% of the overall training corpora. If so, the clustering algorithm is executed again to obtain a new clustering result; clusters in the new clustering result whose number of corpora exceeds a preset threshold N2 are marked as new clustered corpora and added to the training corpora.
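The first two parts of the adjustment can be sketched as below. This follows the simplest reading of the text, in which a non-other corpus is relabeled when its prediction score falls below the threshold (0.3 in the example); the dictionary layout, the function names and the `total_corpora` values are assumptions made for illustration.

```python
def move_untrusted_to_other(items, t1=0.3, other="other"):
    """Relabel non-other corpora with a prediction score below t1 as other.

    items: list of dicts with the hypothetical keys 'text', 'label', 'score'.
    Returns new dicts; the input list is not modified.
    """
    adjusted = []
    for item in items:
        untrusted = item["label"] != other and item["score"] < t1
        adjusted.append({**item, "label": other if untrusted else item["label"]})
    return adjusted

def other_needs_reclustering(items, total_corpora, ratio=0.10, other="other"):
    """True when the other corpora exceed `ratio` of the overall training corpora."""
    n_other = sum(1 for item in items if item["label"] == other)
    return n_other > ratio * total_corpora

items = [
    {"text": "t0", "label": "label1", "score": 0.9},
    {"text": "t1", "label": "label1", "score": 0.2},
    {"text": "t2", "label": "label2", "score": 0.25},
    {"text": "t3", "label": "other", "score": 0.1},
]
adjusted = move_untrusted_to_other(items)
recluster_small = other_needs_reclustering(adjusted, total_corpora=20)
recluster_large = other_needs_reclustering(adjusted, total_corpora=40)
```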
And finally, classifying and combining the new clustering corpora and the clustering corpora before classification adjustment, wherein the process of classification and combination needs to calculate the confusion degree of the clustering corpora before and after classification adjustment, and the formula for calculating the confusion degree is as follows:
[Formula image BDA0003536524750000151, the confusion degree formula, is not reproduced in the extracted text.]
where $N_{cate_i,cate_j}$ denotes the number of corpora whose actual intent is $cate_i$ but which are misclassified as $cate_j$; $N_{cate_i}$ denotes the actual number of corpora of $cate_i$; and the symbol shown in image BDA0003536524750000152 denotes the number of corpora predicted as $cate_i$. Two clustered corpora whose clustering labels are both non-other and whose confusion degree is greater than the threshold T2 (for example, 0.25) are merged to obtain the clustering result after the classification adjustment.
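Because the confusion degree formula itself is only available as an image, the sketch below does not attempt to reproduce it. It only computes the three quantities defined in the text and selects label pairs to merge from confusion degrees supplied by the caller; the function names and toy data are assumptions.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Compute the quantities named in the symbol definitions.

    pair_counts[(i, j)]  : corpora whose actual intent is i but predicted as j
    actual_counts[i]     : actual number of corpora of intent i
    predicted_counts[i]  : number of corpora predicted as intent i
    """
    pair_counts = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return pair_counts, Counter(actual), Counter(predicted)

def pairs_to_merge(confusion_degree, t2=0.25, other="other"):
    """Select non-other label pairs whose supplied confusion degree exceeds t2.

    confusion_degree: dict mapping (cate_i, cate_j) to a degree computed
    elsewhere with the source formula, which is not reproduced here.
    """
    return sorted(pair for pair, degree in confusion_degree.items()
                  if degree > t2 and other not in pair)

actual    = ["a", "a", "a", "b", "b", "c"]
predicted = ["a", "b", "b", "b", "b", "c"]
pair_counts, actual_counts, predicted_counts = confusion_counts(actual, predicted)

# Hypothetical confusion degrees, used only to exercise the threshold.
merge = pairs_to_merge({("a", "b"): 0.4, ("a", "c"): 0.1, ("b", "other"): 0.9})
```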
Step B4, taking the classified and adjusted clustering result as the current clustering result;
With the current clustering result obtained, the following steps are executed:
dividing the clustered corpora in the clustering result into training corpora and prediction corpora according to the clustering labels;
performing model training based on the training corpora to obtain a trained initial classification model;
inputting the prediction corpora into the trained initial classification model for prediction to obtain prediction score values;
determining the precision, recall and F1 (PRF) value of the clustering result based on the clustering labels in the clustering result and the prediction score values;
In a specific embodiment, if the PRF value of the clustering result does not reach the preset threshold, the clustering result is unreasonable, and the clustering labels of the clustering result and the clustered corpora corresponding to them need to be classified and adjusted. After the classified and adjusted clustering result is obtained, it is used as the current clustering result, and the steps of performing model training and prediction based on the clustering labels in the clustering result and the corresponding clustered corpora, and of determining the target intention recognition model according to the model training and prediction results, are repeated.
Further, the clustered corpora in the clustering result are divided into training corpora and prediction corpora according to the clustering labels; the classification result of the classification model is obtained from the training corpora, and the prediction score value of the clustering result is obtained from the prediction corpora; the PRF value of the clustering result is determined based on the classification result and the prediction score value, and the corresponding target intention recognition model is then determined according to the PRF value.
Step B5, repeating the above until the PRF value reaches the preset threshold, at which point the clustering result is reasonable and the initial classification model is output as the target intention recognition model.
If the PRF value reaches the preset threshold, the clustering result is reasonable, and the classification model obtained by training on the clustering result is output as the target intention recognition model.
In this embodiment, the PRF value of the classification result is used to judge whether the classification model is reasonable, which in turn indicates whether its training data, and hence the clustering result containing that training data, is reasonable. If so, the classification model is output directly as the target intention recognition model; if not, the clustering result is classified and adjusted to obtain corresponding model training data, from which the target intention recognition model is then obtained. This improves both the accuracy and the efficiency of automatically creating the target intention recognition model.
The invention also provides a model construction device. Referring to fig. 9, the model building apparatus of the present invention includes:
an obtaining module 10, configured to obtain training corpora for constructing a model;
the clustering module 20 is configured to perform clustering processing on the training corpora based on a pre-trained clustering model to obtain corresponding clustering results, where the clustering results include clustering labels and clustering corpora corresponding to the clustering labels;
and the determining module 30 is configured to perform model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determine a target intention recognition model according to the model training and prediction result.
Furthermore, the present invention also provides a medium, preferably a computer-readable storage medium, having stored thereon a model construction program which, when executed by a processor, implements the steps of the model construction method as described above.
In the embodiments of the model building apparatus and medium of the present invention, all technical features of the embodiments of the model building method are included, and the descriptions and explanations are basically the same as those of the embodiments of the model building method, and are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (11)

1. A model construction method is characterized by comprising the following steps:
obtaining training corpora for constructing a model;
clustering the training corpora based on a pre-trained clustering model to obtain corresponding clustering results, wherein the clustering results comprise clustering labels and clustering corpora corresponding to the clustering labels;
and performing model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining a target intention recognition model according to the model training and prediction result.
2. The model building method according to claim 1, wherein the step of clustering the training corpus based on the pre-trained clustering model to obtain the corresponding clustering result comprises:
shuffling the order of the training corpora and dividing them to obtain clustering sample corpora;
and clustering the clustering sample corpora based on a hierarchical clustering algorithm HAC to obtain clustering labels and clustered corpora corresponding to the clustering labels.
3. The model building method according to claim 2, wherein the step of clustering the clustered sample corpus based on the hierarchical clustering algorithm HAC to obtain the clustering labels and the clustering corpuses corresponding to the clustering labels comprises:
classifying the clustered sample corpus, and dividing the classified clustered sample corpus of the same kind into a cluster to obtain clustering labels corresponding to clusters of different kinds;
determining the clustered corpora corresponding to different kinds of clustering labels based on the clustering labels corresponding to the clusters;
if the number of the corpora in the cluster is greater than a preset threshold value N1, taking the cluster as a clustering label and the corpora in the cluster corresponding to the clustering label as corresponding clustering corpora;
if the number of the corpora in the cluster is not greater than the preset threshold value N1, the corpora in the cluster use other as a clustering label, and the corpora in the cluster corresponding to the clustering label are corresponding clustering corpora.
4. The model building method according to claim 1, wherein the step of performing model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining the target intention recognition model according to the model training and prediction result comprises:
dividing the clustered corpora in the clustering result into training corpora and prediction corpora according to the clustering labels;
performing model training based on the training corpora to obtain a trained initial classification model;
inputting the prediction corpora into the trained initial classification model for prediction to obtain a prediction score value;
determining a precision, recall and F1 (PRF) value of the clustering result based on the clustering labels in the clustering result and the prediction score value;
based on the PRF values, a corresponding target intent recognition model is determined.
5. The model building method of claim 4, wherein the step of determining a corresponding target intent recognition model based on the PRF value comprises:
judging whether the PRF value reaches a preset threshold value;
if the PRF value reaches the preset threshold value, the clustering result is reasonable, and the initial classification model is output as a target intention recognition model;
if the PRF value does not reach the preset threshold value, the clustering result is unreasonable, and the clustering result is classified and adjusted to obtain a classified and adjusted clustering result;
taking the classified and adjusted clustering result as the current clustering result, and returning to execute the steps of:
dividing the clustered corpora in the clustering result into training corpora and prediction corpora according to the clustering labels;
performing model training based on the training corpora to obtain a trained initial classification model;
inputting the prediction corpora into the trained initial classification model for prediction to obtain a prediction score value;
determining a precision, recall and F1 (PRF) value of the clustering result based on the clustering labels in the clustering result and the prediction score value;
until the PRF value reaches the preset threshold value, whereupon the clustering result is reasonable and the initial classification model is output as the target intention recognition model.
6. The model building method of claim 5, wherein the clustering result includes other clustered corpora whose clustering label is other and non-other clustered corpora whose clustering label is not other,
the step of classifying and adjusting the clustering result to obtain the classified and adjusted clustering result comprises the following steps:
adjusting the other clustered corpora and the non-other clustered corpora to obtain clustered corpora corresponding to the adjusted clustering labels;
calculating the confusion degree of the clustered corpora corresponding to the adjusted clustering labels;
and when the confusion degree is greater than a preset threshold value T2, merging the non-other clustered corpora from before and after the classification adjustment as the non-other clustered corpora of the current clustering result, and taking the remaining corpora, labeled other, as the other clustered corpora.
7. The model building method according to claim 6, wherein the step of adjusting the other clustered corpus and the non-other clustered corpus to obtain the clustered corpus corresponding to the adjusted clustering label comprises:
acquiring a prediction score value of the non-other clustering corpus;
if the predicted score value of the non-other clustering corpus is lower than a preset threshold value T1, changing the clustering label of the non-other clustering corpus into other;
and when the number of the other clustering corpuses with the clustering labels being other exceeds a preset threshold value N2, acquiring the adjusted other clustering corpuses and the adjusted non-other clustering corpuses.
8. The model building method according to claim 1, wherein the step of obtaining the corpus of the built model comprises:
acquiring an original corpus from a service end;
preprocessing the original corpus to obtain a training corpus for model construction;
the preprocessing comprises one or more of: removing stop words, converting full-width characters to half-width characters, removing emoticons, removing forms of address and meaningless questions, unifying punctuation marks, and removing uncommon punctuation marks.
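A few of the listed preprocessing operations can be sketched as follows. The stop-word list, the emoji code-point ranges and the punctuation rule are assumptions chosen purely for illustration; the source does not specify them.

```python
import re
import unicodedata

# Hypothetical stop-word list and emoji ranges, chosen only for illustration.
STOP_WORDS = {"um", "uh", "please"}
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def preprocess(text):
    """Apply a subset of the listed preprocessing steps to one utterance."""
    # NFKC normalization maps full-width forms to their half-width equivalents.
    text = unicodedata.normalize("NFKC", text)
    text = EMOJI.sub("", text)                               # remove emoticons
    text = re.sub(r"[!?]+", lambda m: m.group(0)[0], text)   # collapse repeated marks
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]
    return " ".join(words)

half_width = preprocess("ＡＢＣ１２３")
cleaned = preprocess("um how do I pay 😀 ！！")
```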
9. A model building apparatus, characterized in that the model building apparatus comprises:
the acquisition module is used for acquiring training corpora for constructing a model;
the clustering module is used for clustering the training corpora based on a pre-trained clustering model to obtain corresponding clustering results, wherein the clustering results comprise clustering labels and clustering corpora corresponding to the clustering labels;
and the determining module is used for carrying out model training and prediction based on the clustering labels in the clustering result and the corresponding clustering corpora, and determining a target intention recognition model according to the model training and prediction result.
10. A model building apparatus, characterized in that the model building apparatus comprises: memory, a processor and a model building program stored on the memory and executable on the processor, the model building program when executed by the processor implementing the steps of the model building method according to any one of claims 1 to 8.
11. A medium which is a computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a model construction program which, when executed by a processor, implements the steps of the model construction method according to any one of claims 1 to 8.
CN202210229151.9A 2022-03-08 2022-03-08 Model building methods, devices, equipment and media Active CN114817455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210229151.9A CN114817455B (en) 2022-03-08 2022-03-08 Model building methods, devices, equipment and media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210229151.9A CN114817455B (en) 2022-03-08 2022-03-08 Model building methods, devices, equipment and media

Publications (2)

Publication Number Publication Date
CN114817455A CN114817455A (en) 2022-07-29
CN114817455B CN114817455B (en) 2026-04-07

Family

ID=82528956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210229151.9A Active CN114817455B (en) 2022-03-08 2022-03-08 Model building methods, devices, equipment and media

Country Status (1)

Country Link
CN (1) CN114817455B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116467602A (en) * 2023-04-27 2023-07-21 中国工商银行股份有限公司 Training data generation method, device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010250814A (en) * 2009-04-14 2010-11-04 Nec (China) Co Ltd Part-of-speech tagging system, training device and method of part-of-speech tagging model
CN109739984A (en) * 2018-12-25 2019-05-10 贵州商学院 A kind of parallel KNN network public-opinion sorting algorithm of improvement based on Hadoop platform
WO2021120588A1 (en) * 2020-06-17 2021-06-24 平安科技(深圳)有限公司 Method and apparatus for language generation, computer device, and storage medium
CN113191148A (en) * 2021-04-30 2021-07-30 西安理工大学 Rail transit entity identification method based on semi-supervised learning and clustering
CN113704479A (en) * 2021-10-26 2021-11-26 深圳市北科瑞声科技股份有限公司 Unsupervised text classification method and device, electronic equipment and storage medium
CN113704429A (en) * 2021-08-31 2021-11-26 平安普惠企业管理有限公司 Semi-supervised learning-based intention identification method, device, equipment and medium
CN114003720A (en) * 2021-10-29 2022-02-01 平安国际智慧城市科技股份有限公司 Business document classification method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
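洪宇;张宇;刘挺;郑伟;龚诚;李生: "Adaptive information filtering learning algorithm based on hierarchical clustering", Journal of Chinese Information Processing (中文信息学报), no. 03, 15 May 2007 (2007-05-15) *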

Also Published As

Publication number Publication date
CN114817455B (en) 2026-04-07

Similar Documents

Publication Publication Date Title
CN109800306B (en) Intent analysis method, device, display terminal and computer-readable storage medium
JP4311552B2 (en) Automatic document separation
CN106503236B (en) Artificial intelligence based problem classification method and device
CN111651996B (en) Abstract generation method, device, electronic device and storage medium
US7707027B2 (en) Identification and rejection of meaningless input during natural language classification
JP2022512065A (en) Image classification model training method, image processing method and equipment
CN111177186B (en) Single sentence intention recognition method, device and system based on question retrieval
JPWO2007138875A1 (en) Word dictionary / language model creation system, method, program, and speech recognition system for speech recognition
CN109359296B (en) Public opinion emotion recognition method, device and computer-readable storage medium
CN113012687B (en) Information interaction method and device and electronic equipment
CN112671985A (en) Agent quality inspection method, device, equipment and storage medium based on deep learning
CN114972222A (en) Cell Information Statistical Method, Apparatus, Device and Computer-readable Storage Medium
WO2022042297A1 (en) Text clustering method, apparatus, electronic device, and storage medium
CN108776677B (en) Parallel sentence library creating method and device and computer readable storage medium
CN107291774A (en) Error sample recognition methods and device
CN117609493A (en) Text classification method and device based on large language model
CN116644183B (en) Text classification method, device and storage medium
CN113095073B (en) Corpus tag generation method and device, computer equipment and storage medium
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN114491010B (en) Training method and device for information extraction model
CN114817455A (en) Model construction method, device, equipment and medium
CN110782879A (en) Sample size-based voiceprint clustering method, device, equipment and storage medium
CN117150395B (en) Model training and intention recognition method and device, electronic equipment and storage medium
CN113139368B (en) Text editing method and system
CN112988992B (en) Information interaction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant