CN114238564B - Information retrieval method, device, electronic device and storage medium - Google Patents


Info

Publication number
CN114238564B
CN114238564B (application CN202111495205.8A)
Authority
CN
China
Prior art keywords
target
word
document
information retrieval
relevance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111495205.8A
Other languages
Chinese (zh)
Other versions
CN114238564A (en)
Inventor
韩佳
杜新凯
吕超
谷姗姗
张晗
李文灏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunshine Insurance Group Co Ltd
Original Assignee
Sunshine Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunshine Insurance Group Co Ltd filed Critical Sunshine Insurance Group Co Ltd
Priority to CN202111495205.8A priority Critical patent/CN114238564B/en
Publication of CN114238564A publication Critical patent/CN114238564A/en
Application granted granted Critical
Publication of CN114238564B publication Critical patent/CN114238564B/en

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides an information retrieval method and apparatus, an electronic device, and a storage medium. Based on a trained target information retrieval model, the method determines a first word vector for each target word in a target query sentence and a center word vector for that target word in each target document, and then determines a first relevance between the query sentence and each target document from the first word vector and the center word vector.

Description

Information retrieval method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of information retrieval technologies, and in particular, to an information retrieval method, an information retrieval apparatus, an electronic device, and a storage medium.
Background
Information retrieval in the prior art mainly retrieves a search text in a database with a matching algorithm based on word frequency, and word-frequency-based retrieval is still the mainstream method of current retrieval systems. However, word-frequency-based information retrieval considers only the number of times a word occurs in the database and does not take the semantic and word-sense context into account, so the matching degree and accuracy of the retrieved documents are low.
Disclosure of Invention
Accordingly, an object of the present application is to provide an information retrieval method, an apparatus, an electronic device, and a storage medium that, based on a trained target information retrieval model, determine a first word vector of a target word in a target query sentence and a center word vector of that target word in each target document, and determine a first relevance between the query sentence and each target document according to the first word vector and the center word vector.
The embodiment of the application provides an information retrieval method, which comprises the following steps:
inputting a target query sentence into a query layer in a trained target information retrieval model to obtain at least one target word shared with each target document in a text library, wherein the target documents are documents in the text library that share words with the target query sentence;
inputting each obtained target word into a word vector extraction network layer in the trained target information retrieval model to obtain a first word vector of each target word in the target query sentence and a second word vector of each target word in the sliding windows of each target document, wherein a sliding window is a window containing a first preset number of adjacent characters in the target document and contains at least one character of at least one target word;
for any target word in each target document, determining a center word vector of the target word based on the second word vectors of the sliding windows associated with the target word;
inputting the first word vector and the center word vector corresponding to the target word of each target document into a relevance scoring layer in the trained target information retrieval model, and calculating a first relevance between the target query sentence and each target document;
and selecting a preset number of target documents for output as retrieval documents, in descending order of the relevance between the target query sentence and the target documents.
Further, for any target word in each target document, determining the center word vector of the target word based on the second word vectors of the sliding windows associated with the target word includes:
for any target word in each target document, obtaining the second word vector of the target word in each sliding window;
and summing the second word vectors, averaging the sum, and determining the average as the center word vector of the target word.
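The sum-then-average step above can be sketched as follows; this is a minimal illustration, and the function name and the toy 4-dimensional vectors are hypothetical, not taken from the patent:

```python
def center_word_vector(window_vectors):
    # Sum the second word vectors of every sliding window associated
    # with the target word, then divide by the number of windows.
    n = len(window_vectors)
    return [sum(component) / n for component in zip(*window_vectors)]

# Three hypothetical 4-dimensional second word vectors, one per window.
windows = [[1.0, 0.0, 2.0, 0.0],
           [3.0, 0.0, 0.0, 2.0],
           [2.0, 0.0, 1.0, 1.0]]
ck = center_word_vector(windows)  # the center word vector CK
```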
Further, a first relevance between the target query statement and each target document is calculated by:
performing a dot product calculation between the first word vector corresponding to each target word and each center word vector corresponding to that target word, and determining the dot product result as a second relevance between each target word and each target document;
and summing the second relevance values corresponding to the target words in each target document, and determining the summation result as a first relevance between the target query sentence and each target document.
Further, performing the dot product calculation between the first word vector corresponding to each target word and each center word vector corresponding to that target word, and determining the dot product result as the second relevance between each target word and each target document, includes:
performing the dot product calculation between the first word vector corresponding to each target word and each center word vector corresponding to that target word, and selecting the maximum dot product value as a resulting relevance;
and determining the resulting relevance as the second relevance between each target word and each target document.
Further, the trained target information retrieval model is determined by:
Acquiring a sample query statement and a sample document corresponding to the sample query statement;
dividing the sample query sentences and sample documents by relevance according to a preset relevance: a sample query sentence and sample document whose relevance is greater than or equal to the preset relevance are determined as a sample related text, and a sample query sentence and sample document whose relevance is less than the preset relevance are determined as a sample uncorrelated text;
and training the initial information retrieval model according to the sample related texts and the sample uncorrelated texts, and determining the trained target information retrieval model.
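The threshold split described above can be sketched as follows; the function name, the triple layout, and the threshold value 0.5 are illustrative assumptions, not values from the patent:

```python
def split_samples(samples, preset_relevance=0.5):
    # Each sample is a (query sentence, document, labeled relevance) triple.
    # At or above the preset relevance -> sample related text;
    # below it -> sample uncorrelated text.
    related = [s for s in samples if s[2] >= preset_relevance]
    unrelated = [s for s in samples if s[2] < preset_relevance]
    return related, unrelated

pairs = [("query 1", "doc 1", 0.9),
         ("query 1", "doc 2", 0.2),
         ("query 2", "doc 3", 0.5)]
related, unrelated = split_samples(pairs)
```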
The embodiment of the application also provides an information retrieval device, which comprises:
the first determining module is used for inputting a target query sentence into a query layer in a trained target information retrieval model to obtain at least one target word shared with each target document in a text library, wherein the target documents are documents in the text library that share words with the target query sentence;
the second determining module is used for inputting each obtained target word into a word vector extraction network layer in the trained target information retrieval model to obtain a first word vector of each target word in the target query sentence and a second word vector of each target word in the sliding windows of each target document, wherein a sliding window is a window containing a first preset number of adjacent characters in the target document and contains at least one character of at least one target word;
the third determining module is used for determining, for any target word in each target document, a center word vector of the target word based on the second word vectors of the sliding windows associated with the target word;
the calculating module is used for inputting the first word vector and the center word vector corresponding to the target word of each target document into a relevance scoring layer in the trained target information retrieval model, and calculating a first relevance between the target query sentence and each target document;
and the fourth determining module is used for selecting a preset number of target documents for output as retrieval documents, in descending order of the relevance between the target query sentence and the target documents.
Further, the third determining module determining, for any target word in each target document, the center word vector of the target word based on the second word vectors of the sliding windows associated with the target word includes:
for any target word in each target document, obtaining the second word vector of the target word in each sliding window;
and summing the second word vectors, averaging the sum, and determining the average as the center word vector of the target word.
Further, the calculating module calculates a first relevance between the target query statement and each target document by:
performing a dot product calculation between the first word vector corresponding to each target word and each center word vector corresponding to that target word, and determining the dot product result as a second relevance between each target word and each target document;
and summing the second relevance values corresponding to the target words in each target document, and determining the summation result as a first relevance between the target query sentence and each target document.
The embodiments of the present application also provide an electronic device comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device is operated, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the information retrieval method described above.
The embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the information retrieval method as described above.
Compared with the prior art, the information retrieval method, the apparatus, the electronic device, and the storage medium provided by the embodiments of the present application, based on a trained target information retrieval model, determine a first word vector of a target word in a target query sentence and a center word vector of that target word in each target document, and determine a first relevance between the query sentence and each target document according to the first word vector and the center word vector.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an information retrieval method according to an embodiment of the present application;
FIG. 2 is a diagram showing the relationship between a sliding window and a target document in an information retrieval method according to an embodiment of the present application;
FIG. 3 is a flow chart of another information retrieval method provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of an information retrieval device according to an embodiment of the present application;
Fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
In the figure:
400-information retrieval means, 410-first determination module, 420-second determination module, 430-third determination module, 440-calculation module, 450-fourth determination module, 500-electronic device, 510-processor, 520-memory, 530-bus.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment obtained by a person skilled in the art without making any inventive effort falls within the scope of protection of the present application.
First, the application scenario applicable to the present application is introduced. Research shows that information retrieval in the prior art mainly retrieves a search text in a database with a matching algorithm based on word frequency, and word-frequency-based retrieval is still the mainstream method of current retrieval systems; however, it considers only the number of times a word occurs in the database and does not take the semantic and word-sense context into account, so the matching degree and accuracy of the retrieved documents are low.
On this basis, the embodiments of the present application provide an information retrieval method, an information retrieval apparatus, an electronic device, and a storage medium that, based on a trained target information retrieval model, determine a first word vector of a target word in a target query sentence and a center word vector of that target word in each target document, and determine a first relevance between the query sentence and each target document according to the first word vector and the center word vector.
Referring to fig. 1, fig. 1 is a flowchart of an information retrieval method according to an embodiment of the application. As shown in fig. 1, the information retrieval method provided by the embodiment of the present application includes:
S101, inputting a target query sentence into a query layer in a trained target information retrieval model to obtain at least one target word shared with each target document in a text library, wherein the target documents are documents in the text library that share words with the target query sentence.
In this step, after a user initiates a query with a target query sentence, the target query sentence is input into the query network layer in the trained target information retrieval model to obtain the target words shared between the target query sentence and each target document in the text library. The documents in the text library are screened: documents unrelated to the target query sentence are removed, and the documents in the text library that share words with the target query sentence are determined as target documents.
A target word is defined as a word that exists in both the target query sentence and the target document. The number of characters in a target word is not unique and can be divided in a user-defined manner according to the expression characteristics of Chinese. In the embodiments provided by the present application, q is used to denote target words, with q_i denoting the i-th labeled target word, and the query network layer denotes the network structure layer in the target information retrieval model that determines the target words.
The target information retrieval model is obtained by training an initial information retrieval model, and the trained target information retrieval model is determined in the following manner:
and acquiring a sample query statement and a sample document corresponding to the sample query statement.
First, an initial sample query sentence and an initial sample document corresponding to the initial sample query sentence are acquired, the initial sample query sentence being taken from a log or another external sample database. The initial sample document corresponding to the initial sample query sentence is then manually labeled, the initial query sentence and the initial sample document are denoised, meaningless special characters such as spaces and garbled codes are deleted, and the text is cleaned with a regular expression.
A regular expression is a logical formula for operating on character strings: a "regular string" is formed from a number of predefined specific characters and combinations of these characters, and this "regular string" expresses a filtering logic for character strings.
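A regex-based cleaning step of this kind can be sketched as follows; the function name and the retained character ranges (CJK ideographs, letters, digits, basic punctuation) are illustrative assumptions, not the patent's actual filtering rules:

```python
import re

def clean_text(text):
    # Drop all whitespace, then keep only CJK characters, letters,
    # digits, and basic punctuation; everything else (garbled codes,
    # stray symbols) is removed.
    text = re.sub(r"\s+", "", text)
    return re.sub(r"[^\u4e00-\u9fffA-Za-z0-9，。？！,.?!]", "", text)
```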
Next, the sample query sentences and sample documents are divided by relevance according to a preset relevance: a sample query sentence and sample document whose relevance is greater than or equal to the preset relevance are determined as a sample related text, and a sample query sentence and sample document whose relevance is less than the preset relevance are determined as a sample uncorrelated text.
Here, the sample related texts and the sample uncorrelated texts each include a sample training set and a sample validation set for training the initial information retrieval model. The python programming language is used to divide the sample query sentences and sample documents according to the preset relevance: the labeled relevance between a sample query sentence and a sample document is compared with the preset relevance, a pair whose relevance is greater than or equal to the preset relevance is taken as a sample related text, and a pair whose relevance is less than the preset relevance is taken as a sample uncorrelated text. In addition, sample query sentences and sample documents with a high predicted relevance obtained by a probabilistic retrieval algorithm are determined as hard uncorrelated samples.
Here, the probabilistic retrieval algorithm is an algorithm proposed on the basis of a probabilistic retrieval model, including but not limited to the BM25 information retrieval algorithm.
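As background for the BM25 algorithm mentioned here, it scores a document by combining an inverse document frequency with a saturated term frequency. The sketch below is a minimal textbook formulation, not the patent's implementation; the parameter defaults k1 = 1.5 and b = 0.75 are common conventions, not values from the patent:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    # corpus: list of tokenised documents, used for IDF and average length.
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)
        tf = doc_terms.count(term)                        # term frequency
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score
```

A document containing none of the query terms scores zero, which is exactly the word-mismatch limitation of frequency-based retrieval that the background section describes.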
And training the initial information retrieval model according to the sample training set and the sample verification set, and determining a trained target information retrieval model.
The initial information retrieval model is trained on the sample training set, and during training the network structure parameters in the initial information retrieval model are updated and replaced in real time using the sample validation set, so that the training effect of the initial information retrieval model is verified. After training ends, the model documents in the trained target information retrieval model are stored accordingly to facilitate subsequent information retrieval tasks or target document re-ranking tasks.
In a first mode of use, a document encoder is used to process the documents in the text library offline, so as to preprocess the documents with respect to the target query sentence and delete some noise in the documents. The trained target information retrieval model is then loaded into the retrieval system and initialized, after which the denoised target query sentence is input to obtain a preset number of target documents corresponding to the target query sentence, which are output as retrieval documents.
In a second mode of use, a document encoder is used to process the documents in the text library offline, and target documents outside a designated number of text libraries are recalled with the BM25 information retrieval algorithm in a search engine. Relevance matching is then performed on these recalled target documents by the initialized target information retrieval model, and the preset number of target documents is output as the recalled retrieval documents.
The initial information retrieval model in the embodiments provided by the present application may be a language model based on BERT encoding, but is not limited to BERT encoding; the BERT encoder is fine-tuned with an Adam optimizer, thereby adjusting the parameters of the initial information retrieval model during training.
The initial information retrieval model may be trained using a negative log-likelihood function, which is a function of the parameters in the model. "Likelihood" and "probability" have similar everyday meanings but quite different statistical meanings: probability is used to predict the next observation given the parameters, while likelihood is used to estimate the possible parameter values of a given model based on some observations.
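A negative log-likelihood training objective over one related (positive) document and several uncorrelated (negative) documents can be sketched as follows; the function name and the softmax-over-scores formulation are a common convention assumed here, not spelled out in the patent:

```python
import math

def nll_loss(pos_score, neg_scores):
    # Softmax over the relevance scores of the related document and the
    # uncorrelated documents, then the negative log of the probability
    # assigned to the related document.
    denom = math.exp(pos_score) + sum(math.exp(s) for s in neg_scores)
    return -math.log(math.exp(pos_score) / denom)
```

With no negatives the loss is zero, and the loss grows as the uncorrelated documents score closer to the related one, which is what pushes the model to separate related from hard uncorrelated samples.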
Here, a Token is used to represent a character string generated from a sample query sentence and a sample document. In web systems, a token is a credential for client requests: after a client logs in for the first time, the server generates a token and returns it to the client, and thereafter the client only needs to carry the token to request data; essentially it is a pass that is checked before data transmission, with different passes authorizing different data operations. In the embodiments provided by the present application, however, a Token denotes any character or word in a sample query sentence or sample document; that is, each sample query sentence and each sample document contains several Tokens, and the number of Tokens depends on the number of characters or words.
Here, the character string Token is specifically expressed as:
t_i = W_tok · h_i
where h_i is the n_lm-dimensional output of the initial information retrieval model for the i-th Token, and W_tok is a matrix that maps the n_lm-dimensional output of the initial information retrieval model to vectors of low dimension n_t. The trained target information retrieval model is an information retrieval model to which [CLS] matching in the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) is added; the pre-trained language model learns deep bidirectional representations from unlabeled data through pre-training, and adding [CLS] to the pre-trained language model makes the output character vector corresponding to this symbol serve as the semantic representation of the whole text or sentence.
Further, a standard sample query text and a standard sample document corresponding to the standard sample query text are acquired by the following steps:
and acquiring an initial sample query text and an initial standard sample document corresponding to the initial sample query text.
And denoising the initial sample query text and the initial standard sample document to determine the standard sample query text and the standard sample document.
S102, inputting each obtained target word into a word vector extraction network layer in the trained target information retrieval model to obtain a first word vector of each target word in the target query sentence and a second word vector of each target word in the sliding windows of each target document, wherein a sliding window is a window containing a first preset number of adjacent characters in the target document, each sliding window contains at least one character of at least one target word, and two adjacent sliding windows share an overlapping part of a second preset number of characters.
In this step, each target word shared between the target query sentence and each target document in the text library is input into the word vector extraction network layer in the trained target information retrieval model to obtain a first word vector of the target word in the target query sentence and a second word vector of the target word in the sliding windows of each target document. The Token-based encoding of the first word vector is specifically expressed as:
v_{q_i} = W_tok · Tok(q_i) + b_tok
Here, the English name of the target query sentence is query, abbreviated q in the formula; Tok(q_i) is the character string of the i-th labeled target query word, b_tok is a coefficient, and W_tok is a coefficient in matrix form.
Considering the relevance between different target words in the target query sentence, the first word vector is obtained using [CLS] matching in BERT, and the specific expression of the first word vector is:
v_{q_i} = W_tok · BERT([CLS], q_1, …, q_n)_i + b_tok
here, the Token-based encoding of the second word vector is specifically expressed as:
here, the english name of the target document is document, in which the abbreviation d is used for, The i-th marked target document string, b tok, is a coefficient.
Considering the relevance between different target words in the target query sentence, the second word vector is likewise obtained using [CLS] matching in BERT, and the specific expression of the second word vector is:
v_{d_j} = W_tok · BERT([CLS], d_1, …, d_m)_j + b_tok
In the above, the relevance between v_{q_i} and v_{d_j} can provide high-level semantic matching information and alleviates the word-mismatch problem.
In this way, a sliding window is a window that slides across each target document in a preset direction. Each sliding window contains a first preset number of characters; this number is not fixed and can be set in a user-defined manner according to the expression characteristics of Chinese characters. Each sliding window contains at least one character of at least one target word, and two adjacent sliding windows share an overlapping part of a second preset number of characters.
As shown in fig. 2, fig. 2 is a structural diagram of a relationship between a sliding window and a target document in an information retrieval method according to an embodiment of the present application.
In Fig. 2, each sliding window is set to contain four characters, i.e. the first preset number of characters is four. For example, the target query sentence provided in this embodiment is "pride makes one lag behind", and one target document in the corresponding text library is "As a Chinese, I feel proud and honored that the women's volleyball team won the Olympic championship."
Here, the target query sentence shares only one target word with the target document, namely "proud". All sliding windows containing either character of the target word "proud" are then obtained, and an encoder is used to obtain a semantics-based second word vector of the target word in each such sliding window; as shown in Fig. 2, these are the sliding windows that cover the word "proud" together with its adjacent characters.
If a plurality of target words exist in the target document, the sliding windows for each target word are set in the same manner as described above, which will not be repeated here.
S103, determining a central word vector of any target word in each target document based on the second word vector of each sliding window associated with the target word.
In this step, the second word vectors of all sliding windows associated with any target word are summed and averaged, and the average is determined as the central word vector of that target word in the target document.
Thus, the center word vector of the target word in the target document is denoted by CK.
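The averaging in this step amounts to an element-wise mean over the window vectors; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def center_word_vector(window_vectors):
    """The central word vector CK of a target word is the element-wise
    mean of its second word vectors over all associated sliding windows."""
    return np.stack(window_vectors).mean(axis=0)
```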
S104, inputting the first word vector and the central word vector corresponding to the target word of each target document into a relevance scoring layer in a trained target information retrieval model, and calculating to obtain the first relevance between the target query sentence and each target document.
In this step, after the first word vector and the central word vector corresponding to each target word of each target document are input into the relevance scoring layer of the trained target information retrieval model, a dot product is computed between the first word vector of each target word and each central word vector corresponding to that target word; the maximum dot-product value is selected as the result relevance, and this result relevance is determined as the second relevance between the target word and the target document.
Then, the second relevances corresponding to the target words in each target document are summed, and the sum is determined as the first relevance between the target query sentence and that target document.
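The scoring in S104 — a max over dot products per shared term, then a sum over the shared terms — can be sketched as follows. The names and data layout are assumptions for illustration: vectors are plain numpy arrays, keyed by the shared target word.

```python
import numpy as np

def first_relevance(query_vecs, center_vecs):
    """Sum-of-max scoring: for every target word shared by the query and
    the document, take the largest dot product between its first
    (query-side) word vector and each of its central word vectors in the
    document (the second relevance), then sum these into the first
    relevance between the query and the document."""
    score = 0.0
    for term, q_vec in query_vecs.items():
        second = max(float(np.dot(q_vec, c)) for c in center_vecs[term])
        score += second
    return score
```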
The expression for calculating the first relevance between the target query statement and each target document is specifically as follows:
In the formula, q_i ∈ q ∩ d denotes that the i-th marked target vocabulary is a word shared by the target query sentence and the target document. The maximum dot-product value is selected, i.e., a max operation is performed, in order to capture the important semantic information of the target vocabulary in the target document.
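The sum-of-max scoring described here can be written as follows. This is a reconstruction consistent with the surrounding description, not a verbatim copy of the printed formula; the symbols v and c are assumed names for the first word vector and the central word vectors.

```latex
s_{\mathrm{tok}}(q, d) \;=\; \sum_{q_i \,\in\, q \cap d} \max_{j} \left( \mathbf{v}_{q_i}^{\top}\, \mathbf{c}_{i,j} \right)
```

where v_{q_i} is the first word vector of the i-th shared target word in the query and c_{i,j} is its j-th central word vector in document d.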
The above is the calculation of the first relevance between the target query sentence and each target document without introducing the [CLS] matching of BERT; the following formula is the expression with the relevance calculated by [CLS] matching added:
In the formula, the subscript full denotes the output of the relevance scoring layer in the trained target information retrieval model with [CLS] matching added.
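By the same reasoning, the [CLS]-augmented full score presumably adds the dot product of the query-side and document-side [CLS] vectors to the token-level score. Again, this is an assumed reconstruction consistent with the surrounding text, not the printed formula:

```latex
s_{\mathrm{full}}(q, d) \;=\; s_{\mathrm{tok}}(q, d) \;+\; \mathbf{v}_{\mathrm{CLS}}^{q\,\top}\, \mathbf{v}_{\mathrm{CLS}}^{d}
```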
S105, selecting a preset number of target documents for output as search documents, in descending order of the relevance between the target query statement and the target documents.
In this step, when a user issues a target query sentence, the sentence is input into the trained target information retrieval model, and a preset number of target documents ranked from high to low by relevance are output as search documents; these search documents can serve as answer documents for the target query sentence.
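The ranking and truncation in S105 is a plain top-k selection; a sketch with illustrative names, assuming the first relevances have already been computed per document:

```python
def top_k_documents(relevance_by_doc, k):
    """Sort documents by first relevance in descending order and keep
    the preset number k as the search documents to output."""
    ranked = sorted(relevance_by_doc.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```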
Compared with information retrieval methods in the prior art, the information retrieval method provided by the embodiments of the present application determines, based on the trained target information retrieval model, the first word vector of the target vocabulary in the target query sentence and the central word vector of the target vocabulary in each target document, and determines the first relevance between the query sentence and the target document from the first word vector and the central word vector.
Referring to fig. 3, fig. 3 is a flowchart of an information retrieval method according to another embodiment of the application. As shown in fig. 3, the information retrieval method provided by the embodiment of the present application includes:
S201, inputting a target query sentence into a query layer in a trained target information retrieval model to obtain at least one target vocabulary shared between the target query sentence and each target document in a text library, wherein a target document is a document in the text library that has the same vocabulary as the target query sentence.
S202, inputting each obtained target vocabulary into a word vector extraction network layer in the trained target information retrieval model to obtain a first word vector of each target vocabulary in the target query sentence and a second word vector of each target vocabulary in each sliding window of each target document, wherein a sliding window is a window containing a first preset number of adjacent characters in the target document and contains at least one character of at least one target vocabulary, and two adjacent sliding windows share an overlapping portion of a second preset number of characters.
S203, aiming at any target vocabulary in each target document, acquiring a second word vector of the target vocabulary in each sliding window.
In this step, for each of the target words in each target document, the second word vector of that target word in each sliding window is obtained.
The sliding window is a window that slides over each target document in a preset direction and contains a first preset number of characters; this number is not fixed and can be set according to the expressive characteristics of Chinese characters. Each sliding window contains at least one character of at least one target vocabulary, and two adjacent sliding windows share an overlapping portion of a second preset number of characters; in this embodiment, the overlapping portion is set to one character.
S204, carrying out summation calculation on each second word vector, carrying out average value calculation after the summation calculation, and determining the average value as a central word vector of the target word.
In the step, the central word vector of each target word is obtained by summing the second word vectors of the sliding window and then obtaining the average value.
S205, inputting a first word vector and a central word vector corresponding to a target word of each target document into a relevance scoring layer in a trained target information retrieval model, and calculating to obtain a first relevance between the target query sentence and each target document.
S206, selecting a preset number of target documents to be output as search documents according to the sequence of the correlation degree between the target query statement and the target documents from high to low.
The descriptions of S201 to S202 and S205 to S206 may refer to the descriptions of S101 to S102 and S104 to S105, and the same technical effects can be achieved, which will not be described in detail.
Compared with information retrieval methods in the prior art, the information retrieval method provided by this embodiment of the present application determines, based on the trained target information retrieval model, the first word vector of the target vocabulary in the target query sentence and the central word vector of the target vocabulary in each target document, and determines the first relevance between the query sentence and the target document from the first word vector and the central word vector.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an information retrieval device according to an embodiment of the application. As shown in fig. 4, the information retrieval apparatus 400 includes:
The first determining module 410 is configured to input a target query sentence into a query layer in a trained target information retrieval model, so as to obtain at least one target vocabulary which is the same between the target query sentence and each target document in a text library, where the target document is a document in the text library in which the same vocabulary exists as the target query sentence.
The second determining module 420 is configured to input each obtained target vocabulary into a word vector extraction network layer in the trained target information retrieval model to obtain a first word vector of each target vocabulary in the target query sentence and a second word vector of each target vocabulary in each sliding window of each target document, where a sliding window is a window containing a first preset number of adjacent characters in the target document and contains at least one character of at least one target vocabulary, and two adjacent sliding windows share an overlapping portion of a second preset number of characters.
The third determining module 430 is configured to determine, for any target vocabulary in each target document, a center word vector of the target vocabulary based on the second word vectors of the sliding windows associated with the target vocabulary.
Further, the third determining module determines, for any target word in each target document, a center word vector of the target word based on the second word vector of each sliding window associated with the target word, including:
and aiming at any target word in each target document, acquiring a second word vector of the target word in each sliding window.
And carrying out summation calculation on each second word vector, carrying out average value calculation after the summation calculation, and determining the average value as a central word vector of the target word.
The calculating module 440 is configured to input a first word vector and a central word vector corresponding to a target word of each target document into a relevance scoring layer in the trained target information retrieval model, and calculate a first relevance between the target query sentence and each target document.
Further, the calculating module 440 calculates a first relevance between the target query term and each of the target documents by:
and carrying out point multiplication calculation on the first word vector corresponding to each target word and each central word vector corresponding to each target word, and determining a point multiplication result as a second relativity between each target word and each target document.
And carrying out summation calculation on the second relevance corresponding to each target word in each target document, and determining the summation result as a first relevance between the target query sentence and each target document.
The fourth determining module 450 is configured to select a preset number of target documents for output as search documents, in descending order of the relevance between the target query sentence and the target documents.
Compared with information retrieval methods in the prior art, the information retrieval device 400 provided by the embodiment of the present application determines, based on the trained target information retrieval model, the first word vector of the target word in the target query sentence and the central word vector of the target word in each target document, and determines the first relevance between the query sentence and the target document from the first word vector and the central word vector.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 5, the electronic device 500 includes a processor 510, a memory 520, and a bus 530.
The memory 520 stores machine-readable instructions executable by the processor 510. When the electronic device 500 runs, the processor 510 communicates with the memory 520 through the bus 530, and when the machine-readable instructions are executed by the processor 510, the steps of the information retrieval method in the method embodiments shown in fig. 1 and fig. 3 can be performed; for the specific implementation, reference may be made to the method embodiments, which will not be repeated here.
The embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the information retrieval method in the method embodiments shown in fig. 1 and fig. 3; for the specific implementation, reference may be made to the method embodiments, which will not be repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
It should be noted that the foregoing embodiments are merely illustrative of the present application and not restrictive, and the scope of the application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that modifications, variations, or substitutions of some of the technical features of the described embodiments may still be readily conceived without departing from the spirit and scope of the technical solutions of the embodiments of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (9)

1. An information retrieval method, characterized in that the information retrieval method comprises:
inputting a target query statement into a query layer of a trained target information retrieval model to obtain at least one target vocabulary shared between the target query statement and each target document in a text library; wherein a target document is a document in the text library that has the same vocabulary as the target query statement;
inputting each obtained target vocabulary into a word vector extraction network layer of the trained target information retrieval model to obtain a first word vector of each target vocabulary in the target query statement and a second word vector of each target vocabulary in each sliding window of each target document; wherein a sliding window is a window containing a first preset number of adjacent characters in the target document and contains at least one character of at least one target vocabulary, and two adjacent sliding windows share an overlapping portion of a second preset number of characters;
for any target vocabulary in each target document, determining a central word vector of the target vocabulary based on the second word vectors of the sliding windows associated with the target vocabulary;
inputting the first word vector and the central word vector corresponding to the target vocabulary of each target document into a relevance scoring layer of the trained target information retrieval model, and calculating a first relevance between the target query statement and each target document;
selecting, in descending order of the relevance between the target query statement and the plurality of target documents, a preset number of the target documents for output as search documents;
wherein the trained target information retrieval model is determined as follows:
obtaining sample query statements and sample documents corresponding to the sample query statements;
dividing the sample query statements and the sample documents by a preset relevance, determining the sample query statements and sample documents whose relevance is greater than or equal to the preset relevance as sample-relevant texts, and determining the sample query statements and sample documents whose relevance is less than the preset relevance as sample-irrelevant texts;
training an initial information retrieval model according to the sample-relevant texts and the sample-irrelevant texts to determine the trained target information retrieval model.
2. The information retrieval method according to claim 1, characterized in that determining, for any target vocabulary in each target document, the central word vector of the target vocabulary based on the second word vectors of the sliding windows associated with the target vocabulary comprises:
for any target vocabulary in each target document, obtaining the second word vector of the target vocabulary in each sliding window;
summing the second word vectors, calculating the average after the summation, and determining the average as the central word vector of the target vocabulary.
3. The information retrieval method according to claim 1, characterized in that the first relevance between the target query statement and each target document is calculated as follows:
performing a dot-product calculation between the first word vector corresponding to each target vocabulary and each central word vector corresponding to that target vocabulary, and determining the dot-product result as a second relevance between each target vocabulary and each target document;
summing the second relevances corresponding to the target vocabularies in each target document, and determining the sum as the first relevance between the target query statement and each target document.
4. The information retrieval method according to claim 3, characterized in that performing the dot-product calculation between the first word vector corresponding to each target vocabulary and each central word vector corresponding to that target vocabulary, and determining the dot-product result as the second relevance between each target vocabulary and each target document, comprises:
performing a dot-product calculation between the first word vector corresponding to each target vocabulary and each central word vector corresponding to that target vocabulary, and selecting the maximum dot-product value as the result relevance;
determining the result relevance as the second relevance between each target vocabulary and each target document.
5. An information retrieval device, characterized in that the information retrieval device comprises:
a first determining module configured to input a target query statement into a query layer of a trained target information retrieval model to obtain at least one target vocabulary shared between the target query statement and each target document in a text library; wherein a target document is a document in the text library that has the same vocabulary as the target query statement; wherein the trained target information retrieval model is determined as follows:
obtaining sample query statements and sample documents corresponding to the sample query statements;
dividing the sample query statements and the sample documents by a preset relevance, determining the sample query statements and sample documents whose relevance is greater than or equal to the preset relevance as sample-relevant texts, and determining the sample query statements and sample documents whose relevance is less than the preset relevance as sample-irrelevant texts;
training an initial information retrieval model according to the sample-relevant texts and the sample-irrelevant texts to determine the trained target information retrieval model;
a second determining module configured to input each obtained target vocabulary into a word vector extraction network layer of the trained target information retrieval model to obtain a first word vector of each target vocabulary in the target query statement and a second word vector of each target vocabulary in each sliding window of each target document; wherein a sliding window is a window containing a first preset number of adjacent characters in the target document and contains at least one character of at least one target vocabulary, and two adjacent sliding windows share an overlapping portion of a second preset number of characters;
a third determining module configured to determine, for any target vocabulary in each target document, a central word vector of the target vocabulary based on the second word vectors of the sliding windows associated with the target vocabulary;
a calculating module configured to input the first word vector and the central word vector corresponding to the target vocabulary of each target document into a relevance scoring layer of the trained target information retrieval model, and calculate a first relevance between the target query statement and each target document;
a fourth determining module configured to select, in descending order of the relevance between the target query statement and the plurality of target documents, a preset number of the target documents for output as search documents.
6. The information retrieval device according to claim 5, characterized in that the third determining module determining, for any target vocabulary in each target document, the central word vector of the target vocabulary based on the second word vectors of the sliding windows associated with the target vocabulary comprises:
for any target vocabulary in each target document, obtaining the second word vector of the target vocabulary in each sliding window;
summing the second word vectors, calculating the average after the summation, and determining the average as the central word vector of the target vocabulary.
7. The information retrieval device according to claim 5, characterized in that the calculating module calculates the first relevance between the target query statement and each target document as follows:
performing a dot-product calculation between the first word vector corresponding to each target vocabulary and each central word vector corresponding to that target vocabulary, and determining the dot-product result as a second relevance between each target vocabulary and each target document;
summing the second relevances corresponding to the target vocabularies in each target document, and determining the sum as the first relevance between the target query statement and each target document.
8. An electronic device, characterized by comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the information retrieval method according to any one of claims 1 to 4 are performed.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the information retrieval method according to any one of claims 1 to 4 are performed.
CN202111495205.8A 2021-12-09 2021-12-09 Information retrieval method, device, electronic device and storage medium Active CN114238564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111495205.8A CN114238564B (en) 2021-12-09 2021-12-09 Information retrieval method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111495205.8A CN114238564B (en) 2021-12-09 2021-12-09 Information retrieval method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114238564A CN114238564A (en) 2022-03-25
CN114238564B true CN114238564B (en) 2025-05-13

Family

ID=80754175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111495205.8A Active CN114238564B (en) 2021-12-09 2021-12-09 Information retrieval method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114238564B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115168537B (en) * 2022-06-30 2023-06-27 北京百度网讯科技有限公司 Training method and device for semantic retrieval model, electronic equipment and storage medium
CN118568204B (en) * 2024-07-29 2024-11-01 北京城市网邻信息技术有限公司 Text information processing method, device and electronic device based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520033A (en) * 2018-03-28 2018-09-11 华中师范大学 Enhancing pseudo-linear filter model information search method based on superspace simulation language
CN109522392A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Voice-based search method, server and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491547B (en) * 2017-08-28 2020-11-10 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN113536800A (en) * 2020-04-13 2021-10-22 北京金山数字娱乐科技有限公司 A word vector representation method and device
CN112507091A (en) * 2020-12-01 2021-03-16 百度健康(北京)科技有限公司 Method, device, equipment and storage medium for retrieving information

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520033A (en) * 2018-03-28 2018-09-11 华中师范大学 Enhancing pseudo-linear filter model information search method based on superspace simulation language
CN109522392A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Voice-based search method, server and computer readable storage medium

Also Published As

Publication number Publication date
CN114238564A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN110347835B (en) Text clustering method, electronic device and storage medium
US10606946B2 (en) Learning word embedding using morphological knowledge
CN107836000B (en) Improved artificial neural network method and electronic device for language modeling and prediction
CN110162630B (en) A method, device and equipment for deduplication of text
US11573994B2 (en) Encoding entity representations for cross-document coreference
CN110309192B (en) Structural data matching using neural network encoders
CN111985228B (en) Text keyword extraction method, text keyword extraction device, computer equipment and storage medium
CN112256822A (en) Text search method, apparatus, computer equipment and storage medium
CN111651986B (en) Event keyword extraction method, device, equipment and medium
CN109902159A (en) A kind of intelligent O&M statement similarity matching process based on natural language processing
US20250209277A1 (en) Systems and Methods for Machine-Learned Prediction of Semantic Similarity Between Documents
CN106815252A (en) A kind of searching method and equipment
CN113076739A (en) Method and system for realizing cross-domain Chinese text error correction
CN110162771B (en) Event trigger word recognition method and device and electronic equipment
CN112395875A (en) Keyword extraction method, device, terminal and storage medium
CN111680494A (en) Similar text generation method and device
CN108475264B (en) Machine translation method and device
CN110457707B (en) Content word keyword extraction method, device, electronic equipment and readable storage medium
CN113553510A (en) Text information recommendation method and device and readable medium
CN114861654A (en) A Defense Method for Adversarial Training Based on Part-of-Speech Fusion in Chinese Text
CN113553410A (en) Long document processing method, processing device, electronic equipment and storage medium
CN114328894A (en) Document processing method, document processing device, electronic equipment and medium
CN114238564B (en) Information retrieval method, device, electronic device and storage medium
CN114548123B (en) Machine translation model training methods and apparatus, and text translation methods and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant