WO2019000920A1 - Inference rule automatic discovery method and system, database and retrieval method - Google Patents
- Publication number
- WO2019000920A1 (PCT/CN2018/073004)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- medical
- relationship
- knowledge
- matrix
- relationships
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Definitions
- the present disclosure relates to the field of medical knowledge base data mining technology, and in particular, to an inference rule automatic discovery method and system, a medical knowledge database and a retrieval method.
- the present disclosure provides a method and system for automatically discovering inference rules.
- a medical knowledge database and a retrieval method based on the above inference rules are also provided, so that the searcher can retrieve more complete medical knowledge data.
- the present disclosure provides an inference rule automatic discovery method, which is applied to a medical knowledge base, the medical knowledge base including a plurality of medical knowledge, each of the medical knowledge being a combination of two medical entities and one medical relationship, the method comprising:
- acquiring a relationship matrix for each of the medical relationships, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship;
- acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each of the acquired inference rules includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule is used to indicate that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
- the step of obtaining a relationship matrix of each of the medical relationships includes:
- the initial relationship matrix is learned such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, and the relationship matrix of the medical relationship is obtained.
- the difference between the number of the correct medical knowledge and the number of the wrong medical knowledge is less than a preset threshold.
- the step of replacing the medical entity in the correct medical knowledge to obtain new medical knowledge as the wrong medical knowledge comprises:
- the medical entity in the correct medical knowledge is replaced with a medical entity other than the medical entity contained in all the correct medical knowledge corresponding to the medical relationship, and new medical knowledge is obtained as the wrong medical knowledge.
- before the step of acquiring a relationship matrix of each of the medical relationships, the method further includes:
- N is the number of all medical entities in the medical knowledge base, and x_e is an N×1 vector;
- y_e is a K×1 vector
- W is a mapping matrix
- W is a K×N matrix
- K is a predetermined value
- K is smaller than N.
- the scoring function is:
- r is the medical relationship, e_1 and e_2 are the medical entities, r(e_1, e_2) is the medical knowledge, Score(·) is the scoring function, and M_r is the relationship matrix of the medical relationship r.
- the preset objective function is:
- L is the objective function
- T is the set of correct medical knowledge
- T′ is the set of wrong medical knowledge
- M_r is the relationship matrix of the medical relationship r.
- the step of acquiring an inference rule according to the relationship matrix of the medical relationship includes:
- multiple medical relationship groups are selected to construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group, and each medical relationship group includes three medical relationships;
- the step of selecting a part of the to-be-verified inference rule as the finally obtained inference rule according to the similarity includes:
- the preset number of inference rules to be verified is taken as the finally obtained inference rule.
- the three medical relationships in each medical relationship group satisfy the following conditions:
- H_p, H_q and H_r are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and L_p, L_q and L_r are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
- the step of calculating a similarity between a product of a relationship matrix of two medical relationships in the medical relationship group and a relationship matrix of another medical relationship comprises:
- the L2 norm between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship is calculated, and this L2 norm is taken as the similarity.
- the present disclosure further provides an inference rule automatic discovery system, which is applied to a medical knowledge base, the medical knowledge base including a plurality of medical knowledge, each of the medical knowledge being a combination of two medical entities and one medical relationship
- the system includes:
- a relationship matrix learner for acquiring a relationship matrix of each of the medical relationships, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship;
- an inference rule finder configured to acquire inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each of the acquired inference rules includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule is used to indicate that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
- the relationship matrix learner is configured to: construct an initial relationship matrix for each medical relationship; acquire the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge; replace the medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge; score the correct medical knowledge and the erroneous medical knowledge separately using a scoring function; and learn the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
- the difference between the number of the correct medical knowledge and the number of the wrong medical knowledge is less than a preset threshold.
- the relationship matrix learner is configured to replace a medical entity in the correct medical knowledge with a medical entity other than the medical entities included in all correct medical knowledge corresponding to the medical relationship, to obtain new medical knowledge as wrong medical knowledge.
- the system further includes:
- An entity vector learner that performs the following operations:
- N is the number of all medical entities in the medical knowledge base, and x_e is an N×1 vector;
- y_e is a K×1 vector
- W is a mapping matrix
- W is a K×N matrix
- K is a predetermined value
- K is smaller than N.
- the scoring function is:
- r is the medical relationship, e_1 and e_2 are the medical entities, r(e_1, e_2) is the medical knowledge, Score(·) is the scoring function, and M_r is the relationship matrix of the medical relationship r.
- the preset objective function is:
- L is the objective function
- T is the set of correct medical knowledge
- T′ is the set of wrong medical knowledge
- M_r is the relationship matrix of the medical relationship r.
- the inference rule finder is configured to perform the operations of: selecting a plurality of medical relationship groups from all medical relationships and constructing a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group comprises three medical relationships; calculating a similarity between the product of the relationship matrices of two of the medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and, according to the similarity, selecting some of the inference rules to be verified as the finally obtained inference rules.
- the inference rule finder is configured to sort the similarities corresponding to all the medical relationship groups to obtain a preset number of to-be-verified inference rules with the highest similarity, and to use the preset number of to-be-verified inference rules as the finally obtained inference rules.
- the three medical relationships in each medical relationship group satisfy the following conditions:
- H_p, H_q and H_r are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and L_p, L_q and L_r are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
- the inference rule finder is configured to calculate the L2 norm between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and to take this L2 norm as the similarity.
- a data retrieval method of a medical knowledge base comprising: acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between the vectors of two medical entities having the medical relationship
- the medical knowledge base includes a plurality of medical knowledge, each of the medical knowledge being a combination of two medical entities and the medical relationship of the two medical entities;
- each of the acquired inference rules includes a first medical relationship, a second medical relationship, and a third medical relationship, wherein the inference rule is used to indicate that the third medical relationship can be inferred from the first medical relationship and the second medical relationship;
- outputting the medical knowledge related to the search term or the search formula includes displaying the medical knowledge related to the search term or the search formula.
- a medical knowledge database comprising:
- a data input device for inputting medical knowledge data
- a relationship matrix learner for acquiring a relationship matrix of each of the medical relationships, wherein the relationship matrix reflects a relationship between vectors of medical entities having the medical relationship, the medical knowledge base including a plurality of medical knowledge, each of the medical knowledge being a combination of two medical entities and the medical relationship of the two medical entities;
- an inference rule finder configured to acquire inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each of the acquired inference rules includes a first medical relationship, a second medical relationship, and a third medical relationship, the inference rule indicating that the third medical relationship can be inferred from the first medical relationship and the second medical relationship;
- a searcher for performing a search based on the inference rule according to a search term or a search formula to obtain medical knowledge related to the search term or the search formula;
- an outputter for outputting the medical knowledge related to the search term or the search formula.
- the outputter is configured to display the medical knowledge related to the search term or the search formula.
- FIG. 1 is a schematic flowchart of a method for automatically discovering an inference rule according to an embodiment of the present disclosure
- FIG. 2 is a schematic flow chart of a method for acquiring a relationship matrix of a medical relationship according to an embodiment of the present disclosure
- FIG. 3 is a schematic flowchart of a method for automatically discovering an inference rule according to another embodiment of the present disclosure
- FIG. 4 is a structural block diagram of an inference rule automatic discovery system according to an embodiment of the present disclosure.
- FIG. 5 is a structural block diagram of a database according to an embodiment of the present disclosure.
- FIG. 6 is a schematic flowchart diagram of a retrieval method according to an embodiment of the present disclosure.
- the medical relationship is represented as a matrix, and then the operation between the matrices is used to automatically discover the inference rules from the vast amount of medical knowledge, without manually defining the inference rules, saving a lot of manpower and material resources.
- the time cost is reduced; the medical knowledge database and the retrieval method based on the above inference rules enable the searcher to retrieve more complete medical knowledge data.
- the inference rule automatic discovery method and system according to an embodiment of the present disclosure is applied to a medical knowledge base, and an inference rule can be automatically found from a medical knowledge base.
- the medical knowledge base in the embodiments of the present disclosure includes a plurality of medical knowledge, each of which is a combination of two medical entities and one medical relationship. It can be seen that one medical knowledge includes three elements.
- medical knowledge can be represented by a triplet r(e_1, e_2), where r represents a medical relationship and e_1 and e_2 represent medical entities.
- for example, in the triplet treatment(Pygmyin, cancer),
- the medical relationship is treatment,
- and Pygmyin and cancer are medical entities; that is, Pygmyin can be used to treat cancer.
- a medical entity in a medical knowledge base may be represented as a vector.
- a medical entity can be represented as a "one-hot" vector.
- the so-called “one-hot” vector means that only the corresponding dimension value is 1, and the rest are 0.
- the N medical entities can be arranged in any order. Assuming that the position of the medical entity "Pygmyin" is i, then the "one-hot" vector of "Pygmyin" is (a_1, ..., a_i, ..., a_N), where a_i = 1 and a_j = 0 for every j ≠ i. That is, only the i-th element is 1, and the remaining N-1 elements are 0.
- the dimension of the "one-hot" vector is equal to the total number of medical entities, each dimension represents a medical entity, and the "one-hot" vector of each medical entity has a 1 only in its own corresponding dimension and 0 in the remaining dimensions.
- for example, if the medical knowledge base contains three medical entities, the "one-hot" vector of each medical entity is three-dimensional, with only the corresponding dimension equal to 1 and the rest 0.
- the "one-hot" vector of calcium is (1, 0, 0).
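The "one-hot" representation described above can be sketched in a few lines of Python. This is a minimal illustration only: the helper name `one_hot` and the three-entity list are assumptions, and the ordering is chosen so that "calcium" maps to (1, 0, 0) as in the text.

```python
def one_hot(entity, entities):
    """Return the one-hot vector of `entity` within the ordered list `entities`."""
    index = entities.index(entity)  # raises ValueError for an unknown entity
    return [1 if i == index else 0 for i in range(len(entities))]

# Hypothetical three-entity knowledge base; the ordering is arbitrary.
entities = ["calcium", "acetaminophen", "vitamin A"]
calcium_vec = one_hot("calcium", entities)
print(calcium_vec)  # [1, 0, 0]
```

The vector length equals the number of entities, which is why this representation becomes impractical when N is large.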
- the "one-hot" vector is an N×1 vector, and the number of medical entities in the medical knowledge base is very large, that is, the value of N is very large; if the medical entities are represented by "one-hot" vectors, the vector representation is very large and difficult to handle.
- in addition, a medical knowledge base may contain medical entities with different names that are actually the same entity, or similar medical entities. For example, the medical knowledge base may include the medical entities "acetaminophen" and "paracetamol", which are in fact the same entity, so they need to be represented by the same or similar vectors.
- therefore, the "one-hot" vector of the high-dimensional space may be mapped to a vector in a low-dimensional space.
- Vectors in low dimensional space can effectively reduce the dimensions of the vector, and can also represent the same or similar medical entities as the same or similar vectors.
- $\sigma(x) = 1/(1 + e^{-x})$
- where e is the mathematical constant (the base of the natural logarithm).
- the medical knowledge base includes multiple medical relationships, such as "treatment”, “prevention”, “containment”, and the like.
- each medical relationship may be expressed as a relationship matrix M_r
- the relationship matrix M_r reflects the relationship between the vectors of the medical entities that have the medical relationship r. Because each medical entity is reduced from the N-dimensional "one-hot" vector to a K-dimensional vector, that is, all medical entities are mapped to a K-dimensional space (abbreviated as S), and the relationship matrix reflects the relationship between medical entities, the size of the relationship matrix is K×K, that is, there are K² elements in each relationship matrix.
- the element M_r(i, j) in the i-th row and j-th column (where 1 ≤ i, j ≤ K) reflects the relationship between the i-th dimension and the j-th dimension of the space S.
- the value of each element of the relationship matrix M_r may be learned by the preset objective function, as described in detail below.
- the inference rules are discovered by operations between relational matrices of medical relationships. Specifically, the product of the relationship matrix of any two medical relationships can be calculated, and the similarity between the obtained product and the relationship matrix of another medical relationship can be calculated. If the resulting product is similar to the relationship matrix of another medical relationship, then the inference rules between the three medical relationships can be obtained.
- FIG. 1 is a schematic flowchart diagram of an automatic discovery method for inference rules according to an embodiment of the present disclosure.
- the inference rule automatic discovery method is applied to a medical knowledge base, where the medical knowledge base includes multiple medical knowledge, each of which is a combination of two medical entities and one medical relationship, the method comprising:
- Step 11 Acquire a relationship matrix of each of the medical relationships, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship;
- Step 12 Acquire an inference rule according to a relationship matrix of the plurality of the medical relationships, where each of the acquired inference rules includes a first medical relationship, a second medical relationship, and a third medical relationship, where the inference rule is used to represent The third medical relationship can be inferred from the first medical relationship and the second medical relationship.
- the medical relationship is represented as a matrix, and then operations between the matrices are used to automatically discover the inference rules from the vast amount of medical knowledge, without manually defining the inference rules, saving a lot of manpower and material resources and reducing the time cost.
- the first medical relationship r 1 , the second medical relationship r 2 and the third medical relationship r 3 satisfy the following conditions:
- the relationship matrix of the medical relationship can be learned by using a preset objective function.
- the preset objective function optimizes the relationship matrix of the medical relationship based on the scores of the medical knowledge corresponding to the medical relationship, and the purpose of learning with the preset objective function is to make correct medical knowledge score higher than wrong medical knowledge.
- it can be understood that the medical knowledge in the medical knowledge base has been verified experimentally and manually. Therefore, in the embodiment of the present disclosure, the medical knowledge in the medical knowledge base is used as the correct medical knowledge, whereas randomly constructed medical knowledge can be considered wrong medical knowledge. For example, the medical knowledge "treatment(Pygmyin, cancer)" in the medical knowledge base is used as correct medical knowledge, and the randomly constructed medical knowledge "treatment(vitamin A, cancer)" is used as wrong medical knowledge.
- FIG. 2 is a schematic flowchart diagram of a method for acquiring a relationship matrix of a medical relationship according to an embodiment of the present disclosure, where the method includes:
- Step 111 Construct an initial relationship matrix for each medical relationship
- each element can be any value.
- Step 112 Acquire medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge
- for example, the medical knowledge obtained from the medical knowledge base for the medical relationship "treatment" includes "treatment(Pygmyin, cancer)", "treatment(acetaminophen, fever)", "treatment(calcium, itching)", and so on.
- Step 113 replace the medical entity in the correct medical knowledge, and obtain new medical knowledge as the wrong medical knowledge;
- for example, "vitamin A" can be used to replace the medical entity "Pygmyin" in the medical knowledge "treatment(Pygmyin, cancer)", and the wrong medical knowledge "treatment(vitamin A, cancer)" is obtained.
- Step 114 using a scoring function to score the correct medical knowledge and the wrong medical knowledge respectively;
- Step 115 Learning the initial relationship matrix by using a preset objective function, so that the score of the correct medical knowledge is higher than the score of the wrong medical knowledge, and obtaining a relationship matrix of the medical relationship.
- based on the principle that correct medical knowledge scores higher than erroneous medical knowledge, the relationship matrix of the medical relationship is learned by using the preset objective function; this learning method is simple and effective.
- the difference between the number of the correct medical knowledge and the number of the wrong medical knowledge is less than a preset threshold.
- the number of the correct medical knowledge and the wrong medical knowledge employed is the same. That is, when learning the initial relationship matrix, the number of correct medical knowledge and the number of erroneous medical knowledge are the same or substantially the same, thereby ensuring correct learning results.
- the medical knowledge can be scored by using the following scoring function:
- r is the medical relationship
- e_1 and e_2 are the medical entities
- r(e_1, e_2) is the medical knowledge
- Score(·) is the scoring function
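The scoring formula itself is not reproduced in this text. A plausible reconstruction, assuming the bilinear form suggested by the later description of the score as a cosine between an entity vector multiplied by the relationship matrix and another entity vector, is:

```latex
\mathrm{Score}\bigl(r(e_1, e_2)\bigr) = y_{e_1}^{\top} \, M_r \, y_{e_2}
```

Here $y_{e_1}$ and $y_{e_2}$ are the K×1 entity vectors and $M_r$ is the K×K relationship matrix. The placement of the transpose and the order of the two entities are assumptions.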
- the relationship matrix of the medical relationship may be learned by using the following preset objective function:
- L is the objective function
- T is the set of correct medical knowledge
- T′ is the set of wrong medical knowledge
- M_r is the relationship matrix of the medical relationship r.
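The objective function is not reproduced in this text either. The sketch below assumes a margin-based ranking loss with margin 1, which matches the later statement that correct knowledge should score at least 1 higher than wrong knowledge, together with an assumed bilinear score. The function names, the summation over every correct/wrong pair, and the toy vectors are all hypothetical.

```python
def score(y1, M, y2):
    """Bilinear score y1^T M y2 (assumed form of the scoring function)."""
    K = len(y1)
    return sum(y1[i] * M[i][j] * y2[j] for i in range(K) for j in range(K))

def margin_loss(M, correct, wrong, margin=1.0):
    """Sum of hinge terms over every (correct, wrong) pair of entity-vector pairs."""
    total = 0.0
    for (c1, c2) in correct:
        for (w1, w2) in wrong:
            total += max(0.0, margin - score(c1, M, c2) + score(w1, M, w2))
    return total

# Toy 2-dimensional example with hypothetical entity vectors.
M = [[1.0, 0.0], [0.0, 1.0]]
correct = [([1.0, 0.0], [1.0, 0.0])]  # scores 1.0 under M
wrong = [([1.0, 0.0], [0.0, 1.0])]    # scores 0.0 under M
loss = margin_loss(M, correct, wrong)
print(loss)  # 0.0: the correct pair already beats the wrong pair by the margin
```

A loss of zero means every correct triple outscores every wrong triple by at least the margin; any optimizer that lowers this value moves the relationship matrix in the intended direction.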
- the relationship matrix of the medical relationship is learned by using stochastic gradient descent to minimize the preset objective function.
- other optimization algorithms are not excluded.
- when constructing the wrong medical knowledge of the medical relationship r, a medical knowledge r(e_1, e_2) may be selected from the correct medical knowledge set T, one medical entity in r(e_1, e_2) is kept fixed, and the other medical entity is randomly replaced; for example, e_1 is kept unchanged and e_2 is randomly replaced with e′, so as to construct the wrong medical knowledge set T′.
- when the medical entity in the correct medical knowledge corresponding to the medical relationship is replaced, it may be randomly replaced with another medical entity to obtain the wrong medical knowledge.
- the medical knowledge obtained by random replacement is not necessarily wrong, and it is possible to obtain correct medical knowledge.
- to reduce this possibility, the medical entity in the correct medical knowledge may be replaced with a medical entity other than the medical entities included in all the correct medical knowledge corresponding to the medical relationship, and the new medical knowledge is used as wrong medical knowledge. For example, if none of the medical knowledge corresponding to the medical relationship "treatment" includes the medical entities "golden mushroom", "ginger", etc., then "golden mushroom", "ginger", etc. can be used to replace the medical entities in the medical knowledge corresponding to "treatment", so that the probability that the wrong medical knowledge is actually correct medical knowledge is reduced.
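The replacement strategy described above can be sketched as follows. This is an illustration under stated assumptions: the function name `corrupt`, the `(relation, e1, e2)` tuple layout, the choice to replace only the second entity, and the seeded random generator are all hypothetical.

```python
import random

def corrupt(triples, relation, all_entities, rng):
    """Build wrong triples for `relation` by replacing the second entity with an
    entity that appears in no correct triple of that relation."""
    correct = [t for t in triples if t[0] == relation]
    used = {e for (_, e1, e2) in correct for e in (e1, e2)}
    candidates = sorted(set(all_entities) - used)
    if not candidates:
        raise ValueError("no unused entity available for replacement")
    return [(r, e1, rng.choice(candidates)) for (r, e1, _) in correct]

triples = [
    ("treatment", "acetaminophen", "fever"),
    ("treatment", "calcium", "itching"),
]
all_entities = ["acetaminophen", "fever", "calcium", "itching",
                "ginger", "golden mushroom"]
wrong = corrupt(triples, "treatment", all_entities, random.Random(0))
print(wrong)
```

Because one wrong triple is produced per correct triple, the two sets have the same size, in line with the requirement that their counts be the same or nearly so.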
- the vector of the medical entity is obtained in the following manner:
- N is the number of all medical entities in the medical knowledge base, and x_e is an N×1 vector;
- y_e is a K×1 vector
- W is a mapping matrix
- W is a K×N matrix
- K is a predetermined value
- K is smaller than N.
- the elements in the initial matrix of the mapping matrix W may be random values. It can be seen from the formula of the above-mentioned preset objective function that the final mapping matrix W can be learned at the same time as the relationship matrices.
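The mapping from a "one-hot" vector to a low-dimensional vector can be sketched as below. The mapping formula is not reproduced in this text, so the form y_e = σ(W x_e), with the sigmoid σ applied element-wise, is an assumption, as are the function names and the values in `W`.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def embed(W, x):
    """Map an N-dimensional vector x to K dimensions as sigmoid(W x) (assumed form)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]

# Hypothetical mapping matrix with K = 2 rows and N = 3 columns.
W = [[0.0, 1.0, -1.0],
     [2.0, 0.0, 0.5]]
x_calcium = [1, 0, 0]  # one-hot vector of the first entity
y_calcium = embed(W, x_calcium)
print(y_calcium)
```

For a one-hot input, the product W x simply selects one column of W, so each column of W acts as the (pre-activation) low-dimensional vector of one entity.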
- the vector of the medical entity can be normalized into a unit vector (i.e., a vector with a modulus of 1); that is, for any medical entity e, $\|y_e\| = 1$.
- the vector obtained by multiplying the vector of one medical entity by the relationship matrix M_r is also normalized to a unit vector.
- after this product vector and the vector of the other medical entity are normalized to unit vectors, Score(r(e_1, e_2)) represents the cosine of the angle between the two vectors.
- the score of the correct medical knowledge is at least 1 greater than the score of the wrong medical knowledge. The larger the score, the better, but the maximum value of a cosine is 1, so one result of optimizing the preset objective function is to bring the score of the correct medical knowledge close to 1. Therefore, for the correct medical knowledge r_1(e_1, e_2), the product of the vector of one medical entity and the relationship matrix M_{r_1} is approximately equal to the vector of the other medical entity,
- because, for any unit vector, the only unit vector whose cosine with it is 1 is that vector itself.
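The reasoning above can be written as a short derivation. It assumes the bilinear score $y_{e_1}^{\top} M_r \, y_{e_2}$ and the rule pattern $r_1(e_1, e_2) \wedge r_2(e_2, e_3) \Rightarrow r_3(e_1, e_3)$. Both are assumptions, since the corresponding formulas are not reproduced in this text, and renormalization of intermediate vectors is ignored.

```latex
\begin{aligned}
r_1(e_1, e_2) \text{ correct} &\;\Rightarrow\; y_{e_1}^{\top} M_{r_1} \approx y_{e_2}^{\top} \\
r_2(e_2, e_3) \text{ correct} &\;\Rightarrow\; y_{e_2}^{\top} M_{r_2} \approx y_{e_3}^{\top} \\
\text{hence}\quad y_{e_1}^{\top} M_{r_1} M_{r_2} &\approx y_{e_3}^{\top} \\
r_3(e_1, e_3) \text{ correct} &\;\Rightarrow\; y_{e_1}^{\top} M_{r_3} \approx y_{e_3}^{\top}
\end{aligned}
```

Comparing the last two lines suggests $M_{r_1} M_{r_2} \approx M_{r_3}$ whenever the rule holds, which is why the product of two relationship matrices is compared with a third.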
- embodiments of the present disclosure can discover inference rules by multiplication between matrices.
- the similarity corresponding to the multiple inference rules may be sorted, and a predetermined number of inference rules with the highest similarity are selected as the final inference rule.
- the step of acquiring an inference rule according to the relationship matrix of the medical relationship includes:
- multiple medical relationship groups are selected to construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group, and each medical relationship group includes three medical relationships;
- FIG. 3 is a schematic flowchart of a method for automatically discovering an inference rule according to another embodiment of the present disclosure.
- the method for automatically discovering an inference rule is applied to a medical knowledge base, where the medical knowledge base includes multiple medical knowledge, each of which is a combination of two medical entities and one medical relationship, the method comprising:
- Step 31 Obtain a relationship matrix of each of the medical relationships, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship;
- Step 32 Select a plurality of medical relationship groups from all medical relationships, and construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group, and each medical relationship group includes three medical relationships;
- Step 33 Calculate a similarity between a product of a relationship matrix of two medical relationships in the medical relationship group and a relationship matrix of another medical relationship;
- Step 34 Sort the similarities corresponding to all the medical relationship groups, and obtain a preset number of to-be-verified inference rules with the highest similarity;
- Step 35 The preset number of inference rules to be verified are taken as the finally obtained inference rule.
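Steps 32 to 35 can be sketched in plain Python as follows. This is a sketch under assumptions, not the patent's implementation: the similarity is taken to be the entrywise L2 (Frobenius) norm of the difference between the product of two relationship matrices and the third, with a smaller value treated as more similar, and the function names and toy matrices are hypothetical.

```python
from itertools import permutations

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def distance(A, B):
    """Entrywise L2 (Frobenius) norm of A - B; smaller means more similar."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)) ** 0.5

def discover_rules(matrices, top_n):
    """Rank candidate rules (p, q => r) by the distance between M_p M_q and M_r
    and keep the top_n closest."""
    scored = []
    for p, q, r in permutations(matrices, 3):
        d = distance(matmul(matrices[p], matrices[q]), matrices[r])
        scored.append((d, p, q, r))
    scored.sort()
    return scored[:top_n]

# Hypothetical 2x2 relationship matrices; "c" is exactly the product of "a" and "b".
matrices = {
    "a": [[1.0, 2.0], [0.0, 1.0]],
    "b": [[0.0, 1.0], [1.0, 3.0]],
    "c": [[2.0, 7.0], [1.0, 3.0]],
}
best = discover_rules(matrices, top_n=1)
print(best)
```

Enumerating every ordered triple is cubic in the number of relationships; restricting candidates with the entity-set conditions on the relationship group would shrink that search, but those conditions are not reproduced in this text.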
- the similarities corresponding to all the inference rules are sorted, and the inference rules with the highest similarity are obtained as the finally obtained inference rules, so that the obtained inference rules are more accurate.
- the final inference rule may be selected from a plurality of inference rules to be verified. For example, the inference rule to be verified whose similarity is greater than the preset threshold is selected as the final inference rule.
- three medical relationships may be arbitrarily selected from all medical relationships to construct a to-be-verified inference rule.
- the three medical relationships in each medical relationship group satisfy the following conditions:
- $H_p$, $H_q$ and $H_r$ are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and $L_p$, $L_q$ and $L_r$ are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
- the step of calculating a similarity between a product of a relationship matrix of two medical relationships in the medical relationship group and a relationship matrix of another medical relationship comprises:
- the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship is calculated, and the L2 norm is taken as the similarity.
- the specific calculation method can be as follows:
- the set of all medical entities $e_1$ in the triples of the medical relationship r is $H_r$
- the set of all medical entities $e_2$ is $L_r$.
- the embodiment of the present disclosure calculates the L2 norm by the following formula, and arranges the medical relationship pairs corresponding to the calculated L2 norms in ascending order: $\|M_p M_q - M_r\|_2$, where $\|\cdot\|_2$ is the L2 norm of a matrix.
- an embodiment of the present disclosure further provides an inference rule automatic discovery system, which is applied to a medical knowledge base, where the medical knowledge base includes a plurality of pieces of medical knowledge, each of which is a combination of two medical entities and one medical relationship, the system comprising:
- a relationship matrix learner for acquiring a relationship matrix of each of the medical relationships, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship;
- an inference rule finder configured to acquire inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
- the medical relationship is represented as a matrix, and operations between the matrices are then used to automatically discover inference rules from a vast amount of medical knowledge, without manually defining the inference rules, which saves considerable manpower and material resources and reduces the time cost.
- the relationship matrix learner is configured to: construct an initial relationship matrix for each medical relationship; acquire the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge; replace a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge; score the correct medical knowledge and the erroneous medical knowledge separately using a scoring function; and learn the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
- when the initial relationship matrix is learned, the difference between the number of pieces of correct medical knowledge and the number of pieces of erroneous medical knowledge used is less than a preset threshold.
- the relationship matrix learner is configured to replace a medical entity in the correct medical knowledge with a medical entity other than those contained in all the correct medical knowledge corresponding to the medical relationship, to obtain new medical knowledge as erroneous medical knowledge.
- the inference rule automatic discovery system further includes:
- An entity vector learner that performs the following operations:
- N is the number of all medical entities in the medical knowledge base, and $x_e$ is an N×1 vector;
- $y_e$ is a K×1 vector
- W is a mapping matrix
- W is a K×N matrix
- K is a predetermined value
- K is smaller than N.
- the above embodiments may be implemented by hardware, software, or a combination of hardware and software.
- the various methods, steps, and functional components (or functional units) described in the embodiments of the present application may be implemented by a processor (a processor in a broad sense, including a CPU, a processing unit, an ASIC, a logic unit, a programmable logic array, etc.).
- the processes, methods, and functional modules described in the embodiments of the present application may be implemented by a single processor or by multiple processors.
- a processor as described in the embodiments or claims of the application should be understood to be one or more processors.
- the embodiments described above may be implemented in the form of a software product.
- the computer software product is stored in a non-volatile storage medium and includes a series of instructions for causing a computer device (e.g., a personal computer, a server, or a network device such as a router, switch, or access point) to perform the methods described in the embodiments of the present application.
- the relational matrix learner, the inference rule finder, and the entity vector learner may be implemented by, for example, the aforementioned processor (including a CPU, a memory, a bus, etc.).
- Computer readable instructions for use in the present application are stored by a plurality of processors in a readable storage medium such as a hard disk, CD-ROM, DVD, optical disk, floppy disk, magnetic tape, RAM, ROM or other suitable storage device.
- at least some of the computer readable instructions may be replaced by specific hardware, such as custom integrated circuits, gate arrays, FPGAs, PLDs, and computers with specific functions
- the scoring function is:
- r is the medical relationship
- $e_1$, $e_2$ are the medical entities
- $r(e_1, e_2)$ is the medical knowledge
- Score(·) is the scoring function
- the preset objective function is:
- L is the objective function
- T is the set of correct medical knowledge
- T′ is the set of erroneous medical knowledge
- $M_r$ is the relationship matrix of the medical relationship r.
- the inference rule finder is configured to: select a plurality of medical relationship groups from all medical relationships and construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships; calculate the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and select, according to the similarity, some of the inference rules to be verified as the final inference rules.
- the inference rule finder is configured to: sort the similarities corresponding to all the medical relationship groups to obtain a preset number of to-be-verified inference rules with the highest similarity; and take the preset number of to-be-verified inference rules as the final inference rules.
- the three medical relationships in each medical relationship group satisfy the following conditions:
- $H_p$, $H_q$ and $H_r$ are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and $L_p$, $L_q$ and $L_r$ are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
- the inference rule finder is configured to calculate the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and to take the L2 norm as the similarity.
- Embodiments of the present disclosure also provide a data retrieval method for a medical knowledge base, comprising: acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between the vectors of two medical entities having the medical relationship, the medical knowledge base comprises a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities; acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship; inputting a search term or a search formula; performing a search for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and outputting the medical knowledge related to the search term or the search formula.
- a schematic flowchart of the method is shown in FIG. 6.
- An embodiment of the present disclosure further provides a medical knowledge database, including: a data inputter for inputting medical knowledge data; a relationship matrix learner for acquiring a relationship matrix of each of the medical relationships, wherein the relationship matrix reflects a relationship between vectors of medical entities having the medical relationship, the medical knowledge base comprises a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities;
- an inference rule finder configured to acquire inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship; an inputter for inputting a search term or a search formula; a retriever for searching for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and an outputter for outputting the medical knowledge related to the search term or the search formula.
- the data inputter may be, for example, a network input, a USB storage device, an optical disk, or another storage device;
- the inputter may be, for example, a keyboard, a mouse, a camera, a scanner, a light pen, a voice input device, a handwriting tablet, a touch screen, etc.;
- the retriever may be, for example, a retriever commonly found in the medical technology field;
- the outputter may be, for example, a display, a printer, a plotter, an image output system, a voice output system, a magnetic recording device, etc.
- the medical relationship is represented as a matrix, and operations between the matrices are then used to automatically discover inference rules from a vast amount of medical knowledge, without manually defining the inference rules, which saves considerable manpower and material resources and reduces the time cost; a medical knowledge database and a retrieval method based on these inference rules enable the searcher to retrieve more complete medical knowledge data.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Biomedical Technology (AREA)
- Pathology (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Complex Calculations (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
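A method and system for automatically discovering inference rules, a medical knowledge database, and a retrieval method. The method for automatically discovering inference rules includes: acquiring a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship (11); and acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship (12). The method and system can automatically construct inference rules in a medical knowledge base without manual definition, which saves manpower and material resources, reduces the time cost, and facilitates retrieval.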
Description
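Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201710507636.9, filed in China on June 28, 2017, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the technical field of data mining for medical knowledge bases, and in particular to a method and system for automatically discovering inference rules, a medical knowledge database, and a retrieval method.
Today, the volume of biomedical literature is growing exponentially. While this mass of data offers researchers rich information, it also makes it hard for them to discover associations between different documents, so they miss opportunities to discover important knowledge. For example, in the biomedical field, the biomedical literature database MEDLINE already holds more than twenty million papers, and millions more are added every year. Reading such a volume of literature is very difficult for medical researchers. Methods for automatically discovering knowledge from medical literature have therefore attracted wide attention.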
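Summary
The present disclosure provides a method and system for automatically discovering inference rules. It also provides a medical knowledge database built on these inference rules and a retrieval method, so that a searcher can retrieve more complete medical knowledge data.
According to one aspect of the present disclosure, there is provided a method for automatically discovering inference rules, applied to a medical knowledge base, where the medical knowledge base includes a plurality of pieces of medical knowledge, each piece of medical knowledge being a combination of two medical entities and one medical relationship, the method including:
acquiring a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship; and
acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.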
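Optionally, the step of acquiring a relationship matrix of each medical relationship includes:
constructing an initial relationship matrix for each medical relationship;
acquiring the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge;
replacing a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge;
scoring the correct medical knowledge and the erroneous medical knowledge separately using a scoring function; and
learning the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
Optionally, when the initial relationship matrix is learned, the difference between the number of pieces of correct medical knowledge and the number of pieces of erroneous medical knowledge used is less than a preset threshold.
Optionally, the step of replacing a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge includes:
replacing a medical entity in the correct medical knowledge with a medical entity other than those contained in all the correct medical knowledge corresponding to the medical relationship, to obtain new medical knowledge as erroneous medical knowledge.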
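Optionally, before the step of acquiring a relationship matrix of each medical relationship, the method further includes:
acquiring a one-hot vector $x_e$ of each medical entity, where N is the number of all medical entities in the medical knowledge base and $x_e$ is an N×1 vector; and
acquiring the vector of each medical entity from its one-hot vector:
$y_e = \sigma(W x_e)$, $\sigma(x) = 1/(1 + e^{-x})$,
where $y_e$ is a K×1 vector, W is a K×N mapping matrix, K is a predetermined value, and K is less than N.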
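Optionally, the scoring function is:
where r is the medical relationship, $e_1$ and $e_2$ are the medical entities, $r(e_1, e_2)$ is the medical knowledge, Score(·) is the scoring function, and $M_r$ is the relationship matrix of the medical relationship r.
Optionally, the preset objective function is:
where L is the objective function, T is the set of correct medical knowledge, T′ is the set of erroneous medical knowledge, and $M_r$ is the relationship matrix of the medical relationship r.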
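Optionally, the step of acquiring inference rules according to the relationship matrices of the medical relationships includes:
selecting a plurality of medical relationship groups from all the medical relationships and constructing a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships;
calculating the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and
selecting, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules.
Optionally, the step of selecting, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules includes:
sorting the similarities corresponding to all the medical relationship groups to obtain a preset number of inference rules to be verified with the highest similarity; and
taking the preset number of inference rules to be verified as the finally obtained inference rules.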
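Optionally, the three medical relationships in each medical relationship group satisfy the following condition:
where $H_p$, $H_q$ and $H_r$ are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and $L_p$, $L_q$ and $L_r$ are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
Optionally, the step of calculating the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship includes:
calculating the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and taking the L2 norm as the similarity.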
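According to another aspect of the present disclosure, there is further provided a system for automatically discovering inference rules, applied to a medical knowledge base, where the medical knowledge base includes a plurality of pieces of medical knowledge, each piece of medical knowledge being a combination of two medical entities and one medical relationship, the system including:
a relationship matrix learner for acquiring a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship; and
an inference rule finder for acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
Optionally, the relationship matrix learner is configured to: construct an initial relationship matrix for each medical relationship; acquire the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge; replace a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge; score the correct medical knowledge and the erroneous medical knowledge separately using a scoring function; and learn the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
Optionally, when the initial relationship matrix is learned, the difference between the number of pieces of correct medical knowledge and the number of pieces of erroneous medical knowledge used is less than a preset threshold.
Optionally, the relationship matrix learner is configured to replace a medical entity in the correct medical knowledge with a medical entity other than those contained in all the correct medical knowledge corresponding to the medical relationship, to obtain new medical knowledge as erroneous medical knowledge.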
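Optionally, the system further includes:
an entity vector learner configured to perform the following operations:
acquiring a one-hot vector $x_e$ of each medical entity, where N is the number of all medical entities in the medical knowledge base and $x_e$ is an N×1 vector; and
acquiring the vector of each medical entity from its one-hot vector:
$y_e = \sigma(W x_e)$, $\sigma(x) = 1/(1 + e^{-x})$,
where $y_e$ is a K×1 vector, W is a K×N mapping matrix, K is a predetermined value, and K is less than N.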
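Optionally, the scoring function is:
where r is the medical relationship, $e_1$ and $e_2$ are the medical entities, $r(e_1, e_2)$ is the medical knowledge, Score(·) is the scoring function, and $M_r$ is the relationship matrix of the medical relationship r.
Optionally, the preset objective function is:
where L is the objective function, T is the set of correct medical knowledge, T′ is the set of erroneous medical knowledge, and $M_r$ is the relationship matrix of the medical relationship r.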
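Optionally, the inference rule finder is configured to: select a plurality of medical relationship groups from all the medical relationships and construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships; calculate the similarity between the product of the relationship matrices of two of the medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and select, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules.
Optionally, the inference rule finder is configured to sort the similarities corresponding to all the medical relationship groups to obtain a preset number of inference rules to be verified with the highest similarity, and to take the preset number of inference rules to be verified as the finally obtained inference rules.
Optionally, the three medical relationships in each medical relationship group satisfy the following condition:
where $H_p$, $H_q$ and $H_r$ are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and $L_p$, $L_q$ and $L_r$ are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
Optionally, the inference rule finder is configured to calculate the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and to take the L2 norm as the similarity.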
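According to yet another aspect of the present disclosure, there is provided a data retrieval method for a medical knowledge base, including: acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between the vectors of two medical entities having the medical relationship, the medical knowledge base includes a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities;
acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship;
inputting a search term or a search formula;
performing a search for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and
outputting the medical knowledge related to the search term or the search formula.
Optionally, outputting the medical knowledge related to the search term or the search formula includes displaying the medical knowledge related to the search term or the search formula.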
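According to another aspect of the present disclosure, there is provided a medical knowledge database, including:
a data inputter for inputting medical knowledge data;
a relationship matrix learner for acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between vectors of medical entities having the medical relationship, the medical knowledge base includes a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities;
an inference rule finder for acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship;
an inputter for inputting a search term or a search formula;
a retriever for searching for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and
an outputter for outputting the medical knowledge related to the search term or the search formula.
Optionally, the outputter is configured to display the medical knowledge related to the search term or the search formula.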
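To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for automatically discovering inference rules according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a method for acquiring the relationship matrix of a medical relationship according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a method for automatically discovering inference rules according to another embodiment of the present disclosure;
FIG. 4 is a structural block diagram of a system for automatically discovering inference rules according to an embodiment of the present disclosure;
FIG. 5 is a structural block diagram of a database according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a retrieval method according to an embodiment of the present disclosure.
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments fall within the scope of protection of the present disclosure.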
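Conventional methods for automatically discovering medical knowledge require inference rules to be defined manually. For example, the inference rule "contains(A, B) ^ treats(B, C) => prevents(A, C)" is defined manually. Given the facts "Flammulina velutipes contains flammulin" and "flammulin can treat cancer", the rule defined above yields "Flammulina velutipes can prevent cancer". However, the medical field now contains an enormous amount of information, and defining inference rules manually takes a great deal of time, manpower, and money, and also hinders the retrieval of complete medical knowledge data. In the embodiments of the present disclosure, medical relationships are represented as matrices, and operations between the matrices are then used to automatically discover inference rules from a vast amount of medical knowledge without manual definition, which saves considerable manpower and material resources and reduces the time cost. The medical knowledge database and retrieval method built on these inference rules enable a searcher to retrieve more complete medical knowledge data.
The method and system for automatically discovering inference rules according to the embodiments of the present disclosure are applied to a medical knowledge base and can automatically discover inference rules from it.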
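The medical knowledge base in the embodiments of the present disclosure includes a plurality of pieces of medical knowledge, each being a combination of two medical entities and one medical relationship. A piece of medical knowledge thus has three elements, and in the embodiments it may be represented as a triple, for example $r(e_1, e_2)$, where r denotes the medical relationship and $e_1$ and $e_2$ denote the medical entities. For example, in treats(flammulin, cancer), the medical relationship is "treats" and flammulin and cancer are the medical entities; that is, flammulin can be used to treat cancer.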
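In the embodiments of the present disclosure, the medical entities in the medical knowledge base may be represented as vectors.
For example, a medical entity may be represented as a "one-hot" vector, that is, a vector in which only the corresponding dimension has the value 1 and all others are 0. Suppose the medical knowledge base contains N medical entities in total, arranged in any order. If the medical entity "flammulin" is at position i, then its one-hot vector $x_e$ is $(a_1, \ldots, a_i, \ldots, a_N)$, where $a_j = 1$ if $j = i$ and $a_j = 0$ otherwise; that is, only the i-th element is 1 and the remaining N−1 elements are 0. The dimension of a one-hot vector therefore equals the total number of medical entities, each dimension represents one medical entity, and the one-hot vector of each medical entity has a 1 only in its own dimension and 0 elsewhere. For example, suppose there are three medical entities {calcium, flammulin, zinc} arranged in the order 1, 2, 3. The one-hot vector of each medical entity is then 3-dimensional, with a 1 only in its own dimension and 0 elsewhere; for instance, the one-hot vector of calcium is (1, 0, 0).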
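As a minimal sketch of the one-hot representation described above, the following builds the vector of an entity from its position in an arbitrarily ordered entity list. The helper name `one_hot` and the English entity names are illustrative assumptions, not part of the disclosure.

```python
def one_hot(entities, name):
    """Return the one-hot vector of `name` for an ordered entity list.

    `one_hot` is a hypothetical helper used only for illustration.
    """
    position = entities.index(name)  # position i of the entity in the chosen order
    return [1 if j == position else 0 for j in range(len(entities))]


# Three entities in the order used by the example in the text.
entities = ["calcium", "flammulin", "zinc"]
calcium_vector = one_hot(entities, "calcium")
flammulin_vector = one_hot(entities, "flammulin")
```

Each vector has as many components as there are entities, which is why this representation becomes impractical for a large knowledge base.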
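As the description above shows, a one-hot vector is an N×1 vector. The number of medical entities in a medical knowledge base is very large, so N is very large; if medical entities are represented by one-hot vectors, the amount of data is huge and hard to process. In addition, the medical knowledge base may contain medical entities with different names that are actually the same entity, or that are similar entities. For example, it may include the medical entities "acetaminophen" and "paracetamol", which are in fact the same entity, so the two need to be represented by the same or similar vectors.
In the embodiments of the present disclosure, the one-hot vectors in the high-dimensional space may be mapped to vectors in a low-dimensional space. Vectors in the low-dimensional space effectively reduce the dimension, and identical or similar medical entities can also be represented as identical or similar vectors.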
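For example, suppose the mapping matrix W is a K×N matrix and K is a positive integer less than N. The vector $y_e$ of a medical entity e can then be expressed as $y_e = \sigma(W x_e)$, where σ is the sigmoid function, $x_e$ is the one-hot vector of the medical entity e, and $\sigma(x) = 1/(1 + e^{-x})$, with e in the formula for σ(x) being the mathematical constant. Since W is K×N and $x_e$ is N×1, $W x_e$ is a K×1 vector, and so is $y_e$. Since K is less than N, the mapped vector $y_e$ has fewer dimensions than the one-hot vector. The value of K can be set as needed and is usually much smaller than N. The value of each element of the mapping matrix W can be learned through the preset objective function, as described in detail later.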
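The mapping $y_e = \sigma(W x_e)$ can be sketched as follows. The values of `W` are arbitrary placeholders chosen for illustration (in the disclosure they are learned), and the helper names are assumptions.

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def embed(W, x_e):
    """Map an N-dimensional one-hot vector to a K-dimensional vector."""
    return [sigmoid(sum(w * x for w, x in zip(row, x_e))) for row in W]


# Placeholder K x N mapping matrix with K = 2 and N = 3.
W = [
    [0.5, -1.0, 0.0],
    [2.0, 0.0, -0.5],
]
x_e = [0, 1, 0]      # one-hot vector of the second entity
y_e = embed(W, x_e)  # K-dimensional entity vector
```

Because `x_e` is one-hot, `W x_e` simply selects one column of `W`, so each column of `W` acts as the pre-activation embedding of one entity.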
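A medical knowledge base includes a plurality of medical relationships, for example "treats", "prevents", and "contains". In the embodiments of the present disclosure, each medical relationship may be represented as a relationship matrix $M_r$ that reflects the relationship between the vectors of medical entities having that medical relationship. Because each medical entity is reduced from an N-dimensional one-hot vector to a K-dimensional vector, all medical entities are mapped into a K-dimensional space (denoted S), and because the relationship matrix reflects the relationship between medical entities, the relationship matrix has size K×K; that is, each relationship matrix has $K^2$ elements. The element $M_r(i, j)$ in row i and column j (where 1 ≤ i, j ≤ K) reflects the relationship between the i-th and j-th dimensions of the space S. The value of each element of $M_r$ can be learned through the preset objective function, as described in detail later.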
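In the embodiments of the present disclosure, inference rules are discovered through operations between the relationship matrices of medical relationships. Specifically, the product of the relationship matrices of any two medical relationships can be calculated, and the similarity between this product and the relationship matrix of another medical relationship can then be calculated. If the product is similar to the relationship matrix of the other medical relationship, an inference rule among the three medical relationships is obtained.
Referring to FIG. 1, which is a schematic flowchart of a method for automatically discovering inference rules according to an embodiment of the present disclosure, the method is applied to a medical knowledge base that includes a plurality of pieces of medical knowledge, each being a combination of two medical entities and one medical relationship, and the method includes:
Step 11: acquire a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship;
Step 12: acquire inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
The resulting inference rule is: first medical relationship ^ second medical relationship => third medical relationship, where ^ is the logical AND symbol.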
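In the embodiments of the present disclosure, medical relationships are represented as matrices, and operations between the matrices are then used to automatically discover inference rules from a vast amount of medical knowledge without manual definition, which saves considerable manpower and material resources and reduces the time cost.
Optionally, the first medical relationship $r_1$, the second medical relationship $r_2$ and the third medical relationship $r_3$ satisfy the following condition:
where $H_{r_1}$, $H_{r_2}$ and $H_{r_3}$ are respectively the sets of all first medical entities in the triples corresponding to the first medical relationship $r_1$, the second medical relationship $r_2$ and the third medical relationship $r_3$, and $L_{r_1}$, $L_{r_2}$ and $L_{r_3}$ are respectively the sets of all second medical entities in the triples corresponding to the first medical relationship $r_1$, the second medical relationship $r_2$ and the third medical relationship $r_3$.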
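How the relationship matrix of a medical relationship is determined is described in detail below.
As mentioned in the above embodiments, the relationship matrix of a medical relationship can be learned using a preset objective function.
In the embodiments of the present disclosure, the preset objective function optimizes the relationship matrix of a medical relationship based on the scores of the medical knowledge corresponding to that relationship; the purpose of learning with the preset objective function is to make the score of correct medical knowledge higher than that of erroneous medical knowledge. The medical knowledge in the medical knowledge base has been verified by experiment and manual inspection, so in the embodiments it is used as correct medical knowledge, whereas randomly constructed medical knowledge can be regarded as erroneous medical knowledge. For example, the medical knowledge "treats(flammulin, cancer)" in the medical knowledge base serves as correct medical knowledge, and the randomly constructed medical knowledge "treats(vitamin A, cancer)" serves as erroneous medical knowledge.
Referring to FIG. 2, which is a schematic flowchart of a method for acquiring the relationship matrix of a medical relationship according to an embodiment of the present disclosure, the method includes:
Step 111: construct an initial relationship matrix for each medical relationship;
Each element of the initial relationship matrix may be any value.
Step 112: acquire the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge;
For example, the medical knowledge acquired for the medical relationship "treats" in the medical knowledge base includes "treats(flammulin, cancer)", "treats(acetaminophen, fever)", "treats(calamine, itching)", and so on.
Step 113: replace a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge;
For example, "vitamin A" may replace the element "flammulin" in the medical knowledge "treats(flammulin, cancer)", giving the erroneous medical knowledge "treats(vitamin A, cancer)".
Step 114: score the correct medical knowledge and the erroneous medical knowledge separately using a scoring function;
Step 115: learn the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
In the embodiments of the present disclosure, the relationship matrix of a medical relationship is learned with the preset objective function on the principle that correct medical knowledge should score higher than erroneous medical knowledge; the learning method is simple and effective.
Optionally, when the initial relationship matrix is learned, the difference between the number of pieces of correct medical knowledge and the number of pieces of erroneous medical knowledge used is less than a preset threshold. Further optionally, the two numbers are the same. That is, when the initial relationship matrix is learned, the number of pieces of correct medical knowledge used is the same or substantially the same as the number of pieces of erroneous medical knowledge, which helps ensure a correct learning result.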
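In the embodiments of the present disclosure, medical knowledge may be scored using the following scoring function:
where r is the medical relationship, $e_1$ and $e_2$ are the medical entities, $r(e_1, e_2)$ is the medical knowledge, and Score(·) is the scoring function.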
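The scoring formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes a bilinear score of the form $y_{e_1}^{T} M_r\, y_{e_2}$, which is one common way to combine two entity vectors with a relationship matrix; the function name, the vectors, and the matrix values are all hypothetical.

```python
def score(y_e1, M_r, y_e2):
    """Assumed bilinear score y_e1^T * M_r * y_e2 (not confirmed by the source)."""
    projected = [sum(m * v for m, v in zip(row, y_e2)) for row in M_r]
    return sum(a * b for a, b in zip(y_e1, projected))


# Toy K = 2 vectors and a toy relationship matrix, chosen for illustration.
y_flammulin = [1.0, 0.0]
y_cancer = [0.0, 1.0]
M_treats = [
    [0.0, 1.0],
    [0.0, 0.0],
]
correct_score = score(y_flammulin, M_treats, y_cancer)   # treats(flammulin, cancer)
reversed_score = score(y_cancer, M_treats, y_flammulin)  # treats(cancer, flammulin)
```

With these toy values the ordered pair (flammulin, cancer) scores higher than the reversed pair, which shows that a relationship matrix need not be symmetric.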
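Of course, other embodiments of the present disclosure may use other types of scoring function.
In the embodiments of the present disclosure, the relationship matrix of a medical relationship may be learned using the following preset objective function:
where L is the objective function, T is the set of correct medical knowledge, T′ is the set of erroneous medical knowledge, and $M_r$ is the relationship matrix of the medical relationship r.
In the embodiments of the present disclosure, the relationship matrix of a medical relationship is learned by minimizing the preset objective function with stochastic gradient descent. Of course, other embodiments of the present disclosure may use other optimization algorithms.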
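The objective formula is likewise not reproduced here. The sketch below assumes a margin-based hinge loss, $\sum \max(0,\, 1 - \mathrm{Score}(t) + \mathrm{Score}(t'))$ over pairs of correct knowledge $t$ and erroneous knowledge $t'$, together with a bilinear score; both are assumptions, as are all names and values. Only the relationship matrix is updated, with the entity vectors held fixed for brevity.

```python
def score(y1, M, y2):
    """Assumed bilinear score y1^T * M * y2."""
    return sum(y1[i] * M[i][j] * y2[j]
               for i in range(len(y1)) for j in range(len(y2)))


def objective(M, pairs):
    """Assumed hinge loss with margin 1 over (correct, erroneous) pairs."""
    return sum(max(0.0, 1.0 - score(p1, M, p2) + score(n1, M, n2))
               for (p1, p2), (n1, n2) in pairs)


def sgd_epoch(M, pairs, learning_rate=0.1):
    """One pass of stochastic gradient descent over the training pairs."""
    for (p1, p2), (n1, n2) in pairs:
        violated = 1.0 - score(p1, M, p2) + score(n1, M, n2) > 0.0
        if violated:
            for i in range(len(M)):
                for j in range(len(M[i])):
                    # Gradient of the hinge term with respect to M[i][j].
                    gradient = n1[i] * n2[j] - p1[i] * p2[j]
                    M[i][j] -= learning_rate * gradient
    return M


# One correct pair of entity vectors and one erroneous pair (toy values).
M_r = [[0.0, 0.0], [0.0, 0.0]]
pairs = [(([1.0, 0.0], [0.0, 1.0]), ([1.0, 0.0], [1.0, 0.0]))]
loss_before = objective(M_r, pairs)
for _ in range(10):
    sgd_epoch(M_r, pairs)
loss_after = objective(M_r, pairs)
```

Updates stop once the correct pair outscores the erroneous pair by the margin, so the loss falls from its initial value to approximately zero in this toy case.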
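In the embodiments of the present disclosure, erroneous medical knowledge, for example for a medical relationship r, may be constructed as follows: select a piece of medical knowledge $r(e_1, e_2)$ from the set T of correct medical knowledge, fix one of its medical entities, and randomly change the other, for example keep $e_1$ unchanged and randomly change $e_2$ to e′, to construct the set T′ of erroneous medical knowledge.
That is, in the embodiments of the present disclosure, when a medical entity in the correct medical knowledge corresponding to the medical relationship is replaced, the medical entity may be randomly changed to another medical entity to obtain erroneous medical knowledge.
It will be understood that medical knowledge obtained by random replacement is not necessarily wrong; it may happen to be correct. For example, replacing flammulin in the correct medical knowledge "treats(flammulin, cancer)" with cytarabine gives "treats(cytarabine, cancer)", which may well be correct. However, "erroneous medical knowledge" in the embodiments of the present disclosure is meant in a statistical sense: the vast majority of randomly obtained medical knowledge is wrong. For instance, suppose the medical knowledge base contains 100,000 medical entities, of which only a few, say 1,000, can be used to treat cancer. After a random replacement there is only a 1000/100000 = 1% probability that the result is correct and a 99% probability that it is wrong. If the replacement is performed many times, for example 10,000 times, then by the law of large numbers about 99% of the 10,000 results are wrong, and the small fraction that happens to be correct has little effect.
Optionally, when a medical entity in the correct medical knowledge corresponding to the medical relationship is replaced, a medical entity other than those contained in the correct medical knowledge corresponding to the medical relationship is used as the replacement, to obtain new medical knowledge as erroneous medical knowledge. For example, if none of the medical knowledge corresponding to the medical relationship "treats" includes the medical entities "Flammulina velutipes", "ginger", and so on, then "Flammulina velutipes", "ginger", and so on may be used to replace the medical entities in the medical knowledge corresponding to "treats", which reduces the probability that the resulting erroneous medical knowledge is actually correct.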
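The optional replacement strategy can be sketched as follows: the second entity of a correct triple is swapped for an entity that appears in none of the correct triples of the same relationship. The function name `corrupt_tail`, the tuple layout `(relation, e1, e2)`, and the sample data are assumptions made for illustration.

```python
import random


def corrupt_tail(triple, all_entities, correct_triples, rng):
    """Replace the second entity of a correct triple to build an erroneous one.

    Candidates exclude every entity that occurs in a correct triple of the
    same relationship, which lowers the chance of producing a true fact.
    """
    relation, head, _tail = triple
    used = {e for r, e1, e2 in correct_triples if r == relation for e in (e1, e2)}
    candidates = [e for e in all_entities if e not in used]
    return (relation, head, rng.choice(candidates))


correct = [
    ("treats", "flammulin", "cancer"),
    ("treats", "acetaminophen", "fever"),
]
all_entities = [
    "flammulin", "cancer", "acetaminophen", "fever",
    "Flammulina velutipes", "ginger",
]
negative = corrupt_tail(correct[0], all_entities, correct, random.Random(0))
```

Keeping the first entity fixed and changing only the second follows the example in the text; swapping the first entity instead would work the same way.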
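As mentioned in the above embodiments, the vectors of the medical entities need to be acquired before the relationship matrices of the medical relationships are learned. In the embodiments of the present disclosure, the vectors of the medical entities are obtained as follows:
acquire a one-hot vector $x_e$ of each medical entity, where N is the number of all medical entities in the medical knowledge base and $x_e$ is an N×1 vector;
acquire the vector of each medical entity from its one-hot vector:
$y_e = \sigma(W x_e)$, $\sigma(x) = 1/(1 + e^{-x})$,
where $y_e$ is a K×1 vector, W is a K×N mapping matrix, K is a predetermined value, and K is less than N.
The elements of the initial mapping matrix W may be random values. As the formula of the preset objective function shows, the final mapping matrix W can be learned at the same time as the relationship matrices.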
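The principle by which inference rules are automatically discovered in the embodiments of the present disclosure is described below.
After the vector [formula] and the entity vector [formula] are normalized to unit vectors, [formula] represents the cosine of the angle between the vectors [formula] and [formula]. According to the preset objective function above, the score of correct medical knowledge must exceed that of erroneous medical knowledge by at least 1, so the larger the value of [formula] the better; but the maximum value of a cosine is 1, so one result of optimizing the preset objective function is that the score of correct medical knowledge approaches 1, that is, [formula]. Therefore, for correct medical knowledge $r_1(e_1, e_2)$, we have [formula].
Based on the above analysis, the embodiments of the present disclosure can discover inference rules through multiplication between matrices.
In the embodiments of the present disclosure, if a plurality of inference rules are calculated, the similarities corresponding to them may also be sorted, and a preset number of inference rules with the highest similarity selected as the final inference rules.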
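Optionally, the step of acquiring inference rules according to the relationship matrices of the medical relationships includes:
selecting a plurality of medical relationship groups from all the medical relationships and constructing a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships;
calculating the similarity between the product of the relationship matrices of two of the medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and
selecting, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules.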
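Referring to FIG. 3, which is a schematic flowchart of a method for automatically discovering inference rules according to another embodiment of the present disclosure, the method is applied to a medical knowledge base that includes a plurality of pieces of medical knowledge, each being a combination of two medical entities and one medical relationship, and the method includes:
Step 31: acquire a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship;
Step 32: select a plurality of medical relationship groups from all the medical relationships and construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships;
Step 33: calculate the similarity between the product of the relationship matrices of two of the medical relationships in the medical relationship group and the relationship matrix of the other medical relationship;
Step 34: sort the similarities corresponding to all the medical relationship groups to obtain a preset number of inference rules to be verified with the highest similarity;
Step 35: take the preset number of inference rules to be verified as the finally obtained inference rules.
In the embodiments of the present disclosure, the similarities corresponding to all the inference rules are sorted and the few inference rules with the highest similarity are taken as the finally obtained inference rules, which makes the obtained inference rules more accurate.
Besides sorting by similarity, other methods may be used to select the final inference rules from the inference rules to be verified. For example, the inference rules to be verified whose similarity is greater than a preset threshold may be selected as the final inference rules.
In the embodiments of the present disclosure, any three medical relationships may be selected from all the medical relationships to construct an inference rule to be verified.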
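A minimal sketch of the candidate construction and of the two selection strategies just described. Treating each ordered triple of distinct relationships `(p, q, r)` as the candidate rule p ^ q => r is an assumption, the similarity values are placeholders rather than computed results, and the function names are hypothetical.

```python
from itertools import permutations


def candidate_rules(relations):
    """Enumerate ordered triples (p, q, r), each read as the rule p ^ q => r."""
    return list(permutations(relations, 3))


def select_top(similarities, count):
    """Keep the `count` candidates with the highest similarity."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [rule for rule, _ in ranked[:count]]


def select_above(similarities, threshold):
    """Keep every candidate whose similarity exceeds the threshold."""
    return [rule for rule, value in similarities.items() if value > threshold]


relations = ["contains", "treats", "prevents"]
candidates = candidate_rules(relations)

# Placeholder similarity values; a real system would compute them
# from the learned relationship matrices.
similarities = {rule: 0.1 for rule in candidates}
similarities[("contains", "treats", "prevents")] = 0.9
best = select_top(similarities, 1)
```

The number of ordered candidates grows cubically with the number of relationships, which is one reason a pre-filter on the candidate groups can be useful.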
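Preferably, the three medical relationships in each medical relationship group satisfy the following condition:
where $H_p$, $H_q$ and $H_r$ are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and $L_p$, $L_q$ and $L_r$ are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
Optionally, the step of calculating the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship includes:
calculating the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and taking the L2 norm as the similarity.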
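A specific calculation method may be as follows:
For the triples $r(e_1, e_2)$ of a given medical relationship r, let $H_r$ be the set of all medical entities $e_1$ in the triples of r, and let $L_r$ be the set of all medical entities $e_2$. For the relationship r, the embodiments of the present disclosure look for possible inference rules p∧q=>r among the medical relationship pairs (p, q) that satisfy the following condition;
For all medical relationship pairs (p, q) that satisfy the above condition, the embodiments of the present disclosure calculate the L2 norm by the following formula and arrange the medical relationship pairs corresponding to the calculated L2 norms in ascending order: $\|M_p M_q - M_r\|_2$, where $\|\cdot\|_2$ is the L2 norm of a matrix.
The first preset number of medical relationship pairs (p, q) with the largest L2 norm are selected, giving the inference rules p∧q=>r.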
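The calculation of $\|M_p M_q - M_r\|_2$ can be sketched as below. Two assumptions are made and should be checked against the intended method: the matrix L2 norm is taken to be the entrywise (Frobenius) norm, and a smaller norm is read as a closer match between $M_p M_q$ and $M_r$, consistent with sorting in ascending order. The matrices are toy values and the function names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def difference_norm(A, B):
    """Entrywise L2 (Frobenius) norm of A - B; an assumed reading of the L2 norm."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))


def rank_pairs(matrices, r, pairs):
    """Sort candidate pairs (p, q) by ||M_p M_q - M_r|| in ascending order."""
    scored = [(difference_norm(matmul(matrices[p], matrices[q]), matrices[r]), p, q)
              for p, q in pairs]
    return sorted(scored)


# Toy K = 2 relationship matrices, chosen so that contains * treats == prevents.
matrices = {
    "contains": [[0.0, 1.0], [1.0, 0.0]],
    "treats": [[2.0, 0.0], [0.0, 3.0]],
    "prevents": [[0.0, 3.0], [2.0, 0.0]],
}
ranking = rank_pairs(matrices, "prevents",
                     [("treats", "contains"), ("contains", "treats")])
```

Because matrix multiplication is not commutative, the pairs (p, q) and (q, p) generally receive different norms, so the order of the two premises matters.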
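Based on the same inventive concept, and referring to FIG. 4, an embodiment of the present disclosure further provides a system for automatically discovering inference rules, applied to a medical knowledge base, where the medical knowledge base includes a plurality of pieces of medical knowledge, each piece of medical knowledge being a combination of two medical entities and one medical relationship, the system including:
a relationship matrix learner for acquiring a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship; and
an inference rule finder for acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
In the embodiments of the present disclosure, medical relationships are represented as matrices, and operations between the matrices are then used to automatically discover inference rules from a vast amount of medical knowledge without manual definition, which saves considerable manpower and material resources and reduces the time cost.
Optionally, the relationship matrix learner is configured to: construct an initial relationship matrix for each medical relationship; acquire the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge; replace a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge; score the correct medical knowledge and the erroneous medical knowledge separately using a scoring function; and learn the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
Optionally, when the initial relationship matrix is learned, the difference between the number of pieces of correct medical knowledge and the number of pieces of erroneous medical knowledge used is less than a preset threshold.
Optionally, the relationship matrix learner is configured to replace a medical entity in the correct medical knowledge with a medical entity other than those contained in all the correct medical knowledge corresponding to the medical relationship, to obtain new medical knowledge as erroneous medical knowledge.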
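Optionally, the system for automatically discovering inference rules further includes:
an entity vector learner configured to perform the following operations:
acquiring a one-hot vector $x_e$ of each medical entity, where N is the number of all medical entities in the medical knowledge base and $x_e$ is an N×1 vector; and
acquiring the vector of each medical entity from its one-hot vector:
$y_e = \sigma(W x_e)$, $\sigma(x) = 1/(1 + e^{-x})$,
where $y_e$ is a K×1 vector, W is a K×N mapping matrix, K is a predetermined value, and K is less than N.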
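The above embodiments may be implemented by hardware, software, or a combination of hardware and software. For example, the various methods, steps, and functional components (or functional units) described in the embodiments of the present application may be implemented by a processor (a processor in a broad sense, including a CPU, a processing unit, an ASIC, a logic unit, a programmable logic array, and so on). The processes, methods, and functional modules described in the embodiments of the present application may be implemented by a single processor or by multiple processors. A processor as described in the embodiments or claims of the present application should be understood to be one or more processors. In addition, the embodiments described above may be implemented in the form of a software product. The computer software product is stored in a non-volatile storage medium and includes a series of instructions for causing a computer device (for example, a personal computer, a server, or a network device such as a router, switch, or access point) to perform the methods described in the embodiments of the present application. In the embodiments of the present disclosure, the relationship matrix learner, the inference rule finder, and the entity vector learner may be implemented, for example, by the aforementioned processor (including a CPU, a memory, a bus, and so on). The computer-readable instructions used in the present application are stored by multiple processors in a readable storage medium such as a hard disk, CD-ROM, DVD, optical disk, floppy disk, magnetic tape, RAM, ROM, or other suitable storage device. Alternatively, at least some of the computer-readable instructions may be replaced by specific hardware, for example custom integrated circuits, gate arrays, FPGAs, PLDs, and computers with specific functions.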
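Optionally, the scoring function is:
where r is the medical relationship, $e_1$ and $e_2$ are the medical entities, $r(e_1, e_2)$ is the medical knowledge, and Score(·) is the scoring function.
Optionally, the preset objective function is:
where L is the objective function, T is the set of correct medical knowledge, T′ is the set of erroneous medical knowledge, and $M_r$ is the relationship matrix of the medical relationship r.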
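Optionally, the inference rule finder is configured to: select a plurality of medical relationship groups from all the medical relationships and construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships; calculate the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and select, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules.
Optionally, the inference rule finder is configured to: sort the similarities corresponding to all the medical relationship groups to obtain a preset number of inference rules to be verified with the highest similarity; and take the preset number of inference rules to be verified as the finally obtained inference rules.
Optionally, the three medical relationships in each medical relationship group satisfy the following condition:
where $H_p$, $H_q$ and $H_r$ are respectively the sets of all first medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group, and $L_p$, $L_q$ and $L_r$ are respectively the sets of all second medical entities in the triples corresponding to the three medical relationships p, q and r in the medical relationship group.
Optionally, the inference rule finder is configured to calculate the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and to take the L2 norm as the similarity.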
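An embodiment of the present disclosure further provides a data retrieval method for a medical knowledge base, including: acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between the vectors of two medical entities having the medical relationship, the medical knowledge base includes a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities; acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship; inputting a search term or a search formula; performing a search for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and outputting the medical knowledge related to the search term or the search formula. A schematic flowchart of the method is shown in FIG. 6.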
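The text does not spell out how the retrieval step uses the inference rules. One possible reading, sketched here purely as an assumption, is that stored triples are extended with triples derived from the rules, and the search term is then matched against the entities of both. The function name `search`, the tuple layouts, and the sample data are hypothetical.

```python
def search(term, triples, rules):
    """Return stored and rule-derived triples whose entities include `term`.

    Each rule (first, second, third) is read as
    first(a, b) ^ second(b, c) => third(a, c).
    """
    known = set(triples)
    derived = set()
    for first, second, third in rules:
        for r1, a, b in known:
            for r2, b2, c in known:
                if r1 == first and r2 == second and b == b2:
                    derived.add((third, a, c))
    return sorted(t for t in known | derived if term in t[1:])


triples = [
    ("contains", "Flammulina velutipes", "flammulin"),
    ("treats", "flammulin", "cancer"),
]
rules = [("contains", "treats", "prevents")]
results = search("cancer", triples, rules)
```

In this toy case a search for "cancer" returns the stored fact about flammulin together with the derived fact about Flammulina velutipes, which illustrates how rule-based expansion can make the retrieved knowledge more complete.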
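An embodiment of the present disclosure further provides a medical knowledge database, including: a data inputter for inputting medical knowledge data; a relationship matrix learner for acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between vectors of medical entities having the medical relationship, the medical knowledge base includes a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities; an inference rule finder for acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship; an inputter for inputting a search term or a search formula; a retriever for searching for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and an outputter for outputting the medical knowledge related to the search term or the search formula. A structural block diagram of the medical knowledge database is shown in FIG. 5.
In the embodiments of the present disclosure, the data inputter may be, for example, a network input, a USB storage device, an optical disk, or another storage device; the inputter may be, for example, a keyboard, a mouse, a camera, a scanner, a light pen, a voice input device, a handwriting tablet, a touch screen, and so on; the retriever may be, for example, a retriever commonly found in the medical technology field; and the outputter may be, for example, a display, a printer, a plotter, an image output system, a voice output system, a magnetic recording device, and so on.
In the embodiments of the present disclosure, medical relationships are represented as matrices, and operations between the matrices are then used to automatically discover inference rules from a vast amount of medical knowledge without manual definition, which saves considerable manpower and material resources and reduces the time cost; the medical knowledge database and retrieval method built on these inference rules enable a searcher to retrieve more complete medical knowledge data.
The above are optional implementations of the present disclosure. It should be noted that a person of ordinary skill in the art can make improvements and refinements without departing from the principles of the present disclosure, and these improvements and refinements shall also fall within the scope of protection of the present disclosure.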
Claims (27)
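- A method for automatically discovering inference rules, applied to a medical knowledge base, the medical knowledge base including a plurality of pieces of medical knowledge, each piece of medical knowledge being a combination of two medical entities and the medical relationship between the two medical entities, characterized in that the method includes: acquiring a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between the vectors of two medical entities having the medical relationship; and acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
- The method according to claim 1, characterized in that the step of acquiring a relationship matrix of each medical relationship includes: constructing an initial relationship matrix for each medical relationship; acquiring the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge; replacing a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge; scoring the correct medical knowledge and the erroneous medical knowledge separately using a scoring function; and learning the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
- The method according to claim 2, characterized in that, when the initial relationship matrix is learned, the difference between the number of pieces of correct medical knowledge and the number of pieces of erroneous medical knowledge used is less than a preset threshold.
- The method according to claim 2, characterized in that the step of replacing a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge includes: replacing a medical entity in the correct medical knowledge with a medical entity other than those contained in all the correct medical knowledge corresponding to the medical relationship, to obtain new medical knowledge as erroneous medical knowledge.
- The method according to claim 1, characterized in that the step of acquiring inference rules according to the relationship matrices of the medical relationships includes: selecting a plurality of medical relationship groups from all the medical relationships and constructing a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships; calculating the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and selecting, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules.
- The method according to claim 8, characterized in that the step of selecting, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules includes: sorting the similarities corresponding to all the medical relationship groups to obtain a preset number of inference rules to be verified with the highest similarity; and taking the preset number of inference rules to be verified as the finally obtained inference rules.
- The method according to claim 8 or 9, characterized in that the step of calculating the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship includes: calculating the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and taking the L2 norm as the similarity.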
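- A system for automatically discovering inference rules, applied to a medical knowledge base, the medical knowledge base including a plurality of pieces of medical knowledge, each piece of medical knowledge being a combination of two medical entities and one medical relationship, characterized in that the system includes: a relationship matrix learner for acquiring a relationship matrix of each medical relationship, the relationship matrix reflecting a relationship between vectors of medical entities having the medical relationship; and an inference rule finder for acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship.
- The system according to claim 12, characterized in that the relationship matrix learner is configured to: construct an initial relationship matrix for each medical relationship; acquire the medical knowledge corresponding to the medical relationship in the medical knowledge base as correct medical knowledge; replace a medical entity in the correct medical knowledge to obtain new medical knowledge as erroneous medical knowledge; score the correct medical knowledge and the erroneous medical knowledge separately using a scoring function; and learn the initial relationship matrix using a preset objective function such that the score of the correct medical knowledge is higher than the score of the erroneous medical knowledge, thereby obtaining the relationship matrix of the medical relationship.
- The system according to claim 13, characterized in that, when the initial relationship matrix is learned, the difference between the number of pieces of correct medical knowledge and the number of pieces of erroneous medical knowledge used is less than a preset threshold.
- The system according to claim 13, characterized in that the relationship matrix learner is configured to replace a medical entity in the correct medical knowledge with a medical entity other than those contained in all the correct medical knowledge corresponding to the medical relationship, to obtain new medical knowledge as erroneous medical knowledge.
- The system according to claim 12, characterized in that the inference rule finder is configured to: select a plurality of medical relationship groups from all the medical relationships and construct a plurality of inference rules to be verified, wherein each inference rule to be verified corresponds to one medical relationship group and each medical relationship group includes three medical relationships; calculate the similarity between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship; and select, according to the similarity, some of the inference rules to be verified as the finally obtained inference rules.
- The system according to claim 19, characterized in that the inference rule finder is configured to: sort the similarities corresponding to all the medical relationship groups to obtain a preset number of inference rules to be verified with the highest similarity; and take the preset number of inference rules to be verified as the finally obtained inference rules.
- The system according to claim 19 or 20, characterized in that the inference rule finder is configured to calculate the L2 norm of the difference between the product of the relationship matrices of two medical relationships in the medical relationship group and the relationship matrix of the other medical relationship, and to take the L2 norm as the similarity.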
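- A data retrieval method for a medical knowledge base, including: acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between the vectors of two medical entities having the medical relationship, the medical knowledge base includes a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities; acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship; inputting a search term or a search formula; performing a search for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and outputting the medical knowledge related to the search term or the search formula.
- The retrieval method according to claim 23, characterized in that outputting the medical knowledge related to the search term or the search formula includes displaying the medical knowledge related to the search term or the search formula.
- A medical knowledge database, including: a data inputter for inputting medical knowledge data; a relationship matrix learner for acquiring a relationship matrix of each medical relationship, wherein the relationship matrix reflects a relationship between vectors of medical entities having the medical relationship, the medical knowledge base includes a plurality of pieces of medical knowledge, and each piece of medical knowledge is a combination of two medical entities and the medical relationship between the two medical entities; an inference rule finder for acquiring inference rules according to the relationship matrices of a plurality of the medical relationships, wherein each acquired inference rule includes a first medical relationship, a second medical relationship, and a third medical relationship, and the inference rule indicates that the third medical relationship can be inferred from the first medical relationship and the second medical relationship; a search term or search formula inputter for inputting a search term or a search formula; a retriever for searching for the search term or the search formula based on the inference rules to obtain medical knowledge related to the search term or the search formula; and an outputter for outputting the medical knowledge related to the search term or the search formula.
- The database according to claim 25, characterized in that the outputter is configured to display the medical knowledge related to the search term or the search formula.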
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/317,686 US11216475B2 (en) | 2017-06-28 | 2018-01-17 | Method and system for automatically discovering inference rule, database and retrieval method |
| JP2019569383A JP7151730B2 (ja) | 2017-06-28 | 2018-01-17 | Method and system for automatically discovering inference rule, database and retrieval method |
| EP18824315.8A EP3471104A4 (en) | 2017-06-28 | 2018-01-17 | AUTOMATIC INFERENCE RULE DISCOVERY SYSTEM AND METHOD, DATABASE AND SEARCH METHOD |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710507636.9 | 2017-06-28 | ||
| CN201710507636.9A CN109147953A (zh) | 2017-06-28 | 2017-06-28 | Method and system for automatically discovering inference rule |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019000920A1 (zh) | 2019-01-03 |
Family
ID=64742706
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/073004 Ceased WO2019000920A1 (zh) | 2017-06-28 | 2018-01-17 | Method and system for automatically discovering inference rule, database and retrieval method |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11216475B2 (zh) |
| EP (1) | EP3471104A4 (zh) |
| JP (1) | JP7151730B2 (zh) |
| CN (1) | CN109147953A (zh) |
| WO (1) | WO2019000920A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114238522A (zh) * | 2021-09-30 | 2022-03-25 | 武汉众智数字技术有限公司 | Knowledge-graph-based family relationship inference method and system |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11586520B2 (en) * | 2020-07-09 | 2023-02-21 | International Business Machines Corporation | Automated data linkages across datasets |
| CN120012947B (zh) * | 2025-04-18 | 2025-07-01 | 中国科学院空天信息创新研究院 | Attribute-fusion-oriented knowledge graph multi-hop reasoning method and device |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
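| CN103824115A (zh) * | 2014-02-28 | 2014-05-28 | 中国科学院计算技术研究所 | Method and system for inferring relationships between entities for open network knowledge bases |
| US20140229161A1 (en) * | 2013-02-12 | 2014-08-14 | International Business Machines Corporation | Latent semantic analysis for application in a question answer system |
| CN104239385A (zh) * | 2013-06-11 | 2014-12-24 | 国际商业机器公司 | Method and system for inferring relationships between topics |
| CN105718726A (zh) * | 2016-01-18 | 2016-06-29 | 沈阳工业大学 | Rough-set-based knowledge acquisition and reasoning method for a medical auxiliary examination system |
| CN106528609A (zh) * | 2016-09-28 | 2017-03-22 | 厦门理工学院 | Knowledge graph reasoning method based on vector-constrained embedding translation |
| CN106874695A (zh) * | 2017-03-22 | 2017-06-20 | 北京大数医达科技有限公司 | Method and device for constructing a medical knowledge graph |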
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7899764B2 (en) * | 2007-02-16 | 2011-03-01 | Siemens Aktiengesellschaft | Medical ontologies for machine learning and decision support |
| EP2612293A4 (en) * | 2010-09-01 | 2016-05-04 | Apixio Inc | SYSTEM WITH A MEDICAL INDICATOR NAVIGATION MACHINE (MINE) |
| US8639678B2 (en) | 2011-09-12 | 2014-01-28 | Siemens Corporation | System for generating a medical knowledge base |
| EP2775412A1 (en) * | 2013-03-07 | 2014-09-10 | Medesso GmbH | Method of generating a medical suggestion as a support in medical decision making |
| US10019531B2 (en) | 2013-05-19 | 2018-07-10 | Carmel Kent | System and method for displaying, connecting and analyzing data in an online collaborative webpage |
| WO2015009682A1 (en) | 2013-07-15 | 2015-01-22 | De, Piali | Systems and methods for semantic reasoning |
| WO2017100356A1 (en) * | 2015-12-07 | 2017-06-15 | Data4Cure, Inc. | A method and system for ontology-based dynamic learning and knowledge integration from measurement data and text |
| CN107239722B (zh) * | 2016-03-25 | 2021-11-12 | 佳能株式会社 | Method and device for extracting diagnosis objects from medical documents |
| US10331659B2 (en) * | 2016-09-06 | 2019-06-25 | International Business Machines Corporation | Automatic detection and cleansing of erroneous concepts in an aggregated knowledge base |
| CN106528610A (zh) * | 2016-09-28 | 2017-03-22 | 厦门理工学院 | Knowledge graph representation learning method based on path tensor decomposition |
- 2017
  - 2017-06-28: CN, application CN201710507636.9A, publication CN109147953A (zh), status: Pending
- 2018
  - 2018-01-17: US, application US16/317,686, publication US11216475B2 (en), status: Active
  - 2018-01-17: JP, application JP2019569383A, publication JP7151730B2 (ja), status: Active
  - 2018-01-17: WO, application PCT/CN2018/073004, publication WO2019000920A1 (zh), status: Ceased
  - 2018-01-17: EP, application EP18824315.8A, publication EP3471104A4 (en), status: Ceased
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3471104A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2020525896A (ja) | 2020-08-27 |
| EP3471104A4 (en) | 2020-04-08 |
| JP7151730B2 (ja) | 2022-10-12 |
| CN109147953A (zh) | 2019-01-04 |
| EP3471104A1 (en) | 2019-04-17 |
| US11216475B2 (en) | 2022-01-04 |
| US20190171642A1 (en) | 2019-06-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
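| CN108804677B (zh) | | Deep learning question classification method and system combining a multi-level attention mechanism |
| Liu et al. | | Ordinal constraint binary coding for approximate nearest neighbor search |
| US9147167B2 (en) | | Similarity analysis with tri-point data arbitration |
| US10163034B2 (en) | | Tripoint arbitration for entity classification |
| CN110399392B (zh) | | Semantic relational database operations |
| Athitsos et al. | | Boostmap: An embedding method for efficient nearest neighbor retrieval |
| CN110019474B (zh) | | Method, device and electronic equipment for automatically associating synonymous data in heterogeneous databases |
| US20140344195A1 (en) | | System and method for machine learning and classifying data |
| WO2022222942A1 (zh) | | Question-and-answer record generation method, device, electronic equipment and storage medium |
| WO2019218473A1 (zh) | | Field matching method, device, terminal equipment and medium |
| EP4091063A1 (en) | | Systems and methods for mapping a term to a vector representation in a semantic space |
| CN110134777B (zh) | | Question deduplication method, device, electronic equipment and computer-readable storage medium |
| CN116861022B (zh) | | Image retrieval method combining a deep convolutional neural network with a locality-sensitive hashing algorithm |
| WO2017113725A1 (zh) | | Method and system for acquiring and ranking associated information |
| US20230195735A1 (en) | | Method and apparatus for identifying similar data elements using string matching |
| WO2019000920A1 (zh) | | Method and system for automatically discovering inference rule, database and retrieval method |
| CN111651625A (zh) | | Image retrieval method, device, electronic equipment and storage medium |
| CN117435685A (zh) | | Document retrieval method, device, computer equipment, storage medium and product |
| CN115344734A (zh) | | Image retrieval method, device, electronic equipment and computer-readable storage medium |
| CN119887681B (zh) | | Industrial defect detection method based on vision-language prompts |
| CN114821140A (zh) | | Image clustering method based on Manhattan distance, terminal equipment and storage medium |
| CN114861625A (zh) | | Method for obtaining target training samples, electronic equipment and medium |
| CN114398877A (zh) | | Artificial-intelligence-based topic extraction method, device, electronic equipment and medium |
| CN113868424B (zh) | | Method, device, computer equipment and storage medium for determining a text topic |
| WO2023050461A1 (zh) | | Data clustering method, system and storage medium |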
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2018824315; Country of ref document: EP; Effective date: 20190114 |
| | ENP | Entry into the national phase | Ref document number: 2019569383; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |

