WO2024032096A1 - Prediction method, training method, apparatus and electronic device for reactant molecules
- Publication number
- WO2024032096A1 (application PCT/CN2023/096605)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- action
- editing
- atom
- completion
- atoms
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/10—Analysis or design of chemical reactions, syntheses or processes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/80—Data visualisation
Definitions
- Embodiments of the present application relate to the field of chemical reverse reactions, and more specifically, to prediction methods, training methods, devices and electronic equipment for reactant molecules.
- Retrosynthetic reactant prediction in organic chemistry is a key step in the development of new drugs and the manufacture of new materials.
- The purpose is to find a set of commercially available reactant molecules for synthesizing a product molecule.
- Traditional methods obtain reactant molecules by matching against reaction templates.
- However, the reaction templates need to be manually extracted by professional researchers; the process is very time-consuming, and the reaction templates cannot cover all reaction types.
- The development of deep learning technology has made it possible to learn potential reaction types from large databases of organic chemical reactions. Therefore, it is particularly important to use deep learning technology to build a powerful reverse reaction prediction model.
- SMILES: Simplified Molecular Input Line Entry System.
- Graph generation models generally divide the organic chemical reverse reaction prediction task into two subtasks, namely synthon identification and synthon completion.
- Usually, synthon identification is performed by building a graph neural network to obtain the synthons of the product molecule, and synthon completion is performed atom by atom by building a graph variational autoencoder.
- Handling synthon identification and synthon completion separately not only increases the prediction complexity, but also fails to achieve good generalization performance because the two subtasks have different optimization goals.
- In addition, the complexity of completing synthons atom by atom limits the prediction performance.
- In view of this, the embodiments of the present application provide a prediction method, training method, device and electronic equipment for reactant molecules, which can not only reduce the prediction complexity of reactant molecules and improve the generalization performance of reactant molecule prediction, but also improve the prediction performance for reactant molecules.
- In a first aspect, this application provides a method for predicting reactant molecules, including:
- based on the features of a product molecule, using a reverse reaction prediction model to predict a conversion path from the product molecule to multiple reactant molecules; the conversion path includes an editing sequence and a synthon completion sequence;
- editing the editing object indicated by each editing action in the editing sequence to obtain multiple synthons corresponding to the product molecule, the editing object being an atom or a chemical bond in the product molecule;
- for each synthon, adding the basic graph indicated by each synthon completion action at the interface atom indicated by that action, to obtain multiple reactant molecules corresponding to the multiple synthons;
- the basic graph includes multiple atoms or an atomic edge used to connect multiple atoms.
- In a second aspect, this application provides a training method for a reverse reaction prediction model, including:
- the conversion path includes an editing sequence and a synthon completion sequence; each editing action in the editing sequence indicates an editing object and its post-edit state, the editing object being an atom or a chemical bond in the product molecule;
- for the multiple synthons of the product molecule obtained using the editing sequence, the synthon completion sequence includes at least one synthon completion action corresponding to each synthon, and each synthon completion action indicates a basic graph and an interface atom, the basic graph including multiple atoms or an atomic edge connecting multiple atoms;
- the reverse reaction prediction model is trained based on the loss between the conversion path and the training path.
- this application provides a device for predicting reactant molecules, including:
- an extraction unit, used to extract features of a product molecule to obtain the features of the product molecule;
- a prediction unit, used to predict a conversion path from the product molecule to multiple reactant molecules based on the features of the product molecule and using a reverse reaction prediction model; the conversion path includes an editing sequence and a synthon completion sequence;
- an editing unit, used to edit the editing object indicated by each editing action in the editing sequence according to the post-edit state indicated by that action, to obtain multiple synthons corresponding to the product molecule, the editing object being an atom or a chemical bond in the product molecule;
- an adding unit, used to, for each of the multiple synthons and based on the at least one synthon completion action corresponding to that synthon in the synthon completion sequence, add the basic graph indicated by each synthon completion action at the interface atom indicated by that action, to obtain multiple reactant molecules corresponding to the multiple synthons;
- the basic graph includes multiple atoms or an atomic edge used to connect multiple atoms.
- this application provides a training device for a reverse reaction prediction model, including:
- An extraction unit is used to extract features of product molecules and obtain the characteristics of the product molecules
- a prediction unit is used to predict the conversion path between the product molecule and multiple reactant molecules based on the characteristics of the product molecule and using a reverse reaction prediction model;
- the conversion path includes an editing sequence and a synthon completion sequence; each editing action in the editing sequence indicates an editing object and its post-edit state, the editing object being an atom or a chemical bond in the product molecule;
- for the multiple synthons of the product molecule obtained using the editing sequence, the synthon completion sequence includes at least one synthon completion action corresponding to each synthon, and each synthon completion action indicates a basic graph and an interface atom, the basic graph including multiple atoms or an atomic edge connecting multiple atoms;
- a training unit is used to train the reverse reaction prediction model based on the loss between the conversion path and the training path.
- this application provides an electronic device, including:
- a processor adapted to execute computer instructions; and
- a computer-readable storage medium storing computer instructions, the computer instructions being suitable for the processor to load and execute the method for predicting reactant molecules of the first aspect or the training method for the reverse reaction prediction model of the second aspect.
- embodiments of the present application provide a computer-readable storage medium.
- the computer-readable storage medium stores computer instructions.
- When the computer instructions are read and executed by a processor of a computer device, the computer device executes the method for predicting reactant molecules of the first aspect or the training method for the reverse reaction prediction model of the second aspect.
- embodiments of the present application provide a computer program product or computer program.
- the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for predicting reactant molecules of the first aspect or the training method for the reverse reaction prediction model of the second aspect.
- The synthon prediction task and the synthon completion prediction task can be handled jointly; that is, the reverse reaction prediction model introduced in the embodiments of the present application can learn the potential relationship between the two subtasks of synthon prediction and synthon completion, thereby greatly improving the generalization performance of the model, reducing the prediction complexity of reactant molecules, and improving the generalization performance of reactant molecule prediction.
- In addition, by introducing a basic graph and designing it as a structure that includes multiple atoms or an atomic edge used to connect multiple atoms, a short and accurate conversion path can be constructed. This keeps the synthon completion sequence from becoming too long, reduces the difficulty of predicting reactant molecules, and improves prediction accuracy, thereby improving the prediction performance for reactant molecules.
- Figure 1 is an example of the system framework provided by the embodiment of this application.
- Figure 2 is a schematic flow chart of the prediction method of reactant molecules provided by the embodiment of the present application.
- Figure 3 is an example of a conversion path provided by an embodiment of the present application.
- Figure 4 is another schematic flow chart of the prediction method of reactant molecules provided by the embodiment of the present application.
- Figure 5 is a schematic flow chart of the training method of the reverse reaction prediction model provided by the embodiment of the present application.
- Figure 6 is a schematic block diagram of a predicting device for reactant molecules provided by an embodiment of the present application.
- Figure 7 is a schematic block diagram of the training device of the reverse reaction prediction model provided by the embodiment of the present application.
- Figure 8 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
- the solution provided by this application may relate to the technical field of reverse reaction prediction based on artificial intelligence (AI) in the field of chemistry.
- AI is the theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
- artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
- Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
- artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level technology and software-level technology.
- Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics and other technologies.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, driverless driving, autonomous driving, drones, robots, smart medical care, and smart customer service. With the development of technology, artificial intelligence technology is expected to be applied in more fields and play an increasingly important role.
- Embodiments of the present application may relate to computer vision (Computer Vision, CV) technology in artificial intelligence technology.
- the embodiments of the present application relate to the technical field of reverse reaction prediction based on the results of CV identification in the field of chemistry.
- Computer vision is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to perform machine vision tasks such as target recognition, prediction and measurement, and to further perform graphics processing so that the processed images are more suitable for human observation or for transmission to instruments for detection.
- Computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
- Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric identification technologies such as face recognition and fingerprint recognition.
- Embodiments of this application may also relate to machine learning (Machine Learning, ML) in artificial intelligence.
- embodiments of the present application may relate to the technical field of reverse reaction prediction using machine learning prediction models.
- ML is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance.
- Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence.
- Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
- Figure 1 is an example of a system framework 100 provided by an embodiment of the present application.
- the system framework 100 may be an application system, and the embodiment of the present application does not limit the specific type of the application program.
- the system framework 100 includes: a terminal 131, a terminal 132 and a server cluster 110. Both terminal 131 and terminal 132 can be connected to the server cluster 110 through a wireless or wired network 120.
- the terminal 131 and the terminal 132 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP4 player, and a laptop computer.
- the terminal 131 and the terminal 132 have application programs installed and running.
- the application can be an online video program, a short video program, a picture sharing program, a sound social program, an animation program, a wallpaper program, a news push program, a supply and demand information push program, an academic exchange program, a technical exchange program, a policy exchange program, a program containing a comment mechanism, a program containing an opinion publishing mechanism, or a knowledge-sharing program.
- Terminal 131 and terminal 132 may be terminals used by user 141 and user 142 respectively, and user accounts are logged into the applications running in terminal 131 and terminal 132.
- the server cluster 110 includes at least one of one server, multiple servers, a cloud computing platform, and a virtualization center.
- the server cluster 110 is used to provide background services for applications (e.g., applications on terminals 131 and 132).
- the server cluster 110 takes on the main computing work, and the terminal 131 and the terminal 132 take on the secondary computing work; or the server cluster 110 takes on the secondary computing work, and the terminals 131 and 132 take on the main computing work; or a distributed computing architecture is used among the terminal 131, the terminal 132 and the server cluster 110 for collaborative computing.
- the calculation work involved in this application may be calculation work related to the prediction of organic chemistry retrosynthetic reactants or related auxiliary work.
- the server cluster 110 includes: an access server 112, a web server 111 and a data server 113.
- the access servers 112 can be deployed nearby in different cities.
- the access servers 112 are used to receive service requests from terminals 131 and 132, and forward the service requests to the corresponding servers for processing.
- the web server 111 is a server used to provide web pages to the terminals 131 and 132, and embedded code is integrated into the web pages;
- the data server 113 is used to receive data (such as business data, etc.) reported by the terminals 131 and 132.
- a leaving group is an atom or functional group that is separated from a larger molecule during a chemical reaction.
- the functional group is an atom or atomic group that determines the chemical properties of an organic compound. Common functional groups include carbon-carbon double bonds, carbon-carbon triple bonds, hydroxyl groups, carboxyl groups, ether bonds, aldehyde groups, etc.
- The two subtasks can be jointly learned through an end-to-end model; that is, synthon identification and synthon completion can be integrated into one end-to-end model, and the synthon can be completed atom by atom or benzene ring by benzene ring.
- The end-to-end model refers to a model that can be trained and optimized through a single optimization goal.
- Top-1 accuracy refers to the proportion of cases in which the category ranked first by the model is consistent with the actual result.
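- As a minimal sketch of how top-1 accuracy could be computed, assuming each prediction is a ranked list of candidate reactant strings (the function name and the toy strings below are hypothetical and not taken from the application):

```python
from typing import Sequence


def top1_accuracy(ranked_predictions: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Fraction of examples whose first-ranked prediction equals the target."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        1
        for ranked, target in zip(ranked_predictions, targets)
        if ranked and ranked[0] == target
    )
    return hits / len(targets)


# Hypothetical reactant strings; only the first candidate in each list counts.
predictions = [["A.B", "A.C"], ["D.E", "D.F"], ["G.H"], ["I.J", "K.L"]]
truth = ["A.B", "D.F", "G.H", "K.L"]
accuracy = top1_accuracy(predictions, truth)
```

- In this toy data the correct answer is ranked first in two of the four examples; a correct answer ranked second does not count toward top-1 accuracy.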
- embodiments of the present application provide a prediction method, training method, device and electronic equipment for reactant molecules, which can not only reduce the prediction complexity of reactant molecules and improve the generalization performance of reactant molecule prediction, but also improve the prediction performance for reactant molecules.
- the embodiments of this application design an end-to-end reverse reaction prediction model to jointly optimize the two subtasks of synthon prediction and synthon completion.
- the synthon prediction task can be constructed as: predicting an editing sequence that characterizes the conversion process from the product molecule to synthons.
- the synthon completion task can be constructed as: predicting a synthon completion sequence that characterizes the conversion process from synthons to reactant molecules. The editing sequence and the synthon completion sequence together form a conversion path that characterizes the conversion process from the product molecule to the reactant molecules.
- the solutions provided by the embodiments of this application can be easily extended to more complex and diverse chemical reaction models.
- the reverse reaction prediction model provided by the embodiments of the present application can be extended to a multi-step reverse reaction prediction task.
- for example, a Monte Carlo tree search algorithm can be used to extend the reverse reaction prediction model provided by the embodiments of the present application to a multi-step reverse reaction prediction task.
- By introducing basic graphs, the embodiments of the present application can also largely avoid the limited prediction performance for reactant molecules caused by the extremely unbalanced samples of leaving groups, and can alleviate the excessive prediction complexity that arises when synthons are completed atom by atom, thereby improving the prediction accuracy of the reverse reaction prediction model.
- FIG. 2 shows a schematic flow chart of a prediction method 200 for reactant molecules according to an embodiment of the present application.
- the prediction method 200 can be executed by any electronic device with data processing capabilities.
- the electronic device may be implemented as a server.
- the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
- the servers can be connected directly or indirectly through wired or wireless communication methods. This application is not limited here. For the convenience of description, the prediction method provided by this application is explained below by taking a prediction device for reactant molecules as an example.
- the prediction method 200 may include:
- S210 Perform feature extraction on the product molecule to obtain the characteristics of the product molecule.
- For example, features of the product molecule can be extracted through the Simplified Molecular Input Line Entry System (SMILES) or a graph neural network (GNN) to obtain the features of the product molecule.
- Of course, feature extraction can also be performed on the product molecule through other models or frameworks with similar functions, which is not specifically limited in the embodiments of the present application.
- the characteristics of the product molecule may be determined based on the characteristics of individual atoms in the product molecule.
- a global attention pooling function can be used to calculate the characteristics of all atoms to obtain the characteristics of the product molecule.
- the original characteristics of each atom and the original characteristics of the chemical bond between each atom and the neighbor node of each atom can be encoded to obtain the characteristics of each atom.
- MPNN can be used to encode the original characteristics of each atom and the original characteristics of the chemical bond between each atom and the neighbor node of each atom to obtain the characteristics of each atom.
- the original characteristics of each atom are used to characterize at least one of the following information: the type of atom (such as C, N, O, S, etc.), the degree of chemical bonding, chirality, the number of hydrogen atoms, and other information.
- the original characteristics of the chemical bond are used to characterize at least one of the following information: chemical bond type (such as single bond, double bond, triple bond, aromatic bond, etc.), configuration, aromaticity and other information.
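- The encoding step described above can be sketched as follows. This is a deliberately simplified illustration, not the encoder of the application: it assumes atom and bond vectors share one dimension, uses plain sum aggregation, and applies no learned weights, whereas a real MPNN would use trained layers. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def message_passing(
    atom_feats: Dict[int, Vector],
    bond_feats: Dict[Tuple[int, int], Vector],
    num_layers: int,
) -> Dict[int, Vector]:
    """Run `num_layers` rounds of sum-aggregation message passing.

    Simplification (assumption): no learned weights are applied.
    """
    # Build an undirected neighbor list from the bond dictionary.
    neighbors: Dict[int, List[Tuple[int, Vector]]] = {v: [] for v in atom_feats}
    for (u, v), feat in bond_feats.items():
        neighbors[u].append((v, feat))
        neighbors[v].append((u, feat))

    hidden = dict(atom_feats)
    for _ in range(num_layers):
        updated = {}
        for v, h_v in hidden.items():
            message = [0.0] * len(h_v)
            for u, x_vu in neighbors[v]:
                # Each neighbor contributes its state plus the bond feature.
                message = add(message, add(hidden[u], x_vu))
            updated[v] = add(h_v, message)
        hidden = updated
    return hidden


# Toy chain 0 - 1 - 2 with 2-dimensional raw features.
atoms = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
bonds = {(0, 1): [0.5, 0.0], (1, 2): [0.0, 0.5]}
encoded = message_passing(atoms, bonds, num_layers=1)
```

- After one round, each atom's vector reflects its own raw features together with those of its neighbors and the bonds connecting them, which is the information the text says the encoded atom features should carry.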
- the characteristics of the product molecule may also include characteristics representing the types of chemical bonds in the product molecule.
- the types of chemical bonds in the product molecule include, but are not limited to: single bonds, double bonds, triple bonds and no chemical bonds.
- the characteristics of the product molecule can be embodied in the form of a vector or a matrix. Of course, they can also be replaced by information in an array or other formats, which is not specifically limited in this application.
- the characteristics of the product molecule are illustratively described below by taking the eigenvector as an example.
- Each atom $v \in V$ has a feature vector $x_v$, which represents information such as the type of atom (such as C, N, O, S, etc.), the degree of chemical bonding, chirality, and the number of hydrogen atoms.
- Each chemical bond $e \in E$ has a feature vector $x_{v,u}$, which contains information such as chemical bond type (such as single bond, double bond, triple bond, aromatic bond, etc.), configuration and aromaticity.
- A 4-dimensional one-hot vector can be defined to represent the type of chemical bonds in the molecular graph, namely single bond, double bond, triple bond and no chemical bond.
- A one-hot vector is a vector with only one element being 1 and the remaining elements being 0. All atoms and chemical bonds have a label $s \in \{0,1\}$ used to indicate whether they are editing objects involved in converting product molecules into synthons.
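- The 4-dimensional one-hot bond encoding can be illustrated with a short sketch. The ordering of the four categories and the function name are assumptions made for illustration; the text only names the categories.

```python
from typing import List

# The four bond categories named in the text, in an assumed fixed order.
BOND_TYPES = ["single", "double", "triple", "none"]


def one_hot_bond(bond_type: str) -> List[int]:
    """Return a 4-dimensional one-hot vector for the given bond category."""
    if bond_type not in BOND_TYPES:
        raise ValueError(f"unknown bond type: {bond_type}")
    return [1 if name == bond_type else 0 for name in BOND_TYPES]


double_bond = one_hot_bond("double")
no_bond = one_hot_bond("none")
```

- Exactly one position is set to 1 in each vector, so the "no chemical bond" state is represented in the same way as the three bond orders.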
- An L-layer message passing neural network (MPNN) can be used to encode each atom, and a multilayer perceptron (MLP) can be used to encode each chemical bond.
- the feature vector $h_v$ of an atom and the feature vector $h_{v,u}$ of a chemical bond can be calculated by the following formula:
- $\mathrm{MPNN}(\cdot)$ represents the message passing neural network
- $G$ represents the graph structure of the product molecule
- $L$ represents the number of layers of $\mathrm{MPNN}(\cdot)$.
- $x_v$ represents the feature vector of atom $v$ before encoding
- $x_{v,u}$ represents the feature vector of the chemical bond between atom $v$ and atom $u$ before encoding
- $\mathrm{MLP}_{bond}(\cdot)$ represents the multi-layer perceptron
- $\Vert$ represents the splicing (concatenation) operation
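- The formula these symbols belong to is not reproduced in this text. As a sketch only, one form consistent with the symbol definitions would be the following; the neighbor set $\mathcal{N}(v)$ and the exact arguments passed to $\mathrm{MLP}_{bond}$ are assumptions, not the formula of the application:

```latex
h_v = \mathrm{MPNN}^{(L)}\!\left(G,\; x_v,\; \{x_{v,u}\}_{u \in \mathcal{N}(v)}\right),
\qquad
h_{v,u} = \mathrm{MLP}_{bond}\!\left(h_v \,\Vert\, h_u \,\Vert\, x_{v,u}\right)
```

- Under this reading, atom features come from $L$ rounds of message passing over $G$, and each bond feature is computed from the encoded features of its two end atoms spliced with the raw bond feature.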
- $h_{v,u}$ can be expressed in the form of a self-loop, that is:
- i is the label of an atom or the label of a chemical bond.
- the label of an atom may be an index of an atom
- the label of a chemical bond may be an index of a chemical bond
- A global attention pooling function can be applied to the feature vectors of all atoms to obtain the feature vector $h_G$ of the product molecule. Notably, where the feature vector of a synthon is needed, it can be obtained in the same way as $h_G$: a global attention pooling function is applied to the feature vectors of all atoms in the synthon to obtain the feature vector $h_{syn}$.
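- A minimal sketch of global attention pooling is given below, assuming the attention score of each atom is a dot product between its feature vector and a gate vector. The gate vector stands in for a learned gating layer and, like the function name, is an assumption for illustration.

```python
import math
from typing import List

Vector = List[float]


def global_attention_pool(atom_vectors: List[Vector], gate_weights: Vector) -> Vector:
    """Pool per-atom vectors into one vector using softmax attention weights.

    `gate_weights` stands in for a learned gating layer (assumption): the
    score of each atom is the dot product of its vector with these weights.
    """
    scores = [sum(w * x for w, x in zip(gate_weights, h)) for h in atom_vectors]
    # Numerically stable softmax over the atoms.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    attention = [e / total for e in exp_scores]
    dim = len(atom_vectors[0])
    return [
        sum(a * h[d] for a, h in zip(attention, atom_vectors))
        for d in range(dim)
    ]


# With zero gate weights every atom gets equal attention, so the result is the mean.
h_G = global_attention_pool([[1.0, 2.0], [3.0, 4.0]], gate_weights=[0.0, 0.0])
```

- The same function could be applied to only the atoms of one synthon to obtain a synthon-level vector, since the pooled output has a fixed size regardless of how many atoms are pooled.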
- contrastive learning strategies can also be introduced, such as masking the graph structure or features of the molecular graph; that is, the prediction performance for reactant molecules can be improved by expanding the feature dimension.
- S220, based on the features of the product molecule, use the reverse reaction prediction model to predict the conversion path from the product molecule to multiple reactant molecules; the conversion path includes an editing sequence and a synthon completion sequence.
- the editing sequence is a sequence formed by editing actions
- the synthon completion sequence is a sequence formed by synthon completion actions.
- the prediction task of the reverse reaction prediction model is defined as a reactant molecule generation task. That is, the prediction task of the reverse reaction prediction model is: predict a conversion path that describes the conversion from the product molecular graph to the reactant molecular graph.
- the conversion path is defined by editing actions for product molecules and synthon completion actions for synthons. That is, the conversion path is constructed from the editing (Edit) sequence formed by the editing actions and the synthon completion sequence formed by the synthon completion actions. Therefore, a conversion path describing the conversion from the product molecular graph to the reactant molecular graph can be predefined for each product molecule.
- the synthon completion action can also be called a basic graph adding (AddingMotif) action; that is, the synthon completion sequence can also be called a basic graph adding sequence.
- the conversion path includes an editing sequence and a synthon completion sequence.
- the editing sequence is used to describe the chemical bond and atomic changes from the product molecule to the synthon.
- Synth completion sequence is used to describe the process of synth completion using a basic pattern (motif).
- editing actions are introduced to represent each change from product to synthon.
- synthon completion actions are introduced to describe the process of completing the synthon using a basic pattern (motif). Every completion operation.
- the editing action is an action of editing atoms in the product molecule or editing chemical bonds in the process of converting the product molecule into multiple synthons of the product molecule.
- the synthon completion action is an action of adding a basic graph in the process of completing the multiple synthons into the multiple reactant molecules.
- the editing sequence may also include an editing completion action.
- the conversion path includes at least one editing action, an editing completion action, and at least one synthon completion action in sequence.
- the editing completion action is used to connect or distinguish the at least one editing action and the at least one synthon completion action.
- the editing completion action is used to trigger the reverse reaction prediction model to initiate the synthon completion task.
- the editing completion action is innovatively introduced to connect the at least one editing action to the at least one synthon completion action, thereby constructing the conversion path.
- the editing sequence may also include a start action.
- the conversion path includes the start action, at least one editing action, an editing completion action, and at least one synthon completion action in sequence.
- the start action is used to trigger the reverse reaction prediction model to initiate the synthon prediction task or to trigger the reverse reaction prediction model to initiate the reactant prediction task.
- the editing completion action can also be used as an action in the synthon completion sequence, which is not specifically limited in this application.
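One possible in-memory representation of such a conversion path is sketched below. The class names `Edit`, `FinishEdit` and `AddMotif` and the helper `is_valid_path` are hypothetical names chosen for illustration; the optional start action is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass(frozen=True)
class Edit:
    obj: str                  # label of the atom or chemical bond being edited
    new_state: Optional[str]  # label of the edited state; None means "delete the bond"

@dataclass(frozen=True)
class FinishEdit:
    pass                      # editing completion action separating the two sequences

@dataclass(frozen=True)
class AddMotif:
    motif: str                # label of the basic graph (motif)
    interface_atom: str       # label of the interface atom on that basic graph

Action = Union[Edit, FinishEdit, AddMotif]

def is_valid_path(path: List[Action]) -> bool:
    """Check the order: editing actions, one editing completion action, completion actions."""
    if FinishEdit() not in path:
        return False
    k = path.index(FinishEdit())
    edits, completions = path[:k], path[k + 1:]
    return (len(edits) >= 1 and all(isinstance(a, Edit) for a in edits)
            and len(completions) >= 1
            and all(isinstance(a, AddMotif) for a in completions))

example_path = [Edit("b", None), FinishEdit(), AddMotif("z1", "q1"), AddMotif("z2", "q2")]
```

Here the editing completion action is kept as a separate element so that the boundary between the two sequences is explicit.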
- reverse reaction prediction model involved in this application can be any deep learning or machine learning model used for identification, and the embodiments of this application do not limit its specific type.
- the reverse reaction prediction model includes but is not limited to: a traditional learning model, an ensemble learning model, or a deep learning model.
- traditional learning models include but are not limited to: tree models (regression trees) or logistic regression (LR) models; ensemble learning models include but are not limited to: improved models of gradient boosting algorithms (XGBoost) or random forest models.
- deep learning models include but are not limited to: neural networks, dynamic Bayesian network (DBN) models, and stacked auto-encoder network (SAE) models.
- a batch size can be used to train the model.
- the batch size is the number of samples selected for one training.
- the value of the batch size determines the time required to complete each epoch in the deep learning training process and the smoothness of the gradient between each iteration.
- for a training set of size N, if each epoch uses the most conventional sampling scheme, in which each of the N samples is sampled once, and the batch size is b, then the number of iterations required in each epoch is N/b; therefore, the time required to complete each epoch increases roughly with the number of iterations.
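As a small worked example of the N/b relationship, the sketch below computes the number of iterations per epoch. Rounding up when N is not divisible by b is an assumption made here (the last, smaller batch is still processed); the text itself only states N/b.

```python
import math

def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of parameter updates needed to see every sample once."""
    # Assumption: a final partial batch counts as one more iteration.
    return math.ceil(n_samples / batch_size)

even_case = iterations_per_epoch(1000, 50)    # 1000 / 50 divides evenly
ragged_case = iterations_per_epoch(1000, 64)  # 15 full batches plus one partial batch
```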
- the embodiment of the present application does not specifically limit the value of the batch size.
- the appropriate batch size can be determined according to actual needs or scenarios.
- the batch size can be used for network training.
- a neural network is a computing model composed of multiple interconnected neuron nodes. The connection between nodes represents the weighting applied to a signal passing from input to output, which is called a weight; each node performs a weighted summation (SUM) of its input signals and produces its output through a specific activation function (f).
- neural networks include but are not limited to: Deep Neural Network (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Network (RNN), etc.
- a CNN is a type of feedforward neural network (Feedforward Neural Network) that includes convolutional calculations and has a deep structure, and it is one of the representative algorithms of deep learning. Because convolutional neural networks can perform shift-invariant classification, they are also called shift-invariant artificial neural networks (SIANN).
- RNNs include structures such as Long Short-Term Memory (LSTM) and gated recurrent unit (GRU).
- S230: edit the editing object indicated by each editing action in the editing sequence according to the edited state indicated by that editing action, to obtain the multiple synthons (Synthons) corresponding to the product molecule.
- the editing object is an atom or a chemical bond in the product molecule.
- the editing object indicated by each editing action is edited to obtain the multiple synthons corresponding to the product molecule.
- if the editing object indicated by the editing action is an atom, the edited state indicated by the editing action is a change in the number of charges on the atom or a change in the number of hydrogen atoms on the atom; if the editing object indicated by the editing action is a chemical bond, the edited state indicated by the editing action is any of the following: adding a chemical bond, deleting a chemical bond, or changing the type of a chemical bond.
- the edited state indicated by the editing action is related to the editing object indicated by the editing action.
- the edited state indicated by the editing action includes but is not limited to the following states: adding a chemical bond, deleting a chemical bond, or changing the type of a chemical bond.
- if the editing object is an atom, the edited state indicated by the editing action includes but is not limited to the following states: changing the number of charges on the atom or changing the number of hydrogen atoms on the atom.
- the editing action is characterized by the following tags: an action tag used to indicate editing, a tag used to indicate the editing object, and a tag used to indicate the status after editing.
- the editing action in the editing sequence can be defined as an editing triplet, that is, ( ⁇ 1, o, ⁇ ), where ⁇ 1 indicates that the action predicted by the reverse reaction prediction model is an editing action, o represents the label of the editing object corresponding to the predicted editing action, and ⁇ represents the label of the edited state corresponding to the predicted editing action.
- for example, a certain editing triplet is ( ⁇ 1, b, none)
- ⁇ 1 indicates that the action predicted by the reverse reaction prediction model is an editing action
- b indicates that the editing object corresponding to the editing action is the chemical bond labeled b
- none indicates that the edited state corresponding to the editing action is to delete the chemical bond labeled b.
- the editing action may be defined as a tuple or a value in other formats.
- if the editing action is defined as a two-element tuple, it can specifically be defined as (a label used to indicate the editing object, a label used to indicate the edited state).
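How an editing action turns a product molecule into synthons can be sketched on a toy graph. The dictionary layout (atoms keyed by integer label, bonds keyed by string label) and the function names `apply_edit` and `synthons` are assumptions for illustration; adding a new bond is omitted to keep the sketch short.

```python
def apply_edit(atoms, bonds, obj, new_state):
    """Return copies of (atoms, bonds) after applying one editing action.

    atoms: {atom_label: {"charge": int, "num_hs": int}}
    bonds: {bond_label: (atom_u, atom_v, bond_type)}
    obj: label of the editing object (a bond label or an atom label)
    new_state: None deletes a bond; a string changes a bond type;
               a dict updates the charge / hydrogen count of an atom.
    """
    atoms = {a: dict(props) for a, props in atoms.items()}
    bonds = dict(bonds)
    if obj in bonds:
        if new_state is None:
            del bonds[obj]
        else:
            u, v, _ = bonds[obj]
            bonds[obj] = (u, v, new_state)
    else:
        atoms[obj].update(new_state)
    return atoms, bonds

def synthons(atoms, bonds):
    """Split the edited graph into connected components (one per synthon)."""
    adj = {a: set() for a in atoms}
    for u, v, _ in bonds.values():
        adj[u].add(v)
        adj[v].add(u)
    seen, parts = set(), []
    for start in sorted(atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            a = stack.pop()
            if a not in comp:
                comp.add(a)
                stack.extend(adj[a] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts

# Toy product: a chain 1-2-3-4 whose middle bond is labeled "b".
product_atoms = {i: {"charge": 0, "num_hs": 0} for i in (1, 2, 3, 4)}
product_bonds = {"b1": (1, 2, "single"), "b": (2, 3, "single"), "b3": (3, 4, "single")}
edited_atoms, edited_bonds = apply_edit(product_atoms, product_bonds, "b", None)
fragments = synthons(edited_atoms, edited_bonds)
```

Deleting the bond labeled b leaves two connected components, which play the role of the two synthons.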
- based on the interface atom indicated by each synthon completion action in the synthon completion sequence, the basic graph (Motif) indicated by that synthon completion action is added, to obtain the multiple reactant molecules corresponding to the multiple synthons.
- the basic graph includes multiple atoms or atomic edges used to connect the multiple atoms.
- the atomic edge may represent an interaction force between the two or more atoms connected by the atomic edge. This force binds together the atoms connected by the atomic edge.
- the atomic edges are chemical bonds used to connect different atoms of the plurality of atoms.
- the atomic edge may be a chemical bond, including but not limited to: ionic bond, covalent bond, and metallic bond.
- a synthon is a molecular fragment obtained by breaking chemical bonds in the product molecular graph.
- the basic graph is a subgraph of the reactant, for example, it can be a subgraph on the reactant corresponding to the synthon.
- the reactants corresponding to the synthon may include molecules or reactants containing the synthon.
- a basic graph can include subgraphs obtained by:
- the ring involved in the embodiments of the present application may be a single ring, that is, there is only one ring in the molecule.
- the rings involved in the embodiments of the present application may be cycloalkanes, and cycloalkanes may be classified based on the number of carbon atoms on the ring.
- when the number of carbon atoms in the ring is 3 to 4, it is called a small ring; when it is 5 to 6, an ordinary ring; when it is 7 to 12, a medium ring; and when it is greater than 12, a macrocycle.
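The ring-size classification above can be written as a small lookup. The function name `ring_size_class` is hypothetical, and rejecting rings with fewer than three carbon atoms is an assumption added here.

```python
def ring_size_class(n_carbons: int) -> str:
    """Classify a cycloalkane ring by the number of carbon atoms on the ring."""
    if n_carbons < 3:
        raise ValueError("a ring needs at least 3 carbon atoms")
    if n_carbons <= 4:
        return "small ring"
    if n_carbons <= 6:
        return "ordinary ring"
    if n_carbons <= 12:
        return "medium ring"
    return "macrocycle"

cyclohexane_class = ring_size_class(6)
```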
- the reverse reaction prediction model predicts the synthon completion sequence based on a first order of the plurality of synthons.
- the traversal order is the second order.
- at least one synthon completion action corresponding to each synthon in the synthon completion sequence can be determined based on the first order and the second order; then, according to the interface atom indicated by each of the at least one synthon completion action, the basic graph indicated by that synthon completion action is added.
- the first order and the second order may be the same or different.
- the at least one synthon completion action corresponding to each synthon in the synthon completion sequence may be determined based on the first order, the second order, and the number of synthon completion actions corresponding to each synthon; then, according to the interface atom indicated by each of the at least one synthon completion action, the basic graph indicated by that synthon completion action is added.
- following the traversal order of the multiple synthons used by the reverse reaction prediction model when predicting the synthon completion sequence, each of the multiple synthons can be processed in turn: according to the interface atom indicated by each of the preset number of synthon completion actions corresponding to each synthon, the basic graph indicated by that synthon completion action is added.
- for example, the plurality of synthons include synthon 1 and synthon 2
- the first synthon completion action in the synthon completion sequence is the synthon completion action for synthon 1
- the remaining synthon completion actions in the synthon completion sequence, other than the first synthon completion action, are the synthon completion actions for synthon 2.
- based on the interface atom indicated by the first synthon completion action, the basic graph indicated by the first synthon completion action is added to obtain the reactant molecule corresponding to synthon 1; further, based on the interface atoms indicated by the remaining synthon completion actions in turn, the basic graphs indicated by the remaining synthon completion actions are added to obtain the reactant molecule corresponding to synthon 2.
- the interface atom indicated by each synthon completion action is the atom on the basic graph indicated by that synthon completion action that serves as the connection node when that basic graph is used for synthon completion.
- the interface atom indicated by the synthon completion action and the attached atom targeted by the synthon completion action can be used as the connection node to obtain the multiple reactant molecules corresponding to the product molecule. It is worth noting that the interface atom indicated by the synthon completion action and the attached atom targeted by the synthon completion action are the same atom in the reactant molecule.
- the attached atoms involved in the embodiments of the present application include atoms selected as the editing object and atoms at both ends of the chemical bond selected as the editing object.
- the synthon completion action is characterized by the following tags: an action tag indicating that synthon completion is performed, a tag indicating the basic graph, and a tag indicating the interface atom.
- a synthon completion action in a synthon completion sequence can be defined as a synthon completion triplet, that is, ( ⁇ 3, z, q).
- ⁇ 3 indicates that the action predicted by the reverse reaction prediction model is the synthon completion action.
- z represents the label of the basic graph indicated by the synthon completion action predicted by the reverse reaction prediction model.
- q represents the label of the interface atom corresponding to the synthon completion action predicted by the reverse reaction prediction model. For example, ( ⁇ 3, z1, q1).
- ⁇ 3 indicates that the action predicted by the reverse reaction prediction model is the synthon completion action.
- z1 represents the label of the basic graph indicated by the synthon completion action predicted by the reverse reaction prediction model, that is, the basic graph labeled z1.
- q1 represents the label of the interface atom corresponding to the synthon completion action predicted by the reverse reaction prediction model, that is, the atom labeled q1 in the basic graph labeled z1. Based on this, according to ( ⁇ 3, z1, q1), the atom labeled q1 in the basic graph labeled z1 can be used as the interface atom, and the basic graph labeled z1 can be used for synthon completion.
- the synthon completion action may be defined as a tuple or a value in other formats.
- if the synthon completion action is defined as a two-element tuple, it can specifically be defined as (a label used to indicate the basic graph, a label used to indicate the interface atom).
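The statement that the interface atom and the attached atom are the same atom in the reactant molecule suggests a simple merge operation, sketched below. The function `add_motif`, the `prefix` used to rename motif atoms, and the toy atom labels are all assumptions for illustration.

```python
def add_motif(synthon_atoms, synthon_bonds, attached_atom,
              motif_atoms, motif_bonds, interface_atom, prefix):
    """Attach a basic graph (motif) to a synthon.

    The interface atom of the motif is identified with the attached atom of
    the synthon; every other motif atom is renamed with `prefix` so that
    labels do not collide. Bonds are (atom_u, atom_v, bond_type) tuples.
    """
    rename = {m: (attached_atom if m == interface_atom else f"{prefix}:{m}")
              for m in motif_atoms}
    atoms = set(synthon_atoms) | set(rename.values())
    bonds = list(synthon_bonds) + [(rename[u], rename[v], t) for u, v, t in motif_bonds]
    return atoms, bonds

# Toy synthon C1-C2 with attached atom C2, completed by a two-atom motif z1
# whose interface atom is q1.
reactant_atoms, reactant_bonds = add_motif(
    synthon_atoms={"C1", "C2"},
    synthon_bonds=[("C1", "C2", "single")],
    attached_atom="C2",
    motif_atoms=["q1", "O"],
    motif_bonds=[("q1", "O", "single")],
    interface_atom="q1",
    prefix="z1",
)
```

Because q1 is merged into C2 rather than bonded to it, the completed graph has three atoms and two bonds.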
- the prediction task of the synthon and the prediction task of synthon completion can be combined; that is, the reverse reaction prediction model introduced in the embodiments of this application can learn the potential relationship between the two subtasks of synthon prediction and synthon completion, thereby greatly improving the generalization performance of the model, reducing the prediction complexity of reactant molecules, and improving the generalization performance of reactant molecule prediction.
- in addition, by introducing a basic graph and designing the basic graph as a structure that includes multiple atoms or atomic edges used to connect the multiple atoms, a short and accurate conversion path can be reasonably constructed. This avoids an excessively long synthon completion sequence, reduces the difficulty of predicting reactant molecules, and improves the prediction accuracy of reactant molecules, thereby improving the prediction performance for reactant molecules.
- this S220 may include:
- based on the (t-1)-th action predicted by the reverse reaction prediction model, the input feature of the t-th action is obtained, where t is an integer greater than 1; based on the input feature corresponding to the t-th action and the hidden feature corresponding to the t-th action, the t-th action is predicted, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attached atoms on the multiple synthons and all attached atoms on the basic graphs added for the multiple synthons have been traversed, at which point the conversion path is obtained; the hidden feature of the t-th action is related to the actions predicted by the reverse reaction prediction model before the t-th action.
- the attached atoms include atoms selected as the editing object and atoms at both ends of the chemical bond selected as the editing object.
- the reverse reaction prediction model can be a Recurrent Neural Network (RNN).
- the (t-1)-th action can correspond to time t-1, and the t-th action can correspond to time t; that is, based on the action predicted by the RNN at time t-1, the input features of the RNN at time t are obtained, and based on the input features of the RNN at time t and the hidden features of the RNN at time t, the RNN predicts the action at time t, until the action predicted by the RNN is a synthon completion action and all attached atoms on the multiple synthons and on the basic graphs added for the multiple synthons have been traversed, at which point the conversion path is obtained.
- multi-layer perceptrons and convolutional neural networks assume that each input is an independent unit without context; for example, the input is a picture and the network identifies whether it shows a dog or a cat. But for serialized inputs with obvious contextual features, such as predicting the content of the next frame in a video, the output must rely on previous inputs, which means that the network must have a certain memory capability. An RNN gives the network exactly this kind of memory capability.
- the reverse reaction prediction model can also be other models, which is not specifically limited in this application.
- the output feature u_t can be obtained based on the input feature corresponding to the t-th action and the hidden feature corresponding to the t-th action, and then the output feature u_t and the feature h_G of the product molecule are spliced together to obtain the feature ⁇ t used to identify the t-th action.
- GRU(·) is a gated recurrent unit
- input_{t-1} and hidden_{t-1} are, respectively, the input feature used by the GRU to predict the t-th action (i.e., the feature of the intermediate molecular fragment edited based on the (t-1)-th action) and the hidden state passed to the node used to predict the t-th action; their initial values are the zero vector and the feature of the product molecular graph, respectively
- ⁇ G (·) is the embedding function of the feature of the product molecule.
- the reverse reaction prediction model can identify the t-th action using the following formula:
- softmax(·) is the classification function
- MLP_bond(·) represents the multi-layer perceptron
- ⁇ t represents the feature used to identify the t-th action.
- the output feature u_t and the feature h_G of the product molecule are spliced; that is, the feature of the product molecule and the output feature of the RNN are spliced together for subsequent action prediction, which integrates global topological information into the action prediction process and thereby improves the accuracy of action prediction.
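The recurrent step described above (GRU update, splicing with h_G, softmax over action types) can be sketched in plain Python. This is an illustration under stated assumptions rather than the formula of this application: biases are omitted, a single linear layer stands in for the multi-layer perceptron, the embedding of h_G used for the initial hidden state is taken to be the identity, and all weight names in `params` are hypothetical.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def gru_cell(x, h, p):
    """One GRU update without biases; p holds hypothetical weight matrices."""
    xh = x + h                                    # list concatenation
    z = sigmoid_vec(matvec(p["Wz"], xh))          # update gate
    r = sigmoid_vec(matvec(p["Wr"], xh))          # reset gate
    gated = x + [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in matvec(p["Wh"], gated)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def predict_action_type(input_prev, hidden_prev, h_G, p):
    u_t = gru_cell(input_prev, hidden_prev, p)    # output feature u_t
    action_feat = h_G + u_t                       # splice h_G with u_t
    probs = softmax(matvec(p["W_action"], action_feat))
    return u_t, probs

rng = random.Random(0)
def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

d_in, d_hid, n_action_types = 2, 3, 3             # edit / finish edit / add motif
params = {"Wz": rand_matrix(d_hid, d_in + d_hid),
          "Wr": rand_matrix(d_hid, d_in + d_hid),
          "Wh": rand_matrix(d_hid, d_in + d_hid),
          "W_action": rand_matrix(n_action_types, 2 * d_hid)}
h_G = [0.2, -0.1, 0.4]
# Initial values: zero input vector, hidden state taken from the product feature.
u_1, action_probs = predict_action_type([0.0] * d_in, h_G, h_G, params)
```

The returned `u_1` would be passed on as the hidden state for the next step.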
- the conversion path may be determined as follows:
- according to the beam search method with hyperparameter k, obtain the k first prediction results with the highest scores from the prediction results of the (t-1)-th action; based on the k first prediction results, determine the k first input features corresponding to the t-th action; further, based on each of the k first input features and the hidden feature corresponding to the t-th action, predict the t-th action, and obtain the k second prediction results with the highest scores from the resulting prediction results according to the beam search method with hyperparameter k; based on the k second prediction results, determine the k second input features corresponding to the (t+1)-th action; based on each of the k second input features and the hidden feature corresponding to the (t+1)-th action, predict the (t+1)-th action, and so on until the action predicted by the reverse reaction prediction model is a synthon completion action and all attached atoms on the multiple synthons and all attached atoms on the basic graphs added for the multiple synthons have been traversed, at which point the conversion path is obtained.
- for example, if the prediction device predicts the t-th action based on each of the k first input features and the hidden feature corresponding to the t-th action and obtains 2k prediction results,
- the k second prediction results with the highest scores can be selected from the 2k prediction results and used to determine the k second input features corresponding to the (t+1)-th action.
- when obtaining the k first prediction results with the highest scores from the prediction results of the (t-1)-th action, the prediction results of the (t-1)-th action can first be sorted by score, and the k highest-scoring ones are then used as the k first prediction results. For example, when sorting the prediction results of the (t-1)-th action by score, the scores of all prediction results already predicted on the path on which each prediction result lies can first be accumulated to obtain the cumulative score corresponding to each prediction result, and the k prediction results of the (t-1)-th action with the highest cumulative scores are then used as the k first prediction results.
- beam search with hyperparameter k selects multiple alternatives for the input sequence at each time step based on conditional probabilities.
- the number of alternatives depends on a hyperparameter k called the beam width.
- at each time step, beam search selects the k alternatives with the highest probability as the most likely choices at the current moment. That is, at each moment, the k best results with the highest scores according to the log-likelihood scoring function are selected as the input of the next moment.
- this process can be described as the construction of a search tree, in which the leaf nodes with the highest scores are expanded with their children, while the other leaf nodes are deleted.
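The beam search procedure can be sketched as follows, using the cumulative log-likelihood along each path as the score. The function `beam_search`, the `step_fn` interface, and the toy probability table are assumptions for illustration; in the model, the candidate actions and their probabilities would come from the action predictions.

```python
import math

def beam_search(step_fn, start, k, max_steps):
    """Keep the k highest-scoring partial sequences at every step.

    step_fn(seq) returns a list of (action, probability) continuations;
    an empty list means the sequence is finished.
    """
    beams = [(0.0, [start])]                       # (cumulative log-likelihood, sequence)
    for _ in range(max_steps):
        candidates = []
        for score, seq in beams:
            options = step_fn(seq)
            if not options:                        # finished sequences are carried over
                candidates.append((score, seq))
                continue
            for action, prob in options:
                candidates.append((score + math.log(prob), seq + [action]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:k]                     # prune all but the k best leaves
        if all(not step_fn(seq) for _, seq in beams):
            break
    return beams

# Toy transition table keyed by the last action of the sequence.
table = {
    "start": [("A", 0.6), ("B", 0.4)],
    "A": [("end", 0.5), ("C", 0.5)],
    "B": [("end", 0.9), ("C", 0.1)],
    "C": [("end", 1.0)],
    "end": [],
}
results = beam_search(lambda seq: table[seq[-1]], "start", k=2, max_steps=5)
best_score, best_seq = results[0]
```

In this toy table a greedy search would pick A first (probability 0.6) and end with total probability 0.3, whereas a beam of width 2 keeps B alive and finds the path through B with total probability 0.36.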
- if the (t-1)-th action is an editing action, the input features of the t-th action are determined based on the features of the subgraph edited using the (t-1)-th action, and based on the input features of the t-th action and the hidden features of the t-th action, the reverse reaction prediction model is used to predict the editing object and the edited state indicated by the t-th action, until the action predicted using the reverse reaction prediction model is the last editing action in the editing sequence, at which point the editing sequence is obtained.
- the last editing action is an editing completion action.
- if the (t-1)-th action is an editing action, the input features of the t-th action are determined based on the features of the subgraph edited using the (t-1)-th action, and based on the input features of the t-th action and the hidden features of the t-th action, the reverse reaction prediction model is used to predict the editing object and the edited state indicated by the t-th action, until the action predicted by the reverse reaction prediction model is the editing completion action, at which point the editing sequence is obtained.
- t is an integer greater than 1 or greater than 2.
- the t-1th action is the first editing action
- the t-1th action is the starting action.
- the features of the subgraph obtained by editing with the (t-1)-th action are the features of the product molecule.
- when the reverse reaction prediction model predicts the t-th action, the intermediate molecular fragment processed based on the (t-1)-th action can be decoded to obtain the features of the intermediate molecular fragment edited based on the (t-1)-th action.
- when predicting the editing object corresponding to the t-th action, the reverse reaction prediction model can first assign a score to each chemical bond and atom. This score represents the probability that the chemical bond or atom is considered the editing object in the t-1th action; the editing object corresponding to the t-th action is then predicted based on the scores assigned to each chemical bond and atom.
- the reverse reaction prediction model can assign a score to each chemical bond and atom through the following formula:
- sigmoid(·) is the logistic regression function
- MLP_target(·) represents the feature output by the multi-layer perceptron for determining the score of the i-th chemical bond or atom
- ⁇ t represents the feature used to identify the t-th action
- ⁇ e (·) represents the embedding function of the features of the atom or chemical bond
- e_i represents the i-th chemical bond or atom.
- the reverse reaction prediction model predicts the edited state for the editing object corresponding to the t-th action.
- the reverse reaction prediction model can predict the post-edited state of the editing object corresponding to the t-th action through the following formula:
- softmax(·) is the classification function
- MLP_type(·) represents the features output by the multi-layer perceptron for determining the type of chemical bond
- ⁇ t represents the feature used to identify the t-th action
- ⁇ e (·) represents the embedding function of the features of atoms or chemical bonds
- argmax(·) represents finding the atom or chemical bond with the largest score
- the reverse reaction prediction model applies the predicted editing object and edited state corresponding to the t-th action to the pre-editing intermediate molecular fragment corresponding to the t-th action to obtain the edited intermediate molecular fragment corresponding to the t-th action, and then uses MPNN(·) to calculate the features of the edited intermediate molecular fragment corresponding to the t-th action.
- the editing object corresponding to the t-th action and the edited state corresponding to the t-th action are used to obtain the input feature input_t corresponding to the (t+1)-th action.
- the input feature input_t corresponding to the (t+1)-th action can be obtained according to the following formula:
- ⁇ e (·) represents the embedding function of the features of atoms or chemical bonds
- argmax(·) represents finding the atom or chemical bond with the maximum score
- ⁇ b (·) represents an embedding function.
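The two-stage prediction of an editing action (score every atom and bond with a sigmoid, take the highest-scoring one, then classify its edited state with a softmax) can be sketched as below. A single linear layer stands in for each multi-layer perceptron, and the names `score_edit_objects`, `predict_edit`, `w_target` and `W_type` are hypothetical; the exact formulas of this application are not reproduced here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(v):
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def score_edit_objects(action_feat, obj_embeddings, w_target):
    """Score each atom/bond from the spliced [action feature, object embedding]."""
    return {label: sigmoid(dot(w_target, action_feat + emb))
            for label, emb in obj_embeddings.items()}

def predict_edit(action_feat, obj_embeddings, w_target, W_type, states):
    scores = score_edit_objects(action_feat, obj_embeddings, w_target)
    target = max(scores, key=scores.get)           # argmax over atoms and bonds
    spliced = action_feat + obj_embeddings[target]
    probs = softmax([dot(row, spliced) for row in W_type])
    state = states[max(range(len(states)), key=probs.__getitem__)]
    return target, state, scores

# Toy setup: one bond "b" and one atom "a1", each with a 1-d embedding.
action_feat = [1.0]
embeddings = {"b": [2.0], "a1": [-1.0]}
w_target = [0.0, 1.0]
states = ["none", "single", "double"]              # "none" = delete the bond
W_type = [[0.0, 1.0], [0.0, 0.0], [0.0, -1.0]]
target, state, scores = predict_edit(action_feat, embeddings, w_target, W_type, states)
```

With these toy weights the bond labeled b receives the highest score and its predicted edited state is none, which corresponds to the editing pair (b, none).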
- if the (t-1)-th action is the last editing action in the editing sequence or a synthon completion action, then the input features of the t-th action are determined based on the features of the subgraph edited using the (t-1)-th action and the features of the attached atom corresponding to the (t-1)-th action, and the basic graph and interface atom indicated by the t-th action are predicted based on the input features of the t-th action and the hidden features of the t-th action, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attached atoms on the multiple synthons and on the basic graphs added for the multiple synthons have been traversed, at which point the synthon completion sequence is obtained.
- the last editing action is an editing completion action.
- if the (t-1)-th action is the editing completion action or a synthon completion action, then the input features of the t-th action are determined based on the features of the subgraph edited using the (t-1)-th action and the features of the attached atom corresponding to the (t-1)-th action, and the basic graph and interface atom indicated by the t-th action are predicted based on the input features of the t-th action and the hidden features of the t-th action, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attached atoms on the multiple synthons and on the basic graphs added for the multiple synthons have been traversed, at which point the synthon completion sequence is obtained.
- if the action predicted by the reverse reaction prediction model is the last editing action, the synthon prediction stage ends and the reactant prediction process enters the synthon completion stage. At this point, all attached atoms are sorted according to their label order in the product molecule for synthon completion.
- ⁇ atom (·) represents the embedding function of the features of the attached atom
- a_t represents the attached atom corresponding to the t-th action.
- the reverse reaction prediction model sequentially traverses all attached atoms on the synthons and all attached atoms on the added basic graphs, and assigns a basic graph (motif) to each attached atom.
- the prediction of basic graphs can be viewed as a multi-classification task over a pre-stored dictionary Z. After the basic graph is obtained through basic graph prediction, the interface atom on that basic graph corresponding to the attached atom a_t can be further determined.
- the reverse reaction prediction model can predict the basic graph corresponding to the t-th action according to the following formula
- MLP_motif(·) represents the features output by the multi-layer perceptron for predicting the basic graph corresponding to the t-th action
- ⁇ t represents the feature used to identify the t-th action.
- the reverse reaction prediction model can predict, according to the following formula, the interface atom on the basic graph corresponding to the attached atom a_t
- ⁇ t represents the feature used to identify the t-th action
- ⁇ z (·) represents the embedding function of the features of the basic graph, applied to the basic graph corresponding to the t-th action.
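Basic graph prediction as a multi-classification task over the dictionary Z, followed by interface atom selection, can be sketched as follows. The dictionary layout, the weight names `W_motif` and `W_iface`, and the use of one linear layer per prediction are assumptions for illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(v):
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def predict_motif(action_feat, W_motif, motif_labels):
    """Multi-class classification over the pre-stored dictionary of basic graphs."""
    probs = softmax([dot(row, action_feat) for row in W_motif])
    best = max(range(len(motif_labels)), key=probs.__getitem__)
    return motif_labels[best], probs

def predict_interface_atom(action_feat, motif_entry, W_iface):
    """Choose one interface atom of the predicted basic graph."""
    spliced = action_feat + motif_entry["embedding"]
    candidates = motif_entry["interface_atoms"]
    probs = softmax([dot(W_iface[i], spliced) for i in range(len(candidates))])
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best]

# Hypothetical dictionary Z with two basic graphs.
Z = {
    "z1": {"embedding": [0.1], "interface_atoms": ["q1"]},
    "z2": {"embedding": [0.5], "interface_atoms": ["q2", "q3"]},
}
labels = ["z1", "z2"]
action_feat = [1.0, 0.0]
W_motif = [[0.0, 1.0], [2.0, 0.0]]
W_iface = [[1.0, 0.0, 0.0], [0.0, 0.0, 4.0]]

motif, motif_probs = predict_motif(action_feat, W_motif, labels)
interface = predict_interface_atom(action_feat, Z[motif], W_iface)
```

The pair `(motif, interface)` then plays the role of the (z, q) part of a synthon completion triplet.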
- if the t-1th action is the last editing action or a synthon completion action, and the basic graph corresponding to the t-1th action includes only one interface atom, then, based on
- the features of the subgraph obtained by editing with the t-1th action and the features of the attached atom corresponding to the t-1th action, the input features of the t-th action are determined.
- ⁇ atom (·) represents the embedding function of the features of the attached atom
- a t represents the attached atom corresponding to the t-th action.
- if the t-1th action is the last editing action or a synthon completion action, and the basic graph corresponding to the t-1th action includes multiple interface atoms, then, based on
- the features of the subgraph obtained by editing with the t-1th action, the features of the basic graph corresponding to the t-1th action, and the features of the attached atom corresponding to the t-1th action, the input features of the t-th action are determined.
- the input feature input t corresponding to the t+1th action can be determined according to the following formula:
- ⁇ z (·) represents the embedding function of the features of the basic graph, here applied to the basic graph corresponding to the t-th action
- ⁇ atom (·) represents the embedding function of the features of the attached atom
- a t represents the attached atom corresponding to the t-th action.
- the conversion path can be obtained. From this, the conversion path can be applied to the product molecular graph to obtain the reactant molecular graph.
- Figure 3 is an example of a conversion path provided by an embodiment of the present application.
- the action predicted by the prediction model is the synthon completion action
- a1 to a3 represent the attachment atoms
- q1 to q4 represent the interface atoms
- z1 to z3 represent the basic graphs
- b indicates that the editing object corresponding to the editing action is the chemical bond labeled b
- none indicates that the edited state corresponding to this editing action is to delete the chemical bond labeled b.
- Attached atoms are an interface for adding basic patterns (motifs).
- the label of the first predicted editing action is the triplet (⁇1, b, none); the triplet (⁇1, b, none) is then used as the input
- the predicted label of the second action is ⁇2; then, all attached atoms are sorted according to the order of their labels in the product molecule, and the pairs (⁇3, a1), (⁇3, a2), (⁇3, a3) are input in sequence; the labels of the actions predicted in sequence are the triplets (⁇3, z1, q1), (⁇3, z2, q2), and (⁇3, z3, q4); based on this,
- the conversion path can be defined as the path shown in (b) of Figure 3: triplet (⁇1, b, none), ⁇2, triplet (⁇3, z1, q1), triplet (⁇3, z2, q2), triplet (⁇3, z3, q4); thus, by applying the conversion path to the product molecule, the reactant molecules shown on the right side of Figure 3 can be obtained.
- the editing sequence includes only one editing action, which is defined as the triplet (⁇1, b, none).
- the synthon completion sequence includes three synthon completion actions, which are respectively defined as the triplets (⁇3, z1, q1), (⁇3, z2, q2), and (⁇3, z3, q4).
- the interface atom indicated by a synthon completion action and the attachment atom targeted by that synthon completion action can be used as connection nodes. It is worth noting that the interface atom indicated by the synthon completion action and the attached atom targeted by the synthon completion action are the same atom in the reactant molecule. For example, when adding z1 based on the triplet (⁇3, z1, q1), a1 and q1 are the same atom (i.e., N atom) in the reactant molecule; similarly, when adding z2 based on the triplet (⁇3, z2, q2), a2 and q2 are the same atom (i.e.
- Figure 3 is only an example of the present application and should not be understood as a limitation of the present application.
- the conversion path may also include other numbers of editing actions or synthon completion actions, or even other numbers of synthons or reactant molecules, which are not specifically limited in the embodiments of the present application.
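The conversion path of the Figure 3 example can be written down as a plain list of tagged tuples, which makes the split into an editing sequence and a synthon completion sequence explicit. This is only an illustrative sketch: the tag strings `"edit"`, `"edit_done"`, and `"complete"` are hypothetical stand-ins for the three action symbols, and `split_path` is not a function defined by the application.

```python
# Hypothetical tags standing in for the three action symbols of the text.
EDIT, EDIT_DONE, COMPLETE = "edit", "edit_done", "complete"

# Conversion path from the Figure 3 walkthrough.
path = [
    (EDIT, "b", None),        # delete the chemical bond labeled b
    (EDIT_DONE,),             # editing completion action
    (COMPLETE, "z1", "q1"),   # add basic graph z1 via interface atom q1
    (COMPLETE, "z2", "q2"),
    (COMPLETE, "z3", "q4"),
]

def split_path(path):
    """Split a conversion path into its editing and synthon completion sequences."""
    edits = [a for a in path if a[0] == EDIT]
    completions = [a for a in path if a[0] == COMPLETE]
    return edits, completions

edits, completions = split_path(path)
```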
- Figure 4 is another schematic flow chart of the prediction method of reactant molecules provided by the embodiment of the present application.
- ⁇ 1 means that the action predicted by the reverse reaction prediction model is the editing action
- ⁇ 2 means that the action predicted by the reverse reaction prediction model is the editing completion action
- ⁇ 3 means that the action predicted by the reverse reaction prediction model is the synthon completion action
- a1 to a3 represent attachment atoms
- q1 to q3 represent interface atoms
- z1 to z3 represent basic graphs
- g indicates that the editing object corresponding to the editing action is the chemical bond labeled g
- none indicates that the edited state corresponding to the editing action is to delete the chemical bond labeled g.
- Attached atoms are an interface for adding basic patterns (motifs).
- the label of the first predicted editing action is the triplet (⁇1, g, none); the triplet (⁇1, g, none) is then used as the input
- the predicted label of the second action is ⁇2; then, all attached atoms are sorted according to the order of their labels in the product molecule, and the pairs (⁇3, a1), (⁇3, a2), (⁇3, a3) are input in sequence; the labels of the actions predicted in sequence are the triplets (⁇3, z1, q1), (⁇3, z2, q2), and (⁇3, z3, q4); based on this,
- the conversion path can be defined as the path shown in (b) of Figure 3: triplet (⁇1, b, none), ⁇2, triplet (⁇3, z1, q1), triplet (⁇3, z2, q2), triplet (⁇3, z3, q4), where a1 to a3, q1 to q3, and z1 to z3 are as shown in the figure
- the editing sequence only includes one editing action, which is defined as the triplet ( ⁇ 1, g, none).
- the synthon completion sequence includes three synthon completion actions, which are respectively defined as the triplets (⁇3, z1, q1), (⁇3, z2, q2), and (⁇3, z3, q4).
- the interface atom indicated by a synthon completion action and the attachment atom targeted by that synthon completion action can be used as connection nodes.
- the interface atom indicated by the synthon completion action and the attached atom targeted by the synthon completion action are the same atom in the reactant molecule.
- the reactant prediction process includes an editing stage and a basic graph adding stage.
- the editing stage describes the bond and atom changes from the product to the synthons, that is, the prediction process of the synthons, while the basic graph adding stage completes the generation of the reactants by adding appropriate basic graphs to the synthons.
- the input molecular graph is first encoded by the graph neural network (GNN) to obtain the GNN output; then, if the t-th action is an editing action, the recurrent neural network (RNN) predicts the action based on the GNN output corresponding to the t-th action and the hidden state output by the previous node; if the t-th action is an editing completion action or a synthon completion action, the RNN predicts the action based on the GNN output corresponding to the t-th action, the attached atom corresponding to the t-th action, and the hidden state output by the previous node.
- RNN gradually predicts the editing sequence until the editing completion action is predicted, ends the editing phase and starts the basic graph adding phase.
- RNN adds basic graphs in sequence until all attached atoms are traversed.
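The two-stage procedure described above (predict edits until the editing completion action, then add basic graphs until every attached atom has been visited) can be sketched as a simple loop. This is a toy illustration under stated assumptions: `predict_edit` and `predict_motif` are hypothetical callables standing in for the GNN/RNN model, the tag strings are placeholders, and the use of a FIFO queue for attached atoms is an assumption rather than something the application specifies.

```python
from collections import deque

def decode(predict_edit, predict_motif, attachments, max_steps=50):
    """Toy two-stage decoding loop.

    predict_edit(history)        -> ("edit", obj, state) or ("edit_done",)
    predict_motif(history, atom) -> (motif, interface_atom, new_attached_atoms)
    attachments                  -> attached atoms exposed on the synthons
    """
    path = []
    # Stage 1: editing stage, ends when the editing completion action appears.
    for _ in range(max_steps):
        action = predict_edit(path)
        path.append(action)
        if action[0] == "edit_done":
            break
    # Stage 2: basic graph adding stage, ends when no attached atom is left.
    queue = deque(attachments)
    while queue and len(path) < max_steps:
        atom = queue.popleft()
        motif, interface, new_atoms = predict_motif(path, atom)
        path.append(("complete", motif, interface))
        queue.extend(new_atoms)  # a newly added basic graph may expose new attached atoms
    return path

# Stub predictors replaying the Figure 4 example (all values are illustrative).
def predict_edit(history):
    return ("edit", "g", None) if not history else ("edit_done",)

table = {"a1": ("z1", "q1", []), "a2": ("z2", "q2", ["a3"]), "a3": ("z3", "q4", [])}

def predict_motif(history, atom):
    return table[atom]

path = decode(predict_edit, predict_motif, ["a1", "a2"])
```

In this replay, a3 only enters the queue after z2 has been added, matching the remark that an interface atom of an added basic graph can become a new attached atom.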
- the interface atoms (q1, q2, and q3) in the basic graphs and the attached atoms (a1, a2, and a3) in the synthon/intermediate represent the same atoms.
- when a basic graph is attached to the synthon/intermediate, the interface atom and the attached atom are merged into a single atom.
- for example, when adding z1 based on the triplet (⁇3, z1, q1), a1 and q1 are the same atom (i.e., S atom) in the reactant molecule; similarly, when adding z2 based on the triplet (⁇3, z2, q2),
- a2 and q2 are the same atom (i.e., O atom) in the reactant molecule; after adding z2, the interface atom q3 becomes the attached atom a3.
- the triplet ⁇ 3, z3, q4
- the conversion path may not include a start action or an edit completion action.
- FIG. 5 shows a schematic flowchart of a training method 300 for a reverse reaction prediction model according to an embodiment of the present application.
- the training method 300 can be executed by any electronic device with data processing capabilities.
- the electronic device may be implemented as a server.
- the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
- the servers can be connected directly or indirectly through wired or wireless communication, which is not limited in this application. For convenience of description, the training method provided by this application is explained below by taking a training device for the reverse reaction prediction model as an example.
- the training method 300 may include:
- the conversion path includes an editing sequence and a synthon completion sequence; each editing action in the editing sequence is used to indicate an editing object and an edited state, the editing object being an atom or a chemical bond in the product molecule;
- for the multiple synthons of the product molecule obtained by using the editing sequence, the synthon completion sequence includes at least one synthon completion action corresponding to each of the multiple synthons, and each synthon completion action in the at least one synthon completion action is used to indicate a basic graph and an interface atom, the basic graph including multiple atoms or atomic edges used to connect the multiple atoms;
- the reverse reaction prediction model can learn the potential relationship between the two subtasks of synthon prediction and synthon completion. This greatly improves the generalization performance of the model, reduces the prediction complexity of reactant molecules, and improves the generalization performance of reactant molecule prediction.
- the structure of the atomic edges makes it possible to reasonably construct a short and accurate conversion path, which avoids an excessively long synthon completion sequence, reduces the difficulty of predicting reactant molecules, and improves the prediction accuracy, thereby improving the prediction performance for reactant molecules.
- the method 300 may further include:
- a candidate reactant molecule corresponding to the product molecule is obtained; by comparing the molecular structures of the product molecule and the candidate reactant molecule, a basic graph dictionary is obtained; based on the basic graph dictionary, the training path is obtained.
- the candidate reactant molecules may be all reactant molecules of the product molecule. That is, the candidate reactant molecule can be used to generate all reactant molecules of the product molecule.
- a connection tree is constructed based on the basic graph dictionary; the connection tree is a tree structure with the multiple synthons as root nodes and the basic graphs in the basic graph dictionary as child nodes; the shortest path found by traversing the connection tree is determined as the training path.
- the connection tree represents the synthons and the basic graphs (motifs) as a hierarchical tree structure, in which the synthons serve as root nodes and the basic graphs (motifs) serve as child nodes.
- an edge between two nodes in the connection tree indicates that the two subgraphs are directly connected in the reactant molecular graph, and each edge can be represented by a triplet of attachment atom, basic graph (motif), and interface atom.
- a tree structure (i.e., the connection tree) is constructed to represent the connection relationship between the synthons and the basic graphs.
- the connection tree can be used to provide an effective strategy for building training paths and reduce training complexity.
- a depth-first search can be used to traverse the entire connection tree, and the shortest path after traversal is determined as the training path.
- the depth-first traversal method can refer to traversing starting from a certain vertex v in the tree structure in the following way:
- depth-first search explores the tree as "deeply" as possible. Its basic idea is as follows: to find the solution to the problem, first choose one possible option and explore forward (to the child nodes); during the exploration, once the original choice is found not to meet the requirements, go back to the parent node, select another node, and continue exploring forward; this process is repeated until the optimal solution is obtained. In other words, depth-first search starts from a vertex V0 and follows one path all the way to the end; if the target solution cannot be reached, it returns to the previous node and follows another path to the end.
- this idea of going as deep as possible is the concept of depth-first search.
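A depth-first traversal of a connection tree can be sketched as follows. The tree layout (a dictionary mapping each node to its outgoing edges, each edge being an attachment atom, basic graph, interface atom triplet) and all node names are hypothetical; the sketch only shows the visiting order and does not implement the selection of the shortest path.

```python
def dfs_triplets(tree, node, out=None):
    """Depth-first traversal that collects edge triplets in visiting order.

    tree maps a node name to a list of (attachment_atom, motif, interface_atom)
    edges; the motif name doubles as the child node name in this toy layout.
    """
    if out is None:
        out = []
    for attachment, motif, interface in tree.get(node, []):
        out.append((attachment, motif, interface))
        dfs_triplets(tree, motif, out)  # go as deep as possible before the next sibling
    return out

# Toy connection tree with two synthons as roots (illustrative values only).
tree = {
    "synthon_1": [("a1", "z1", "q1")],
    "synthon_2": [("a2", "z2", "q2")],
    "z2": [("a3", "z3", "q4")],
}
order = dfs_triplets(tree, "synthon_1") + dfs_triplets(tree, "synthon_2")
```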
- the molecular fragments of the candidate reactant other than the multiple synthons are determined as multiple candidate subgraphs; if a first candidate subgraph among the multiple candidate subgraphs includes a first atom and a second atom, and the first atom and the second atom belong to different rings, the chemical bond between the first atom and the second atom is broken to obtain multiple first subgraphs;
- if a second candidate subgraph among the multiple candidate subgraphs includes a connected third atom and fourth atom, one of the third atom and the fourth atom belongs to a ring, and the degree of the other of the third atom and the fourth atom is greater than or equal to a preset value, the chemical bond between the third atom and the fourth atom is broken to obtain multiple second subgraphs;
- the candidate subgraphs other than the first candidate subgraph and the second candidate subgraph, the multiple first subgraphs, and the multiple second subgraphs are determined as the basic graphs in the basic graph dictionary.
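The two bond-breaking rules above can be sketched on a plain graph representation. This is a simplified illustration, not the application's implementation: atoms are integer ids, ring membership is supplied by the caller, the preset degree threshold `min_degree=2` is a made-up value, and the second rule is applied only when the other atom is not itself in a ring (an assumption made here to avoid cutting bonds inside a ring).

```python
def extract_motifs(atoms, bonds, rings, min_degree=2):
    """Break bonds by the two rules, then return the connected fragments.

    atoms: list of atom ids; bonds: list of (u, v); rings: list of sets of atom ids.
    """
    ring_of = {a: {i for i, r in enumerate(rings) if a in r} for a in atoms}
    degree = {a: 0 for a in atoms}
    for u, v in bonds:
        degree[u] += 1
        degree[v] += 1

    def should_break(u, v):
        ru, rv = ring_of[u], ring_of[v]
        if ru and rv and not (ru & rv):   # rule 1: the two atoms sit on different rings
            return True
        if ru and not rv and degree[v] >= min_degree:  # rule 2: ring atom + high-degree atom
            return True
        if rv and not ru and degree[u] >= min_degree:
            return True
        return False

    adj = {a: set() for a in atoms}
    for u, v in bonds:
        if not should_break(u, v):
            adj[u].add(v)
            adj[v].add(u)

    # Collect connected components of the remaining graph.
    seen, fragments = set(), []
    for a in atoms:
        if a in seen:
            continue
        stack, comp = [a], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        fragments.append(comp)
    return fragments

# Two three-membered rings joined by bond 2-3, plus a two-atom chain on atom 5.
atoms = list(range(8))
bonds = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3), (5, 6), (6, 7)]
fragments = extract_motifs(atoms, bonds, rings=[{0, 1, 2}, {3, 4, 5}])
```

In this toy input, bond 2-3 is cut by the first rule and bond 5-6 by the second, leaving the two rings and the chain as separate fragments.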
- the synthon can be reconstructed into the reactant molecular graph.
- the motif can be viewed as a subgraph on the reactant molecular graph. Therefore, the embodiment of this application divides the basic pattern (motif) extraction process into the following steps:
- reactants corresponding to the synthon may include molecules or reactants containing the synthon.
- the ring involved in the embodiments of the present application may be a single ring, that is, there is only one ring in the molecule.
- the rings involved in the embodiments of the present application may be cycloalkanes, and cycloalkanes may be classified based on the number of carbon atoms on the ring.
- when the number of carbon atoms in the ring is 3 to 4, it is called a small ring; when it is 5 to 6, an ordinary ring; when it is 7 to 12, a medium ring; and when it is greater than 12, a macrocycle.
- a dictionary Z with a preset number of basic patterns (motifs) can be extracted.
- a dictionary Z with a basic pattern number of 210 can be extracted.
- the method 300 may further include:
- the difference between the label of the predicted action in the conversion path and the label of the training action in the training path, the difference between the score of the editing object corresponding to the predicted action and the score of the editing object corresponding to the training action, the difference between the edited state corresponding to the predicted action and the edited state corresponding to the training action, the difference between the basic graph indicated by the predicted action and the basic graph indicated by the training action, and the difference between the interface atom indicated by the predicted action and the interface atom indicated by the training action.
- the reverse reaction prediction is modeled as a molecule generation problem based on autoregression. That is, given a product molecule G P , for each step t, the autoregressive model obtains a new graph structure G t based on the historical graph structure.
- the generation process of reactant molecular graphs can be defined as the following joint probability likelihood function:
- G R represents the reactant molecule
- N is the length of the conversion path
- G t is the intermediate molecular fragment corresponding to the t-th action
- G 0 = G P .
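The likelihood formula itself did not survive in this text. Based only on the symbol definitions above and on the autoregressive description, one plausible form is the following factorization; the exact conditioning set, and the identification of the final graph G_N with the reactant molecular graph G_R, are assumptions.

```latex
P(G_R \mid G_P) \;=\; \prod_{t=1}^{N} P\left(G_t \mid G_0, G_1, \ldots, G_{t-1}\right),
\qquad G_0 = G_P,\quad G_N = G_R
```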
- the intermediate molecular fragment G t is not directly generated by the reverse reaction prediction model; instead, based on historical actions, the reverse reaction prediction model generates a new graph editing action, its editing object (i.e., a chemical bond, an atom, or a basic graph), and the edited state (i.e., a new chemical bond type or an interface atom), which are then applied to the intermediate molecular fragment from the previous step to obtain a new intermediate molecular fragment.
- a cross-entropy loss can be used to optimize the difference between the label of the predicted action in the conversion path and the label of the training action in the training path, the difference between the edited state corresponding to the predicted action and the edited state corresponding to the training action, the difference between the basic graph indicated by the predicted action and the basic graph indicated by the training action, and the difference between the interface atom indicated by the predicted action and the interface atom indicated by the training action; a binary cross-entropy loss can be used to optimize the difference between the score of the editing object corresponding to the predicted action and the score of the editing object corresponding to the training action.
- the loss between this conversion path and the training path can be determined using the following formula:
- N 1 represents the length of the editing sequence or the length of the sequence formed by the editing sequence and the editing completion action
- N 2 represents the total length of the editing sequence and the synthon completion sequence.
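The loss formula is not reproduced in this text, so the following is only a sketch of how the described components could be combined: cross-entropy terms for action labels, edited states, basic graphs, and interface atoms, and a binary cross-entropy term for editing-object scores. The per-step dictionary layout, the unweighted sum, and the function names are assumptions, not the application's formula.

```python
import math

def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class index."""
    return -math.log(probs[target])

def binary_cross_entropy(scores, targets):
    """Binary cross-entropy summed over candidate editing objects.

    Scores must lie strictly between 0 and 1.
    """
    return -sum(t * math.log(s) + (1 - t) * math.log(1 - s)
                for s, t in zip(scores, targets))

def path_loss(edit_steps, completion_steps):
    """Unweighted sum of per-step losses over both stages (hypothetical layout)."""
    total = 0.0
    for step in edit_steps:            # steps of the editing stage
        total += cross_entropy(step["action_probs"], step["action"])
        total += binary_cross_entropy(step["object_scores"], step["object_targets"])
        total += cross_entropy(step["state_probs"], step["state"])
    for step in completion_steps:      # steps of the synthon completion stage
        total += cross_entropy(step["action_probs"], step["action"])
        total += cross_entropy(step["motif_probs"], step["motif"])
        total += cross_entropy(step["interface_probs"], step["interface"])
    return total

# Made-up probabilities for one editing step and one completion step.
edit_steps = [{"action_probs": [0.5, 0.5], "action": 0,
               "object_scores": [0.5, 0.5], "object_targets": [1, 0],
               "state_probs": [0.5, 0.5], "state": 1}]
completion_steps = [{"action_probs": [0.5, 0.5], "action": 1,
                     "motif_probs": [0.25, 0.75], "motif": 0,
                     "interface_probs": [1.0], "interface": 0}]
loss = path_loss(edit_steps, completion_steps)
```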
- S320 in the training method 300 may refer to the implementation of S220 in the prediction method 200. To avoid duplication, it will not be described again here.
- the teacher-forcing strategy can be used to train the model.
- RNN has two training modes, namely free-running mode and teacher-forcing mode.
- Free-running mode uses the output of the previous state as the input of the next state.
- the working principle of the teacher forcing mode is: at time t of the training process, use the expected output or actual output y(t) of the training data set as the input x(t+1) of the next time step, instead of using the output generated by the model h(t).
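The difference between the two training modes can be shown with a toy decoder loop. The `step` function below is a deliberately imperfect stand-in for a model and all values are illustrative; only the choice of the next input differs between the modes.

```python
def run_decoder(step, targets, start, teacher_forcing):
    """Run a step function over a sequence and return its predictions.

    With teacher forcing, the ground-truth y(t) is fed as the next input;
    in free-running mode, the model's own previous prediction is fed instead.
    """
    outputs, x = [], start
    for y in targets:
        pred = step(x)
        outputs.append(pred)
        x = y if teacher_forcing else pred
    return outputs

# Imperfect toy "model": it should add 1 but adds 2.
step = lambda x: x + 2
targets = [1, 2, 3]

forced = run_decoder(step, targets, start=0, teacher_forcing=True)
free = run_decoder(step, targets, start=0, teacher_forcing=False)
```

With teacher forcing, each error stays local to its step, while in free-running mode the errors accumulate along the sequence.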
- the hyperparameters in the reverse reaction prediction model can be adjusted, such as the number of layers of GNN and GRU.
- the sequence numbers of the above-mentioned processes do not imply the order of execution.
- the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of this application.
- FIG. 6 is a schematic block diagram of a reactant molecule prediction device 400 provided by an embodiment of the present application.
- the predicting device 400 of reactant molecules may include:
- the extraction unit 410 is used to extract features of product molecules to obtain the characteristics of the product molecules;
- the prediction unit 420 is used to predict a conversion path from the product molecule to multiple reactant molecules based on the characteristics of the product molecule and using a reverse reaction prediction model; the conversion path includes an editing sequence and a synthon completion sequence; the editing sequence is a sequence formed by editing actions, and the synthon completion sequence is a sequence formed by synthon completion actions;
- the editing unit 430 is configured to edit the editing object indicated by each editing action in the editing sequence according to the edited state indicated by that editing action, to obtain multiple synthons corresponding to the product molecule, the editing object being an atom or a chemical bond in the product molecule;
- the adding unit 440 is configured to, for each synthon in the multiple synthons, based on at least one synthon completion action corresponding to that synthon in the synthon completion sequence, add the basic graph indicated by each synthon completion action according to the interface atom indicated by that synthon completion action, to obtain multiple reactant molecules corresponding to the multiple synthons.
- the basic graph includes multiple atoms or atomic edges used to connect the multiple atoms.
- the prediction unit 420 is specifically used to:
- the input features of the t-th action are obtained, where t is an integer greater than 1;
- the t-th action is predicted, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attached atoms on the multiple synthons and all attached atoms on the basic graphs added for the multiple synthons have been traversed, at which point the conversion path is obtained;
- the hidden features of the t-th action are related to the actions predicted by the reverse reaction prediction model before the t-th action.
- the prediction unit 420 is specifically used to:
- the k first prediction results with the highest scores are obtained from the prediction results of the t-1th action
- obtaining the conversion path includes:
- the t-th action is predicted, and, according to the beam search with hyperparameter k, the k second prediction results with the highest scores are obtained from the 2k prediction results;
- the t+1th action is predicted, and so on, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attached atoms on the multiple synthons and all attached atoms on the basic graphs added for the multiple synthons have been traversed, at which point the conversion path is obtained.
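The beam search with hyperparameter k mentioned above can be sketched generically. The `expand` callable, the token names, and the fixed number of steps are hypothetical; in particular, how many candidates each beam contributes per step is an assumption of this sketch rather than something the text specifies.

```python
import heapq
import math

def beam_search(expand, start, k, steps):
    """Keep the k highest-scoring partial sequences at every step.

    expand(seq) -> list of (token, probability) continuations for seq.
    Scores are accumulated log-probabilities.
    """
    beams = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for token, prob in expand(seq):
                candidates.append((score + math.log(prob), seq + [token]))
        beams = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return beams

# Toy expansion: every sequence can continue with "a" (0.6) or "b" (0.4).
def expand(seq):
    return [("a", 0.6), ("b", 0.4)]

beams = beam_search(expand, start="s", k=2, steps=2)
```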
- the prediction unit 420 is specifically used to:
- if the t-1th action is an editing action,
- the input features of the t-th action are determined, and, based on the input features of the t-th action and the hidden features of the t-th action, the reverse reaction prediction model is used to predict the editing object and the edited state indicated by the t-th action, until the action predicted by the reverse reaction prediction model is the last editing action in the editing sequence,
- at which point the editing sequence is obtained;
- if the t-1th action is the last editing action or a synthon completion action, then the input features of the t-th action are determined based on the features of the subgraph obtained by editing with the t-1th action and the features of the attached atom corresponding to the t-1th action, and the basic graph and interface atom indicated by the t-th action are predicted based on the input features of the t-th action and the hidden features of the t-th action, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attached atoms on the multiple synthons and on the basic graphs added for the multiple synthons have been traversed, at which point the synthon completion sequence is obtained.
- the editing action is represented by the following tags: an action tag used to indicate editing, a tag used to indicate the editing object, and a tag used to indicate the edited state;
- the synthon completion action is represented by the following tags: an action tag used to indicate synthon completion, a tag used to indicate the basic graph, and a tag used to indicate the interface atom.
- if the editing object indicated by the editing action is an atom, then the edited state indicated by the editing action is to change the number of charges on the atom or to change the number of hydrogen atoms on the atom; if the editing object indicated by the editing action is a chemical bond, then the edited state indicated by the editing action is any of the following: adding a chemical bond, deleting a chemical bond, or changing the type of a chemical bond.
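The atom and bond edits listed above can be illustrated on a minimal dictionary-based molecule. The data layout, the field names `charge` and `num_hs`, and the function `apply_edit` are hypothetical and chosen only for this sketch; a real implementation would operate on a proper molecular graph.

```python
def apply_edit(mol, obj, new_state):
    """Apply one editing action to a toy molecule, modifying it in place.

    mol: {"atoms": {id: {"charge": int, "num_hs": int}},
          "bonds": {frozenset({u, v}): bond_type}}
    obj: ("bond", (u, v)) or ("atom", atom_id)
    new_state: None to delete a bond, a bond type to add or change a bond,
               or a dict of atom fields to update.
    """
    kind, key = obj
    if kind == "bond":
        bond = frozenset(key)
        if new_state is None:
            mol["bonds"].pop(bond, None)      # delete the chemical bond
        else:
            mol["bonds"][bond] = new_state    # add a bond or change its type
    elif kind == "atom":
        mol["atoms"][key].update(new_state)   # change charge or hydrogen count
    else:
        raise ValueError(f"unknown editing object kind: {kind}")
    return mol

mol = {
    "atoms": {1: {"charge": 0, "num_hs": 1}, 2: {"charge": 0, "num_hs": 0}},
    "bonds": {frozenset((1, 2)): "single"},
}
apply_edit(mol, ("bond", (1, 2)), None)        # delete the bond between atoms 1 and 2
apply_edit(mol, ("atom", 1), {"charge": 1})    # change the charge on atom 1
```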
- Figure 7 is a schematic block diagram of the training device 500 of the reverse reaction prediction model provided by the embodiment of the present application.
- the training device 500 of the reverse reaction prediction model may include:
- the extraction unit 510 is used to extract features of product molecules to obtain the characteristics of the product molecules
- the prediction unit 520 is used to predict the conversion path between the product molecule and multiple reactant molecules based on the characteristics of the product molecule and using a reverse reaction prediction model;
- the conversion path includes an editing sequence and a synthon completion sequence; each editing action in the editing sequence is used to indicate an editing object and an edited state, the editing object being an atom or a chemical bond in the product molecule;
- for the multiple synthons of the product molecule obtained by using the editing sequence, the synthon completion sequence includes at least one synthon completion action corresponding to each of the multiple synthons, and each synthon completion action in the at least one synthon completion action is used to indicate a basic graph and an interface atom, the basic graph including multiple atoms or atomic edges used to connect the multiple atoms;
- the training unit 530 is used to train the reverse reaction prediction model based on the loss between the conversion path and the training path.
- before obtaining the conversion path, the prediction unit 520 is further used to:
- the training path is obtained.
- the prediction unit 520 is specifically used to:
- construct a connection tree based on the basic graph dictionary;
- the connection tree includes a tree structure with the plurality of synthons as root nodes and the basic graphs in the basic graph dictionary as child nodes;
- the shortest path is determined as the training path.
- the prediction unit 520 is specifically used to:
- if a first candidate subgraph among the multiple candidate subgraphs includes a first atom and a second atom, and the first atom and the second atom belong to different rings, the chemical bond between the first atom and the second atom is broken to obtain multiple first subgraphs;
- if a second candidate subgraph among the multiple candidate subgraphs includes a connected third atom and fourth atom, one of the third atom and the fourth atom belongs to a ring, and the degree of the other of the third atom and the fourth atom is greater than or equal to the preset value, the chemical bond between the third atom and the fourth atom is broken to obtain multiple second subgraphs;
- the training unit 530 is used to train the inverse reaction prediction model based on the loss between the conversion path and the training path, and is also used to:
- the difference between the label of the predicted action in the conversion path and the label of the training action in the training path, the difference between the score of the editing object corresponding to the predicted action and the score of the editing object corresponding to the training action, the difference between the edited state corresponding to the predicted action and the edited state corresponding to the training action, the difference between the basic graph indicated by the predicted action and the basic graph indicated by the training action, and the difference between the interface atom indicated by the predicted action and the interface atom indicated by the training action.
- the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
- the reactant molecule prediction device 400 may correspond to the subject that executes the method 200 of the embodiment of the present application, and each unit in the prediction device 400 is intended to implement the corresponding process in the method 200; similarly, the training device 500 of the reverse reaction prediction model may correspond to the subject that executes the method 300 of the embodiment of the present application, and each unit in the training device 500 is intended to implement the corresponding process in the method 300; for the sake of brevity, no further details are given here.
- the units in the prediction device 400 or the training device 500 involved in the embodiment of the present application can be separately or entirely combined into one or several other units, or one or more of the units can be further split into multiple functionally smaller units, which can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present application.
- the above units are divided based on logical functions. In practical applications, the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit. In other embodiments of the present application, the prediction device 400 or the training device 500 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
- the prediction device 400 or the training device 500 involved in the embodiment of the present application can be constructed, and the method of the embodiment of the present application implemented, by running a computer program capable of executing each step of the corresponding method on a general-purpose computing device, such as a general-purpose computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM).
- the computer program can be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
- the units mentioned above can be implemented in the form of hardware, can also be implemented in the form of instructions in the form of software, or can be implemented in the form of a combination of software and hardware.
- each step of the method embodiments in the embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or instructions in the form of software.
- the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software in a decoding processor.
- the software can be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
- the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
- FIG. 8 is a schematic structural diagram of an electronic device 600 provided by an embodiment of the present application.
- the electronic device 600 includes at least a processor 610 and a computer-readable storage medium 620.
- the processor 610 and the computer-readable storage medium 620 may be connected through a bus or other means.
- the computer-readable storage medium 620 is used to store a computer program 621.
- the computer program 621 includes computer instructions.
- the processor 610 is used to execute the computer instructions stored in the computer-readable storage medium 620.
- the processor 610 is the computing core and the control core of the electronic device 600. It is suitable for implementing one or more computer instructions. Specifically, it is suitable for loading and executing one or more computer instructions to implement the corresponding method flow or corresponding functions.
- the processor 610 may also be called a central processing unit (Central Processing Unit, CPU).
- the processor 610 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- the computer-readable storage medium 620 can be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it can also be at least one computer-readable storage medium located remotely from the aforementioned processor 610.
- the computer-readable storage medium 620 includes, but is not limited to: volatile memory and/or non-volatile memory.
- non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), or electrically erasable programmable read-only memory.
- Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
- many forms of RAM are available, for example:
- SRAM: static random access memory
- DRAM: dynamic random access memory
- SDRAM: synchronous dynamic random access memory
- DDR SDRAM: double data rate synchronous dynamic random access memory
- ESDRAM (Enhanced SDRAM): enhanced synchronous dynamic random access memory
- SLDRAM: synchronous link dynamic random access memory
- Direct Rambus RAM: direct Rambus random access memory
- the electronic device 600 may also include a transceiver 630.
- the processor 610 can control the transceiver 630 to communicate with other devices. Specifically, it can send information or data to other devices, or receive information or data sent by other devices.
- Transceiver 630 may include a transmitter and a receiver.
- the transceiver 630 may further include an antenna, and the number of antennas may be one or more.
- the components of the electronic device 600 are connected through a bus system which, in addition to the data bus, also includes a power bus, a control bus and a status signal bus.
- the electronic device 600 can be any electronic device with data processing capabilities; first computer instructions are stored in the computer-readable storage medium 620; the processor 610 loads and executes the first computer instructions stored in the computer-readable storage medium 620 to implement the corresponding steps in the method embodiment shown in Figure 1; in a specific implementation, the first computer instructions in the computer-readable storage medium 620 are loaded by the processor 610 to execute the corresponding steps, which, to avoid repetition, are not described again here.
- embodiments of the present application also provide a computer-readable storage medium (Memory).
- the computer-readable storage medium is a memory device in the electronic device 600 and is used to store programs and data.
- the computer-readable storage medium 620 may include a built-in storage medium in the electronic device 600, and of course may also include an extended storage medium supported by the electronic device 600.
- the computer-readable storage medium provides storage space that stores the operating system of the electronic device 600.
- one or more computer instructions suitable for being loaded and executed by the processor 610 are also stored in the storage space. These computer instructions may be one or more computer programs 621 (including program codes).
- embodiments of the present application further provide a computer program product or computer program.
- the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the computer program may be, for example, the computer program 621; in this case, the data processing device 600 may be a computer.
- the processor 610 reads the computer instructions from the computer-readable storage medium 620 and executes them, so that the computer performs the methods provided in the various optional implementations above.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; e.g., the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, radio, microwave, etc.) means.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Pharmacology & Pharmacy (AREA)
- Medicinal Chemistry (AREA)
- Databases & Information Systems (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Analytical Chemistry (AREA)
- Organic Low-Molecular-Weight Compounds And Preparation Thereof (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
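$$u_t = \mathrm{GRU}(\mathrm{input}_{t-1}, \mathrm{hidden}_{t-1}), \quad \text{where } \mathrm{input}_0 = 0,\ \mathrm{hidden}_0 = \sigma_G(h_G);$$

$$\psi_t = h_G \,\|\, u_t.$$

$$\mathrm{input}_t = h_{syn} + \sigma_{atom}(a_t), \quad \text{where } a_t \in \{a\}.$$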
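The recurrence above can be illustrated with a minimal pure-Python sketch. It is not the implementation from the application: the GRU cell is a textbook variant without biases, `embed` is a hypothetical stand-in for the unspecified functions σ_G and σ_atom (here a linear map followed by tanh), and all weights, dimensions, and feature vectors are random placeholders.

```python
import math
import random


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Textbook GRU cell; biases are omitted to keep the sketch short."""

    def __init__(self, input_dim, hidden_dim, rng):
        # One (input weight, recurrent weight) pair per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.p = {g: (rand_matrix(hidden_dim, input_dim, rng),
                      rand_matrix(hidden_dim, hidden_dim, rng)) for g in "zrn"}

    def __call__(self, x, h):
        (Wz, Uz), (Wr, Ur), (Wn, Un) = self.p["z"], self.p["r"], self.p["n"]
        z = sigmoid(vadd(matvec(Wz, x), matvec(Uz, h)))
        r = sigmoid(vadd(matvec(Wr, x), matvec(Ur, h)))
        gated = [ri * hi for ri, hi in zip(r, h)]
        n = tanh(vadd(matvec(Wn, x), matvec(Un, gated)))
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def embed(W, v):
    """Hypothetical stand-in for sigma_G and sigma_atom: linear map plus tanh."""
    return tanh(matvec(W, v))


rng = random.Random(0)
D = 4  # assumed feature dimension
gru = GRUCell(D, D, rng)
W_G, W_atom = rand_matrix(D, D, rng), rand_matrix(D, D, rng)

h_G = [rng.uniform(-1, 1) for _ in range(D)]    # product (graph-level) feature
h_syn = [rng.uniform(-1, 1) for _ in range(D)]  # synthon feature
a_1 = [rng.uniform(-1, 1) for _ in range(D)]    # feature of one attachment atom

input_0 = [0.0] * D           # input_0 = 0
hidden_0 = embed(W_G, h_G)    # hidden_0 = sigma_G(h_G)

u_1 = gru(input_0, hidden_0)  # u_t = GRU(input_{t-1}, hidden_{t-1})
psi_1 = h_G + u_1             # psi_t = h_G || u_t (list concatenation)
input_1 = vadd(h_syn, embed(W_atom, a_1))  # input_t = h_syn + sigma_atom(a_t)

print(len(psi_1), len(input_1))  # prints "8 4"
```

In this sketch ψ_t has twice the feature dimension because it concatenates the product feature with the GRU output, whereas input_t keeps the original dimension because it is an element-wise sum.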
Claims (17)
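- A method for predicting reactant molecules, comprising: performing feature extraction on a product molecule to obtain features of the product molecule; predicting, based on the features of the product molecule and using a reverse reaction prediction model, a conversion path from the product molecule to a plurality of reactant molecules, the conversion path comprising an editing sequence and a synthon completion sequence; editing the editing object indicated by each editing action in the editing sequence according to the edited state indicated by that editing action, to obtain a plurality of synthons corresponding to the product molecule, the editing object being an atom or a chemical bond in the product molecule; and, for each synthon of the plurality of synthons, based on at least one synthon completion action in the synthon completion sequence that corresponds to that synthon, adding the basic graph indicated by each of the at least one synthon completion action at the interface atom indicated by that synthon completion action, to obtain a plurality of reactant molecules corresponding to the plurality of synthons, the basic graph comprising a plurality of atoms or atomic edges for connecting the plurality of atoms.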
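- The method according to claim 1, wherein predicting, based on the features of the product molecule and using the reverse reaction prediction model, the conversion path from the product molecule to the plurality of reactant molecules comprises: obtaining an input feature of a t-th action based on a (t-1)-th action predicted by the reverse reaction prediction model, t being an integer greater than 1; and predicting the t-th action based on the input feature corresponding to the t-th action and a hidden feature corresponding to the t-th action, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attachment atoms on the plurality of synthons and all attachment atoms on the basic graphs added for the plurality of synthons have been traversed, to obtain the conversion path; wherein the hidden feature of the t-th action is related to the actions predicted by the reverse reaction prediction model before the t-th action.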
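- The method according to claim 2, wherein obtaining the input feature of the t-th action based on the (t-1)-th action predicted by the reverse reaction prediction model comprises: obtaining, by beam search with hyperparameter k, the k highest-scoring first prediction results from the prediction results of the (t-1)-th action; and determining, based on the k first prediction results, k first input features corresponding to the t-th action; and wherein obtaining the conversion path comprises: predicting the t-th action based on each of the k first input features and the hidden feature corresponding to the t-th action, and obtaining, by beam search with hyperparameter k, the k highest-scoring second prediction results from the resulting prediction results; determining, based on the k second prediction results, k second input features corresponding to a (t+1)-th action; and predicting the (t+1)-th action based on each of the k second input features and a hidden feature corresponding to the (t+1)-th action, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attachment atoms on the plurality of synthons and all attachment atoms on the basic graphs added for the plurality of synthons have been traversed, to obtain the conversion path.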
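- The method according to claim 2, wherein obtaining the conversion path comprises: if the (t-1)-th action is an editing action, determining the input feature of the t-th action based on features of the subgraph obtained by editing with the (t-1)-th action; and predicting, based on the input feature of the t-th action and the hidden feature of the t-th action and using the reverse reaction prediction model, the editing object and the edited state indicated by the t-th action, until the action predicted by the reverse reaction prediction model is the last editing action in the editing sequence, to obtain the editing sequence.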
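- The method according to claim 2, wherein obtaining the conversion path comprises: if the (t-1)-th action is the last editing action or a synthon completion action, determining the input feature of the t-th action based on features of the subgraph obtained by editing with the (t-1)-th action and features of the attachment atom corresponding to the (t-1)-th action; and predicting, based on the input feature of the t-th action and the hidden feature of the t-th action, the basic graph and the interface atom indicated by the t-th action, until the action predicted by the reverse reaction prediction model is a synthon completion action and all attachment atoms on the plurality of synthons and on the basic graphs added for the plurality of synthons have been traversed, to obtain the synthon completion sequence.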
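- The method according to any one of claims 1 to 5, wherein the editing action is characterized by the following labels: an action label indicating that editing is to be performed, a label indicating the editing object, and a label indicating the edited state; and the synthon completion action is characterized by the following labels: an action label indicating that synthon completion is to be performed, a label indicating the basic graph, and a label indicating the interface atom.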
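- The method according to any one of claims 1 to 6, wherein if the editing object indicated by the editing action is an atom, the edited state indicated by the editing action is changing the number of charges on the atom or changing the number of hydrogen atoms on the atom; and if the editing object indicated by the editing action is a chemical bond, the edited state indicated by the editing action is any one of the following: adding a chemical bond, deleting a chemical bond, or changing the type of a chemical bond.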
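- A method for training a reverse reaction prediction model, comprising: performing feature extraction on a product molecule to obtain features of the product molecule; predicting, based on the features of the product molecule and using the reverse reaction prediction model, a conversion path from the product molecule to a plurality of reactant molecules; wherein the conversion path comprises an editing sequence and a synthon completion sequence; each editing action in the editing sequence indicates an editing object and an edited state, the editing object being an atom or a chemical bond in the product molecule; for a plurality of synthons of the product molecule obtained using the editing sequence, the synthon completion sequence comprises at least one synthon completion action corresponding to each synthon of the plurality of synthons, each of the at least one synthon completion action indicating a basic graph and an interface atom, the basic graph comprising a plurality of atoms or atomic edges for connecting the plurality of atoms; and training the reverse reaction prediction model based on a loss between the conversion path and a training path.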
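- The method according to claim 8, wherein before the conversion path is obtained, the method further comprises: obtaining candidate reactant molecules corresponding to the product molecule; obtaining a basic graph dictionary by comparing the molecular structures of the product molecule and the candidate reactant molecules; and obtaining the training path based on the basic graph dictionary.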
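- The method according to claim 9, wherein obtaining the training path based on the basic graph dictionary comprises: constructing a connection tree based on the basic graph dictionary, the connection tree comprising a tree structure with the plurality of synthons as root nodes and the basic graphs in the basic graph dictionary as child nodes; and determining the shortest path as the training path by traversing the connection tree.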
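- The method according to claim 9 or 10, wherein obtaining the basic graph dictionary by comparing the molecular structures of the product molecule and the candidate reactant molecules comprises: determining the molecular fragments of the candidate reactants other than the plurality of synthons as a plurality of candidate subgraphs; if a first candidate subgraph among the plurality of candidate subgraphs comprises a first atom and a second atom, and the first atom and the second atom belong to different rings, breaking the chemical bond between the first atom and the second atom to obtain a plurality of first subgraphs; if a second candidate subgraph among the plurality of candidate subgraphs comprises a third atom and a fourth atom that are connected, one of the third atom and the fourth atom belongs to a ring, and the degree of the other of the third atom and the fourth atom is greater than or equal to a preset value, breaking the chemical bond between the third atom and the fourth atom to obtain a plurality of second subgraphs; and determining the candidate subgraphs among the plurality of candidate subgraphs other than the first candidate subgraph and the second candidate subgraph, the plurality of first subgraphs, and the plurality of second subgraphs as the basic graphs in the basic graph dictionary.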
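- The method according to any one of claims 8 to 11, wherein before the reverse reaction prediction model is trained based on the loss between the conversion path and the training path, the method further comprises: determining the loss between the conversion path and the training path based on the following information: the difference between the label of a predicted action in the conversion path and the label of a training action in the training path, the difference between the score of the editing object corresponding to the predicted action and the score of the editing object corresponding to the training action, the difference between the edited state corresponding to the predicted action and the edited state corresponding to the training action, the difference between the basic graph indicated by the predicted action and the basic graph indicated by the training action, and the difference between the interface atom indicated by the predicted action and the interface atom indicated by the training action.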
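- An apparatus for predicting reactant molecules, comprising: an extraction unit, configured to perform feature extraction on a product molecule to obtain features of the product molecule; a prediction unit, configured to predict, based on the features of the product molecule and using a reverse reaction prediction model, a conversion path from the product molecule to a plurality of reactant molecules, the conversion path comprising an editing sequence and a synthon completion sequence; an editing unit, configured to edit the editing object indicated by each editing action in the editing sequence according to the edited state indicated by that editing action, to obtain a plurality of synthons corresponding to the product molecule, the editing object being an atom or a chemical bond in the product molecule; and an adding unit, configured to, for each synthon of the plurality of synthons, based on at least one synthon completion action in the synthon completion sequence that corresponds to that synthon, add the basic graph indicated by each of the at least one synthon completion action at the interface atom indicated by that synthon completion action, to obtain a plurality of reactant molecules corresponding to the plurality of synthons, the basic graph comprising a plurality of atoms or atomic edges for connecting the plurality of atoms.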
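- An apparatus for training a reverse reaction prediction model, comprising: an extraction unit, configured to perform feature extraction on a product molecule to obtain features of the product molecule; a prediction unit, configured to predict, based on the features of the product molecule and using the reverse reaction prediction model, a conversion path from the product molecule to a plurality of reactant molecules; wherein the conversion path comprises an editing sequence and a synthon completion sequence; each editing action in the editing sequence indicates an editing object and an edited state, the editing object being an atom or a chemical bond in the product molecule; for a plurality of synthons of the product molecule obtained using the editing sequence, the synthon completion sequence comprises at least one synthon completion action corresponding to each synthon of the plurality of synthons, each of the at least one synthon completion action indicating a basic graph and an interface atom, the basic graph comprising a plurality of atoms or atomic edges for connecting the plurality of atoms; and a training unit, configured to train the reverse reaction prediction model based on a loss between the conversion path and a training path.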
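- An electronic device, comprising: a processor, adapted to execute a computer program; and a computer-readable storage medium storing a computer program which, when executed by the processor, implements the method for predicting reactant molecules according to any one of claims 1 to 7 or the method for training a reverse reaction prediction model according to any one of claims 8 to 12.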
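- A computer-readable storage medium, configured to store a computer program that causes a computer to execute the method for predicting reactant molecules according to any one of claims 1 to 7 or the method for training a reverse reaction prediction model according to any one of claims 8 to 12.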
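- A computer program product, comprising a computer program/instructions which, when executed by a processor, implement the method for predicting reactant molecules according to any one of claims 1 to 7 or the method for training a reverse reaction prediction model according to any one of claims 8 to 12.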
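How a conversion path turns a product into reactants can be sketched with a toy molecular graph. Everything below is an illustrative assumption rather than the claimed implementation: the `EditAction` and `CompletionAction` classes, the dictionary-based graph, and the rule that a basic graph is always attached through a single bond are hypothetical, no valence checking is done, and the basic graph is simplified to a single hydroxyl atom.

```python
from dataclasses import dataclass


@dataclass
class EditAction:
    target: tuple      # editing object: ("atom", i) or ("bond", i, j)
    new_state: dict    # edited state, e.g. {"order": 0} or {"num_h": 2}


@dataclass
class CompletionAction:
    basic_graph: dict    # {"atoms": [...], "bonds": [(i, j, order), ...]}
    interface_atom: int  # atom of the synthon that receives the basic graph


def apply_edit(atoms, bonds, action):
    """Apply one editing action to an atom or a bond, in place."""
    if action.target[0] == "atom":
        atoms[action.target[1]].update(action.new_state)  # charge / H count
        return
    key = frozenset(action.target[1:])
    order = action.new_state["order"]
    if order == 0:
        bonds.pop(key, None)  # delete the bond
    else:
        bonds[key] = order    # add a bond or change its type


def apply_completion(atoms, bonds, action):
    """Attach a basic graph to its interface atom via a single bond (toy rule)."""
    offset = max(atoms) + 1
    for k, atom in enumerate(action.basic_graph["atoms"]):
        atoms[offset + k] = dict(atom)
    for i, j, order in action.basic_graph["bonds"]:
        bonds[frozenset((offset + i, offset + j))] = order
    bonds[frozenset((action.interface_atom, offset))] = 1


def fragments(atoms, bonds):
    """Connected components of the graph, each as a sorted list of atom ids."""
    seen, parts = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            a = stack.pop()
            if a in part:
                continue
            part.add(a)
            stack.extend(b for bond in bonds if a in bond for b in bond if b != a)
        seen |= part
        parts.append(sorted(part))
    return parts


def apply_path(atoms, bonds, edits, completions):
    """Run the editing sequence, record the synthons, then complete them."""
    for e in edits:
        apply_edit(atoms, bonds, e)
    synthons = fragments(atoms, bonds)
    for c in completions:
        apply_completion(atoms, bonds, c)
    reactants = [sorted(atoms[i]["element"] for i in part)
                 for part in fragments(atoms, bonds)]
    return synthons, reactants


# Toy product: N-methylacetamide, heavy atoms only.
atoms = {
    0: {"element": "C", "charge": 0, "num_h": 3},
    1: {"element": "C", "charge": 0, "num_h": 0},
    2: {"element": "O", "charge": 0, "num_h": 0},
    3: {"element": "N", "charge": 0, "num_h": 1},
    4: {"element": "C", "charge": 0, "num_h": 3},
}
bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 2,
         frozenset((1, 3)): 1, frozenset((3, 4)): 1}

edits = [EditAction(("bond", 1, 3), {"order": 0}),  # delete the amide C-N bond
         EditAction(("atom", 3), {"num_h": 2})]     # nitrogen gains a hydrogen
hydroxyl = {"atoms": [{"element": "O", "charge": 0, "num_h": 1}], "bonds": []}
completions = [CompletionAction(hydroxyl, interface_atom=1)]

synthons, reactants = apply_path(atoms, bonds, edits, completions)
print(synthons)   # [[0, 1, 2], [3, 4]]
print(reactants)  # [['C', 'C', 'O', 'O'], ['C', 'N']]
```

In this toy run the editing sequence splits the product into two synthons, and one completion action turns the acyl synthon into acetic acid while the amine synthon needs no completion.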
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23851324.6A EP4394781A4 (en) | 2022-08-09 | 2023-05-26 | REACTANT MOLECULE PREDICTION METHOD AND APPARATUS, TRAINING METHOD AND APPARATUS, AND ELECTRONIC DEVICE |
| JP2024532673A JP2025500754A (ja) | 2022-08-09 | 2023-05-26 | Method for predicting reactant molecule, training method, apparatus, electronic device, and computer program |
| US18/616,867 US20240233877A1 (en) | 2022-08-09 | 2024-03-26 | Method for predicting reactant molecule, training method, apparatus, and electronic device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210952642.6A CN115240786B (zh) | 2022-08-09 | 2022-08-09 | Method for predicting reactant molecule, training method, apparatus, and electronic device |
| CN202210952642.6 | 2022-08-09 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/616,867 Continuation US20240233877A1 (en) | 2022-08-09 | 2024-03-26 | Method for predicting reactant molecule, training method, apparatus, and electronic device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024032096A1 (zh) | 2024-02-15 |
Family
ID=83679443
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/096605 Ceased WO2024032096A1 (zh) | Method for predicting reactant molecule, training method, apparatus, and electronic device | 2022-08-09 | 2023-05-26 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20240233877A1 (zh) |
| EP (1) | EP4394781A4 (zh) |
| JP (1) | JP2025500754A (zh) |
| CN (1) | CN115240786B (zh) |
| WO (1) | WO2024032096A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117877608A (zh) * | 2024-03-13 | 2024-04-12 | 烟台国工智能科技有限公司 | Monte Carlo tree search retrosynthesis planning method and apparatus based on an experience network |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
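| CN115240786B (zh) * | 2022-08-09 | 2025-10-24 | 腾讯科技(深圳)有限公司 | Method for predicting reactant molecule, training method, apparatus, and electronic device |
| CN116935969B (zh) * | 2023-07-28 | 2024-03-26 | 宁波甬恒瑶瑶智能科技有限公司 | Deep-search-based biological retrosynthesis prediction method, apparatus, and electronic device |
| WO2025058986A1 (en) * | 2023-09-14 | 2025-03-20 | Deepcure Inc. | Systems and methods for automated reaction development |
| CN117133371B (zh) * | 2023-10-25 | 2024-01-05 | 烟台国工智能科技有限公司 | Template-free single-step retrosynthesis method and system based on manual bond breaking |
| CN117234266B (zh) * | 2023-11-13 | 2024-03-22 | 长沙矿冶研究院有限责任公司 | Reverse selectivity control method and system for reactions in a ternary precursor reactor |
| CN117457093B (zh) * | 2023-12-20 | 2024-03-08 | 烟台国工智能科技有限公司 | Retrosynthesis method and apparatus for organic reaction products based on data augmentation |
| CN118782168A (zh) * | 2024-09-10 | 2024-10-15 | 烟台国工智能科技有限公司 | Synthesis route ranking method and apparatus based on multi-step prediction |
| CN119294096B (zh) * | 2024-10-11 | 2025-11-25 | 烟台先进材料与绿色制造山东省实验室 | Post-processing method for reactive force field simulation results for generating reaction networks, and application thereof |
| CN120089250B (zh) * | 2025-05-06 | 2025-07-11 | 浙江大学 | Retrosynthesis route planning method and system based on a multimodal large model |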
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
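| CN111524557A (zh) * | 2020-04-24 | 2020-08-11 | 腾讯科技(深圳)有限公司 | Artificial-intelligence-based retrosynthesis prediction method, apparatus, device, and storage medium |
| US20210225462A1 (en) * | 2020-01-16 | 2021-07-22 | Emd Millipore Corporation | Method Of Synthesizing Chemical Compounds |
| KR20210147862A (ko) * | 2020-05-29 | 2021-12-07 | 삼성전자주식회사 | Method and apparatus for training a retrosynthesis prediction model |
| CN113782109A (zh) * | 2021-09-13 | 2021-12-10 | 烟台国工智能科技有限公司 | Reactant derivation method based on a Monte Carlo tree, and retrosynthesis derivation method |
| CN113990405A (zh) * | 2021-10-19 | 2022-01-28 | 上海药明康德新药开发有限公司 | Method for constructing a reagent compound prediction model, and method and apparatus for automatic prediction and completion of chemical reaction reagents |
| KR20220022059A (ko) * | 2020-07-29 | 2022-02-24 | 주식회사 아론티어 | Substructure-based neural machine translation apparatus for retrosynthesis prediction, and translation method using the same |
| CN115240786A (zh) * | 2022-08-09 | 2022-10-25 | 腾讯科技(深圳)有限公司 | Method for predicting reactant molecule, training method, apparatus, and electronic device |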
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4978764B2 (ja) * | 2005-03-22 | 2012-07-18 | 独立行政法人産業技術総合研究所 | Method and apparatus for estimating heat of reaction |
| GB201605110D0 (en) * | 2016-03-24 | 2016-05-11 | Mologic Ltd | Detecting sepsis |
| AU2019217331B2 (en) * | 2018-01-30 | 2024-11-07 | Sri International | Computational generation of chemical synthesis routes and methods |
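| CN113140260B (zh) * | 2020-01-20 | 2023-09-08 | 腾讯科技(深圳)有限公司 | Method and apparatus for predicting reactant molecule composition data of a compound |
| KR20230042048A (ko) * | 2020-07-17 | 2023-03-27 | 제넨테크, 인크. | Attention-based neural network for predicting peptide binding, presentation, and immunogenicity |
| CN114822703B (zh) * | 2021-01-27 | 2025-08-22 | 腾讯科技(深圳)有限公司 | Retrosynthesis prediction method for compound molecules and related apparatus |
| CN114496105B (zh) * | 2022-01-24 | 2024-08-23 | 武汉大学 | Single-step retrosynthesis method and system based on a multi-semantic network |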
- 2022-08-09: CN, application CN202210952642.6A, patent CN115240786B (zh), status: Active
- 2023-05-26: EP, application EP23851324.6A, patent EP4394781A4 (en), status: Pending
- 2023-05-26: WO, application PCT/CN2023/096605, patent WO2024032096A1 (zh), status: Ceased
- 2023-05-26: JP, application JP2024532673A, patent JP2025500754A (ja), status: Pending
- 2024-03-26: US, application US18/616,867, patent US20240233877A1 (en), status: Pending
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210225462A1 (en) * | 2020-01-16 | 2021-07-22 | Emd Millipore Corporation | Method Of Synthesizing Chemical Compounds |
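| CN111524557A (zh) * | 2020-04-24 | 2020-08-11 | 腾讯科技(深圳)有限公司 | Artificial-intelligence-based retrosynthesis prediction method, apparatus, device, and storage medium |
| KR20210147862A (ko) * | 2020-05-29 | 2021-12-07 | 삼성전자주식회사 | Method and apparatus for training a retrosynthesis prediction model |
| KR20220022059A (ko) * | 2020-07-29 | 2022-02-24 | 주식회사 아론티어 | Substructure-based neural machine translation apparatus for retrosynthesis prediction, and translation method using the same |
| CN113782109A (zh) * | 2021-09-13 | 2021-12-10 | 烟台国工智能科技有限公司 | Reactant derivation method based on a Monte Carlo tree, and retrosynthesis derivation method |
| CN113990405A (zh) * | 2021-10-19 | 2022-01-28 | 上海药明康德新药开发有限公司 | Method for constructing a reagent compound prediction model, and method and apparatus for automatic prediction and completion of chemical reaction reagents |
| CN115240786A (zh) * | 2022-08-09 | 2022-10-25 | 腾讯科技(深圳)有限公司 | Method for predicting reactant molecule, training method, apparatus, and electronic device |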
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4394781A4 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
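| CN117877608A (zh) * | 2024-03-13 | 2024-04-12 | 烟台国工智能科技有限公司 | Monte Carlo tree search retrosynthesis planning method and apparatus based on an experience network |
| CN117877608B (zh) * | 2024-03-13 | 2024-05-28 | 烟台国工智能科技有限公司 | Monte Carlo tree search retrosynthesis planning method and apparatus based on an experience network |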
Also Published As
| Publication number | Publication date |
|---|---|
| CN115240786A (zh) | 2022-10-25 |
| CN115240786B (zh) | 2025-10-24 |
| EP4394781A4 (en) | 2025-03-19 |
| US20240233877A1 (en) | 2024-07-11 |
| JP2025500754A (ja) | 2025-01-15 |
| EP4394781A1 (en) | 2024-07-03 |
Similar Documents
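| Publication | Publication Date | Title |
|---|---|---|
| WO2024032096A1 (zh) | | Method for predicting reactant molecule, training method, apparatus, and electronic device |
| CN113688878B (zh) | | Few-shot image classification method based on a memory mechanism and graph neural network |
| US20240331235A1 (en) | | User interface for generating and manipulating molecular images with natural language instructions |
| CN111488734A (zh) | | Sentiment feature representation learning system and method based on global interaction and syntactic dependency |
| CN118780767A (zh) | | Project evaluation and review method and system incorporating natural language processing |
| CN112966127A (zh) | | Cross-modal retrieval method based on multi-layer semantic alignment |
| CN115357728B (zh) | | Transformer-based large-model knowledge graph representation method |
| CN114999565B (zh) | | Drug-target affinity prediction method based on representation learning and graph neural network |
| CN114564596A (zh) | | Cross-lingual knowledge graph link prediction method based on a graph attention mechanism |
| EP4586144A1 (en) | | Data processing method and related apparatus |
| CN116108363B (zh) | | Label-guided incomplete multi-view multi-label classification method and system |
| CN113780002A (zh) | | Knowledge reasoning method and apparatus based on graph representation learning and deep reinforcement learning |
| CN115438197B (zh) | | Event-logic knowledge graph relation completion method and system based on a two-layer heterogeneous graph |
| CN115438160A (zh) | | Deep-learning-based question answering method, apparatus, and electronic device |
| WO2023174064A1 (zh) | | Automatic search method, and method and apparatus for training a performance prediction model for automatic search |
| Zhang et al. | | A subgraph sampling method for training large-scale graph convolutional network |
| CN113343100B (zh) | | Knowledge-graph-based smart city resource recommendation method and system |
| Chen et al. | | Semantic-aware network embedding via optimized random walk and paragaraph2vec |
| CN115880018A (zh) | | User-personalized product recommendation method and apparatus |
| Xue et al. | | Fast and unsupervised neural architecture evolution for visual representation learning |
| CN114722212A (zh) | | Automatic meta-path mining method for interpersonal relationship networks |
| Ma et al. | | Consistency knowledge distillation based on similarity attribute graph guidance |
| CN119249202A (zh) | | General graph task method based on a large language model |
| Xu et al. | | GRMI: Graph representation learning of multimodal data with incompleteness |
| CN117391142A (zh) | | Graph neural network model design method and system |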
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 2023851324; Country of ref document: EP. Ref document number: 23 851 324.6; Country of ref document: EP |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23851324; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2023851324; Country of ref document: EP; Effective date: 20240326 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024532673; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |