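WO2021082753A1 - Method, apparatus, device, and storage medium for predicting protein structure information - Google Patents
Method, apparatus, device, and storage medium for predicting protein structure information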
- Publication number
- WO2021082753A1 (application PCT/CN2020/114386)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- sequence feature
- amplified
- protein
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
Definitions
- This application relates to the field of biological information technology, in particular to a method, device, equipment and storage medium for predicting protein structure information.
- the protein’s amino acid sequence can be used to determine the protein’s structural information.
- a multiple sequence alignment query is performed in an amino acid sequence database to extract the sequence features of the protein's amino acid sequence, and the structural information of the protein is then predicted from those sequence features.
- The accuracy of this sequence feature extraction depends directly on the data scale of the database: the larger the amino acid sequence database, the higher the accuracy of the extracted sequence features.
- the embodiments of the present application provide a method, apparatus, device, and storage medium for predicting protein structure information, which can improve the prediction efficiency of protein structure information while ensuring its prediction accuracy.
- the technical solutions are as follows:
- a method for predicting the structure information of a protein, comprising:
- the initial sequence feature is processed by the sequence feature amplification model to obtain the amplified sequence feature of the protein;
- the sequence feature amplification model is a machine learning model obtained by training on initial sequence feature samples and amplified sequence feature samples;
- the initial sequence feature sample is obtained by performing a sequence alignment query in the first database based on an amino acid sequence sample, and the amplified sequence feature sample is obtained by performing a sequence alignment query in a second database based on the amino acid sequence sample;
- the data size of the second database is greater than the data size of the first database;
- the structural information of the protein is predicted based on the amplified sequence feature.
- a protein structure information prediction device includes:
- the data acquisition module is used to perform sequence alignment query in the first database according to the amino acid sequence of the protein to obtain multiple sequence alignment data;
- An initial feature acquisition module configured to perform feature extraction on the multi-sequence alignment data to obtain initial sequence features
- the amplification feature acquisition module is used to process the initial sequence feature through the sequence feature amplification model to obtain the amplified sequence feature of the protein;
- the sequence feature amplification model is a machine learning model obtained by training on initial sequence feature samples and amplified sequence feature samples;
- the initial sequence feature sample is obtained by performing a sequence alignment query in the first database based on the amino acid sequence sample, and the amplified sequence feature sample is obtained by performing a sequence alignment query in the second database based on the amino acid sequence sample; the data size of the second database is larger than the data size of the first database;
- the structure information prediction module is used to predict the structure information of the protein based on the amplified sequence feature.
- the data distribution similarity between the first database and the second database is higher than a similarity threshold.
- the first database is a database obtained after randomly removing a specified proportion of data on the basis of the second database.
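The construction of the first database by random removal can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `subsample_database`, the `removal_ratio` parameter, the fixed seed, and the toy sequences are all assumptions introduced here.

```python
import random


def subsample_database(second_db, removal_ratio, seed=0):
    """Return a smaller database by randomly dropping a fraction of sequences.

    second_db: list of amino acid sequences (the larger database).
    removal_ratio: hypothetical name for the "specified proportion" removed.
    """
    if not 0.0 <= removal_ratio < 1.0:
        raise ValueError("removal_ratio must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    keep = len(second_db) - int(len(second_db) * removal_ratio)
    return rng.sample(second_db, keep)


# Toy data for illustration only.
second_db = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK", "MKSAYIAK", "MKTGYIAK",
             "MKTAYIAR", "MKTAFIAK", "MKTAYIGK", "MKTVYIAK", "MKTAYIAQ"]
first_db = subsample_database(second_db, removal_ratio=0.5)
```

Because the removal is uniform at random, the smaller database tends to keep a data distribution similar to the larger one, which is the property the preceding statement about the similarity threshold relies on.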
- the sequence feature amplification model is a fully convolutional neural network for one-dimensional sequence data, a recurrent neural network model composed of multiple layers of Long Short-Term Memory (LSTM) units, or a recurrent neural network model composed of bidirectional LSTM units.
- the initial sequence feature and the amplified sequence feature are a position-specific scoring matrix.
- the device further includes:
- An amplified sample acquisition module configured to process the initial sequence feature sample through the sequence feature amplification model to obtain an amplified initial sequence feature sample
- the model update module is used to update the sequence feature amplification model according to the amplified initial sequence feature sample and the amplified sequence feature sample.
- the model update module includes:
- a loss function acquisition sub-module configured to perform loss function calculation according to the amplified initial sequence feature sample and the amplified sequence feature sample to obtain a loss function value
- the parameter update sub-module is used to update the model parameters in the sequence feature amplification model according to the loss function value.
- the loss function acquisition sub-module includes:
- An error calculation unit configured to calculate the reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample
- the loss function acquiring unit is configured to acquire the reconstruction error as the loss function value.
- the error calculation unit calculates a root mean square reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample.
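A root mean square reconstruction error between two feature matrices can be computed as in the sketch below. The matrices are assumed to be lists of rows with identical shapes (for example, one row per sequence position); the function name is hypothetical.

```python
import math


def rms_reconstruction_error(predicted, target):
    """Root mean square error over all elements of two equally shaped matrices."""
    if len(predicted) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(predicted, target)
    ):
        raise ValueError("matrices must have the same shape")
    squared = [
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Squared errors are 9, 16, 0, 0; their mean is 6.25, so the RMS error is 2.5.
error = rms_reconstruction_error([[0.0, 0.0], [0.0, 0.0]], [[3.0, 4.0], [0.0, 0.0]])
```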
- model update module is used to:
- the model parameters in the sequence feature amplification model are updated according to the loss function value.
- the structure information prediction module includes:
- the structure information acquisition sub-module is used to predict the characteristics of the amplified sequence through a protein structure information prediction model to obtain the structure information of the protein;
- the protein structure information prediction model is a model obtained by training based on the sequence characteristics of the protein sample and the structure information of the protein sample.
- in one aspect, a computer device includes a processor and a memory.
- the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the above-mentioned protein structure information prediction method.
- a computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the above-mentioned protein structure information prediction method.
- a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the protein structure information prediction method provided in the various alternative implementations of the foregoing aspects.
- the sequence alignment query is performed on the amino acid sequence of the protein
- the feature extraction is performed on the multi-sequence alignment data
- the amplified sequence feature of the protein is obtained through a sequence feature amplification model, and the structural information of the protein is then predicted
- with the sequence feature amplification model, a sequence alignment query only needs to be performed in the first database, which has a smaller data size, to obtain a higher prediction accuracy.
- a sequence alignment query in the smaller first database also takes less time. Therefore, the above solution can improve the prediction efficiency of protein structure information while ensuring the prediction accuracy of protein structure information.
- FIG. 1 is a model training and protein structure information prediction framework diagram provided by an exemplary embodiment of the present application
- Fig. 2 is a model architecture diagram of a machine learning model provided by an exemplary embodiment of the present application
- Fig. 3 is a schematic flow chart of a method for predicting structure information of a protein provided by an exemplary embodiment of the present application
- FIG. 4 is a schematic flowchart of a machine learning model training and protein structure information prediction method provided by an exemplary embodiment of the present application
- FIG. 5 is a schematic diagram of training a sequence feature automatic amplification model involved in the embodiment shown in FIG. 4;
- FIG. 6 is a schematic diagram of protein structure information prediction related to the embodiment shown in FIG. 4;
- Fig. 7 is a block diagram showing the structure of an apparatus for predicting protein structure information according to an exemplary embodiment
- Fig. 8 is a schematic structural diagram of a computer device according to an exemplary embodiment
- Fig. 9 is a schematic structural diagram of a terminal according to an exemplary embodiment.
- the present application provides a protein structure information prediction method, which can recognize the structure information of the protein through artificial intelligence (AI), thereby providing an efficient and high-accuracy protein structure information prediction scheme.
- Amino acid is a compound in which the hydrogen atom on the carbon atom of a carboxylic acid is replaced by an amino group.
- the amino acid molecule contains two functional groups: an amino group and a carboxyl group. Like hydroxy acids, amino acids can be classified as α-, β-, γ-, ..., ω-amino acids according to the position on the carbon chain at which the amino group is attached, but the amino acids obtained from protein hydrolysis are all α-amino acids. There are only about twenty kinds of them, and they are the basic units of proteins.
- the 20 amino acids that make up human proteins are glycine, alanine, valine, leucine, isoleucine, phenylalanine, proline, tryptophan, serine, tyrosine, cysteine, methionine, asparagine, glutamine, threonine, aspartic acid, glutamic acid, lysine, arginine, and histidine.
- Compounds containing multiple peptide bonds formed by dehydration and condensation of these 20 amino acid molecules are called polypeptides.
- Polypeptides are usually chain-like structures called peptide chains. The peptide chain can be twisted and folded to form a protein molecule with a certain spatial structure.
- Protein structure refers to the spatial structure of protein molecules. Proteins are mainly composed of carbon, hydrogen, oxygen, nitrogen, and other chemical elements, and are important biological macromolecules. All proteins are polymers formed by linking 20 different amino acids. Once incorporated into a protein, these amino acids are also called residues.
- the molecular structure of a protein can be divided into four levels to describe its different aspects:
- Primary structure: the linear amino acid sequence that makes up the polypeptide chain of a protein.
- Tertiary structure: the three-dimensional structure of a protein molecule formed by the arrangement of multiple secondary structure elements in three-dimensional space.
- Quaternary structure: describes how different polypeptide chains (subunits) interact to form a functional protein complex.
- Artificial intelligence is a theory, method, technology and application system that uses digital computers or digital computer-controlled machines to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
- Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
- Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
- Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in the study of how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance.
- Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications are in all fields of artificial intelligence.
- Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
- Fig. 1 is a framework diagram showing a model training and protein structure information prediction according to an exemplary embodiment.
- the model training device 110 trains a machine learning model by performing multiple sequence alignment data query operations and sequence feature extraction operations on the amino acid sequence corresponding to the same protein on databases of different sizes.
- the prediction device 120 can predict the structural information of the protein corresponding to the amino acid sequence based on the trained machine learning model and the input amino acid sequence.
- the aforementioned model training device 110 and prediction device 120 may be computer equipment with machine learning capabilities.
- the computer equipment may be stationary computer equipment such as a personal computer, a server, or stationary scientific research equipment, or it may be a mobile computer device such as a tablet computer or an e-book reader.
- the aforementioned model training device 110 and the prediction device 120 are the same device, or the model training device 110 and the prediction device 120 are different devices.
- the model training device 110 and the prediction device 120 may be the same type of device.
- the model training device 110 and the prediction device 120 may both be personal computers; or, the model training device 110 and the prediction device 120 may also be different types of devices.
- the model training device 110 may be a server, and the prediction device 120 may be a stationary scientific research experimental device. The embodiment of the present application does not limit the specific types of the model training device 110 and the prediction device 120.
- Fig. 2 is a model architecture diagram of a machine learning model according to an exemplary embodiment.
- the machine learning model 20 in the embodiment of the present application may include two models, where the sequence feature amplification model 210 is used to automatically amplify the input sequence features, and output the amplified sequence features.
- the sequence feature amplification model 210 also inputs the amplified sequence features to the protein structure information prediction model 220.
- the protein structure information prediction model 220 performs protein structure information prediction on the amplified sequence features input by the sequence feature amplification model 210, and outputs the prediction result of the protein structure information.
- the protein structure information prediction does not simply use the sequence features extracted from a single database through a multiple sequence alignment query as the input data of the protein structure information prediction model.
- the amplified sequence features are used as the input data for predicting protein structure information. Compared with the sequence features obtained from a single database comparison, the automatically amplified sequence features are more accurate in predicting protein structure information.
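The two-model arrangement of the machine learning model 20 amounts to function composition: the output of the amplification model feeds the structure prediction model. The sketch below shows only that wiring. Both toy models are placeholders invented for illustration and have no predictive value; in practice they would be the trained models 210 and 220.

```python
def predict_structure_info(initial_features, amplification_model, structure_model):
    """Chain the two models: amplify the features, then predict structure info."""
    amplified_features = amplification_model(initial_features)
    return structure_model(amplified_features)


# Hypothetical stand-ins for the trained models.
def toy_amplifier(features):
    # Scale every value by 1.5 and cap it at 1.0.
    return [[min(1.0, value * 1.5) for value in row] for row in features]


def toy_structure_model(features):
    # Label a position "H" if any amplified value exceeds 0.5, else "C".
    return ["H" if max(row) > 0.5 else "C" for row in features]


prediction = predict_structure_info(
    [[0.2, 0.1], [0.4, 0.3]], toy_amplifier, toy_structure_model
)
```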
- Proteins have important practical roles in organisms. For example, proteins can cause certain genetic diseases, or proteins can make organisms immune to specific diseases.
- the role of a protein in an organism is largely determined by its three-dimensional structure, while the three-dimensional structure of a protein is essentially determined by its corresponding amino acid sequence information.
- the three-dimensional structure of the protein can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy. Because determining the three-dimensional structure of a protein experimentally is costly in both time and money, it is of great scientific significance and practical value to predict the three-dimensional structure of a protein directly from its amino acid sequence through computational methods rather than experimental methods.
- part of the structure information of the protein determines the accuracy of the final predicted three-dimensional structure of the protein.
- this partial structure information includes the main chain dihedral angles, the secondary structure, and so on. Therefore, in view of the tension between prediction accuracy and computational efficiency in sequence-feature-based protein structure information prediction algorithms, the protein structure information prediction method proposed in this application can reduce the data scale requirements on the amino acid sequence database, obtain prediction accuracy similar to traditional methods with lower database storage and query costs, improve the prediction accuracy and computational efficiency of protein structure information, and thereby help improve the prediction accuracy of protein three-dimensional structure.
- FIG. 3 shows a schematic flow chart of a method for predicting structure information of a protein provided by an exemplary embodiment of the present application.
- the protein structure information prediction method can be executed by a computer device, such as the prediction device 120 shown in FIG. 1 described above.
- the protein structure information prediction method may include the following steps:
- Step 310 Perform a sequence alignment query in the first database according to the amino acid sequence of the protein to obtain multiple sequence alignment data.
- the computer device can obtain the multi-sequence alignment data through the sequence alignment operation.
- sequence alignment refers to aligning multiple amino acid sequences and highlighting similar structural regions among them.
- the first database is a database containing several amino acid sequences.
- Step 320 Perform feature extraction on the multi-sequence alignment data to obtain initial sequence features.
- the prediction device can use the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST) to perform a multiple sequence alignment data query in the first database and obtain the homologous sequences of the amino acid sequence; the homology information of these sequences is then compared to obtain a Position-Specific Scoring Matrix (PSSM), which can be used as the aforementioned sequence feature.
- the position-specific scoring matrix can be expressed as the frequency value of an amino acid at the corresponding position obtained after multiple sequence alignment of the amino acid sequence, as the frequency of each amino acid at each position, or as the probability of each amino acid appearing at each position.
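The frequency form of the matrix described above can be illustrated with a small sketch that counts residues per alignment column. It is a simplification under stated assumptions: the alignment is given as equal-length strings with `-` for gaps, gaps are ignored, and no pseudocounts or log-odds scoring (as produced by tools such as PSI-BLAST) are applied. The function name and toy alignment are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def frequency_matrix(aligned_sequences):
    """Return an L x 20 matrix of per-position amino acid frequencies."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must have equal length")
    matrix = []
    for pos in range(length):
        # Collect the residues in this column, skipping gaps and unknown symbols.
        column = [seq[pos] for seq in aligned_sequences if seq[pos] in AMINO_ACIDS]
        total = len(column)
        matrix.append(
            [column.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS]
        )
    return matrix


# Toy alignment of four homologous fragments.
msa = ["MKT", "MRT", "MKS", "M-T"]
pssm = frequency_matrix(msa)
```

Each row sums to 1 when the column contains at least one residue, so the rows can also be read as per-position probabilities.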
- step 330 the initial sequence feature is processed by the sequence feature amplification model to obtain the amplified sequence feature of the protein.
- the prediction device may input the above-mentioned initial sequence feature to the sequence feature amplification model, and the sequence feature amplification model performs feature amplification on the initial sequence feature, that is, adds new features to the initial sequence feature to obtain a more comprehensive amplified sequence feature.
- the sequence feature amplification model is a machine learning model obtained by training the initial sequence feature sample and the amplified sequence feature sample;
- the initial sequence feature sample is obtained by performing a sequence alignment query in the first database based on the amino acid sequence sample, and the amplified sequence feature sample is obtained by performing a sequence alignment query in the second database based on the amino acid sequence sample;
- the data size of the second database is larger than the data size of the first database.
- the computer device can use the initial sequence feature sample as the input of the sequence feature amplification model, use the amplified sequence feature sample as the annotation data for the initial sequence feature sample, and train the sequence feature amplification model.
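The supervised setup described here (initial features as input, amplified features as labels) can be sketched with a deliberately tiny stand-in model. The sketch assumes a single element-wise affine map `w * x + b` trained by gradient descent on mean squared error; the actual model in the application is a neural network, and the function name, learning rate, epoch count, and toy data are all assumptions for illustration.

```python
def train_amplifier(initial_samples, amplified_samples, lr=0.1, epochs=2000):
    """Fit y = w * x + b so that amplified ~= w * initial + b (toy stand-in model)."""
    # Flatten the paired samples into (input, label) pairs.
    pairs = [
        (x, y)
        for xs, ys in zip(initial_samples, amplified_samples)
        for x, y in zip(xs, ys)
    ]
    n = len(pairs)
    w, b = 1.0, 0.0
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy paired samples: the labels follow y = 0.8 * x + 0.1.
initial = [[0.0, 0.5, 1.0]]
amplified = [[0.1, 0.5, 0.9]]
w, b = train_amplifier(initial, amplified)
```

The same loop structure (forward pass, loss against the amplified label, parameter update) applies when the affine map is replaced by a convolutional or recurrent network.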
- the sequence feature amplification model may be a fully convolutional network (FCN) model for one-dimensional sequence data.
- FCN: fully convolutional network
- CNN: convolutional neural network
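A fully convolutional network over one-dimensional sequence data is built from layers that map an L x C_in feature matrix to an L x C_out matrix, so the output keeps one row per residue regardless of sequence length. The sketch below implements one such layer in plain Python with zero padding. It assumes an odd kernel size and omits the nonlinearity and layer stacking a real network would have; the function name and toy weights are hypothetical.

```python
def conv1d_same(features, kernels, biases):
    """One 1D convolution layer with zero padding that preserves sequence length.

    features: L x C_in matrix (one row per sequence position).
    kernels:  C_out x K x C_in weights, K odd.
    biases:   C_out values.
    """
    length = len(features)
    c_in = len(features[0])
    kernel_size = len(kernels[0])
    pad = kernel_size // 2
    output = []
    for pos in range(length):
        row = []
        for kernel, bias in zip(kernels, biases):
            acc = bias
            for offset in range(kernel_size):
                src = pos + offset - pad
                if 0 <= src < length:  # positions outside the sequence act as zeros
                    for ch in range(c_in):
                        acc += kernel[offset][ch] * features[src][ch]
            row.append(acc)
        output.append(row)
    return output


# Toy input: 4 positions, 1 channel; one averaging kernel of width 3.
features = [[1.0], [2.0], [3.0], [4.0]]
kernels = [[[1 / 3], [1 / 3], [1 / 3]]]
smoothed = conv1d_same(features, kernels, biases=[0.0])
```

Because no layer depends on a fixed input length, the same weights can be applied to proteins of any length, which suits position-specific scoring matrices whose row count equals the sequence length.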
- the sequence feature amplification model is a recurrent neural network model composed of multiple layers of Long Short-Term Memory (LSTM) units, or a recurrent neural network model composed of bidirectional LSTM units.
- a recurrent neural network (RNN) is a class of recursive neural network that takes sequence data as input, recurses along the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain.
- Step 340 Predict the structural information of the protein based on the amplified sequence characteristics.
- the prediction device predicts the structural information of the protein, which may include but is not limited to predicting the dihedral angle of the main chain of the protein and/or the secondary structure information of the protein.
- a dihedral angle lies between two adjacent amide planes, which can rotate about their shared Cα atom as a fixed point. The angle of rotation around the Cα-N bond is called the φ angle, and the angle of rotation around the C-Cα bond is called the ψ angle. The φ angle and the ψ angle are called dihedral angles.
- the main chain of the peptide chain can be regarded as being composed of many planes separated by Cα atoms.
- the dihedral angle determines the relative position of the two peptide planes, that is, determines the position and conformation of the main chain of the peptide chain.
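Given atomic coordinates, a main chain dihedral angle can be computed from four consecutive atoms: C(i-1), N(i), Cα(i), C(i) for φ, and N(i), Cα(i), C(i), N(i+1) for ψ. The sketch below uses the standard atan2 formulation on the three bond vectors. The helper names and the test coordinates are illustrative assumptions, not taken from the application.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four points in 3D space."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)  # the central bond the rotation is measured around
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Synthetic points: the last atom is rotated a quarter turn about the central bond.
quarter_turn = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Using atan2 rather than arccos keeps the sign of the angle, which matters because φ and ψ range over the full interval from -180° to 180°.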
- Protein secondary structure refers to the specific conformation formed when the backbone atoms of the polypeptide backbone spiral or fold along a certain axis, that is, the spatial position of the backbone atoms of the peptide chain, and does not involve the side chains of amino acid residues.
- the main forms of protein secondary structure include the α-helix, β-sheet, β-turn, and random coil. Because proteins have large molecular weights, different peptide segments of one protein molecule can contain different forms of secondary structure. In proteins, the main force maintaining secondary structure is hydrogen bonding.
- the secondary structure of a protein is not simply an α-helix or β-sheet structure, but also includes combinations of these different types of conformations. In different proteins, the proportions of the different types of conformations may vary.
- the sequence alignment query is performed on the amino acid sequence of the protein
- the feature extraction is performed on the multi-sequence alignment data
- the amplified sequence feature of the protein is obtained through a sequence feature amplification model, and the structural information of the protein is then predicted.
- with the sequence feature amplification model, a sequence alignment query only needs to be performed in the first database, which has a smaller data scale, to obtain a higher prediction accuracy.
- a sequence alignment query in the smaller first database also consumes less time. Therefore, the above solution can improve the prediction efficiency of protein structure information while ensuring the prediction accuracy of protein structure information.
- FIG. 4 shows a schematic flow chart of a machine learning model training and protein structure information prediction method provided by an exemplary embodiment of the present application.
- the program is divided into two parts: machine learning model training and protein structure information prediction.
- the machine learning model training and protein structure information prediction method can be executed by computer equipment, which may include the training device 110 and the prediction device 120 shown in FIG. 1 above.
- the machine learning model training and protein structure information prediction method may include the following steps:
- Step 401 The training device performs a sequence alignment query in the first database according to the amino acid sequence sample, and obtains an initial sequence feature sample according to the query result.
- the training device may perform sequence alignment query in the first database according to the amino acid sequence samples to obtain multi-sequence alignment data, and then perform feature extraction on the multi-sequence alignment data to obtain the aforementioned initial sequence feature samples.
- the amino acid sequence of a certain protein can be composed of multiple amino acids (for example, 20 kinds of basic amino acids are known).
- the above-mentioned amino acid sequence sample may be a currently known amino acid sequence of a protein, or the above-mentioned amino acid sequence sample may also be an amino acid sequence generated randomly or according to a certain rule.
- the aforementioned amino acid sequence samples include amino acid sequences with known protein structure information, amino acid sequences with unknown protein structure information, or both.
- Step 402 The training device performs sequence alignment query in the second database according to the amino acid sequence samples, and obtains the amplified sequence feature samples according to the query results.
- the training device may perform sequence alignment query in the second database according to the amino acid sequence samples to obtain multi-sequence alignment data, and then perform feature extraction on the multi-sequence alignment data to obtain the aforementioned amplified sequence feature samples.
- the training device obtains the initial sequence feature sample and the amplified sequence feature sample from the first database and the second database through the same amino acid sequence sample, and the initial sequence feature sample and the amplified sequence feature sample have a one-to-one correspondence.
- the aforementioned initial sequence feature samples and amplified sequence feature samples may be sequence features extracted according to the same feature extraction algorithm.
- the aforementioned initial sequence feature samples and amplified sequence feature samples may both be position-specific scoring matrices, and the elements in the two matrices are of the same type.
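- as an illustration of the feature extraction described above, the following minimal sketch derives a position-specific scoring matrix from a toy multiple sequence alignment. The function name `pssm_from_msa`, the pseudocount, the uniform background frequency and the gap handling are all assumptions made for this example; the application does not specify the feature extraction algorithm.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 basic amino acids


def pssm_from_msa(msa, pseudocount=1.0):
    """Return an L x 20 log-odds matrix for aligned sequences of equal length."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pssm = []
    for pos in range(length):
        # Start every residue at the pseudocount so no frequency is zero.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in msa:
            residue = seq[pos]
            if residue in counts:  # gaps and unknown symbols are skipped
                counts[residue] += 1.0
        total = sum(counts.values())
        row = [math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS]
        pssm.append(row)
    return pssm


toy_msa = ["MKV", "MKI", "MRV", "M-V"]  # hypothetical alignment result
matrix = pssm_from_msa(toy_msa)
```

Applying the same function to the alignment results from both databases would yield two matrices of identical shape and element type, which is what the one-to-one correspondence between the two kinds of samples requires.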
- the data scale of the aforementioned second database is larger than the data scale of the first database.
- the first database and the second database are each amino acid sequence databases, each database contains several amino acid sequences, and the number of amino acid sequences contained in the second database is greater than the number of amino acid sequences contained in the first database.
- the similarity of data distribution between the aforementioned first database and second database is higher than the similarity threshold.
- the above-mentioned first database and second database may be databases with similar data distributions; that is, the similarity between the data distributions of the first database and the second database needs to be higher than a predetermined similarity threshold.
- the aforementioned similarity threshold may be a value preset by the developer.
- the first database and the second database are the same kind of database, but have different data sizes.
- the foregoing first database and second database may be two existing databases with similar data distributions.
- the foregoing first database and second database may be UniRef databases with different data sizes; or, the foregoing first database and second database may be Swiss-Prot database and TrEMBL database in UniProtKB database.
- the UniRef database can be divided into three levels according to sequence identity, 100%, 90% and 50%, corresponding to the UniRef100, UniRef90 and UniRef50 databases respectively; the data volumes of these three databases are reduced by 10%, 40% and 70%, respectively, relative to the complete database.
- the aforementioned first database may be the UniRef50 database, and the second database may be the UniRef90 or UniRef100 database (the data size of the UniRef50 database is smaller than the data size of the UniRef90 or UniRef100 database).
- alternatively, the first database may be the UniRef90 database, and the second database may be the UniRef100 database.
- the above-mentioned first database is a database obtained after randomly removing a specified proportion of data on the basis of the second database.
- the aforementioned specified ratio may be a ratio preset by the developer.
- the training device may randomly remove a specified proportion (for example, 50%) of amino acid sequences on the basis of the second database to obtain the first database.
- the aforementioned second database may be an existing database.
- for example, the second database may be the aforementioned UniRef90 database (or another existing database), and the training device randomly removes half of the amino acid sequences from the UniRef90 database to obtain the aforementioned first database.
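- the random removal of a specified proportion of sequences can be sketched as follows. This is only an illustration: the function name `downsample_database`, the fixed seed and the placeholder sequence identifiers are assumptions, and a real sequence database would be read from files rather than held in a list.

```python
import random


def downsample_database(sequences, remove_ratio=0.5, seed=0):
    """Return a smaller database by randomly removing `remove_ratio` of the entries."""
    if not 0.0 <= remove_ratio < 1.0:
        raise ValueError("remove_ratio must be in [0, 1)")
    keep = len(sequences) - int(round(len(sequences) * remove_ratio))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return rng.sample(list(sequences), keep)


second_db = ["SEQ%03d" % i for i in range(10)]  # placeholder identifiers
first_db = downsample_database(second_db, remove_ratio=0.5)
```

Because the first database is a uniform random subset of the second, the two databases can be expected to have similar data distributions while differing in size.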
- Step 403 The training device processes the initial sequence feature sample through the sequence feature amplification model to obtain the amplified initial sequence feature sample.
- the process in which the computer device processes the initial sequence feature sample through the sequence feature amplification model to obtain the amplified initial sequence feature sample is similar to the process of obtaining the amplified sequence feature in the embodiment shown in FIG. 3 above, and is not repeated here.
- the sequence feature amplification model in this step may be a model that has not been trained yet.
- Step 404 The training device updates the sequence feature amplification model according to the amplified initial sequence feature sample and the amplified sequence feature sample.
- the training device performs a loss function calculation according to the amplified initial sequence feature sample and the amplified sequence feature sample to obtain the loss function value. Then, the training device updates the model parameters in the sequence feature amplification model according to the loss function value.
- the training device calculates the reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample, and obtains the reconstruction error as the loss function value.
- the above reconstruction error is the root mean square reconstruction error; that is, when obtaining the reconstruction error, the training device calculates the root mean square reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample, and obtains the root mean square reconstruction error as the loss function value.
- the reconstruction error between the automatically amplified initial sequence feature sample and the reference sequence feature can be obtained by the root mean square reconstruction error calculation method. Denoting the two features as x and z, both of which are matrices of size L × D, the calculation formula is:
$$\mathrm{RMSE}(x, z) = \sqrt{\frac{1}{L \times D}\sum_{i=1}^{L}\sum_{j=1}^{D}\left(x_{ij} - z_{ij}\right)^{2}}$$
- where $x_{ij}$ and $z_{ij}$ are the elements in the i-th row and j-th column of the matrix x and the matrix z, respectively.
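- a minimal sketch of this loss calculation is given below, assuming the root mean square reconstruction error is the square root of the mean squared element-wise difference over all L × D entries of x and z. The function name and the toy matrices are illustrative only.

```python
import math


def rmse_reconstruction_error(x, z):
    """Root-mean-square error between two L x D matrices given as lists of rows."""
    if len(x) != len(z) or any(len(rx) != len(rz) for rx, rz in zip(x, z)):
        raise ValueError("x and z must have the same L x D shape")
    squared = sum((a - b) ** 2 for rx, rz in zip(x, z) for a, b in zip(rx, rz))
    count = sum(len(rx) for rx in x)  # L * D entries in total
    return math.sqrt(squared / count)


x = [[1.0, 2.0], [3.0, 4.0]]  # stands in for the amplified initial feature sample
z = [[1.0, 2.0], [3.0, 6.0]]  # stands in for the reference feature sample
error = rmse_reconstruction_error(x, z)  # one entry differs by 2, so the error is 1.0
```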
- FIG. 5 shows a schematic diagram of training of a sequence feature amplification model involved in an embodiment of the present application.
- the training process of the sequence feature amplification model is as follows:
- S51, the training device obtains an amino acid sequence sample, and performs a multi-sequence alignment data query operation for the amino acid sequence sample on the UniRef50 database to obtain a multi-sequence alignment data result.
- S52, the training device performs feature extraction on the multi-sequence alignment data result of S51 to obtain the sequence feature before automatic amplification, which may also be referred to as an initial sequence feature sample.
- S53, the training device performs the multi-sequence alignment data query operation for the amino acid sequence sample on the UniRef90 database to obtain a multi-sequence alignment data result.
- S54, the training device performs feature extraction on the multi-sequence alignment data result of S53 to obtain a reference sequence feature, which may also be referred to as an amplified sequence feature sample.
- S55, the training device inputs the initial sequence feature sample into the sequence feature amplification model.
- S56, the sequence feature amplification model outputs the amplified sequence feature, which may be referred to as the amplified initial sequence feature sample.
- S57, the training device calculates the reconstruction error between the amplified sequence feature and the reference sequence feature as the loss function according to the formula, and trains and updates the sequence feature amplification model according to the loss function.
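- the training flow above can be sketched with a deliberately simple stand-in model. In the sketch below the amplification model is a per-column affine map rather than a neural network, the parameters are updated by plain gradient descent on the mean squared error (which has the same minimizer as the root mean square error), and the toy feature matrices are invented; none of these choices come from the application.

```python
import math


def amplify(x, w, b):
    """Stand-in amplification model: an affine map applied to every feature column."""
    return [[w[j] * v + b[j] for j, v in enumerate(row)] for row in x]


def rmse(y, z):
    """Root-mean-square error between two matrices of the same shape."""
    n = sum(len(row) for row in y)
    return math.sqrt(sum((a - c) ** 2 for ry, rz in zip(y, z) for a, c in zip(ry, rz)) / n)


def train(initial, reference, lr=0.5, epochs=200):
    """Gradient descent on the mean squared error; returns parameters and RMSE history."""
    dims = len(initial[0])
    n = len(initial) * dims
    w, b = [1.0] * dims, [0.0] * dims
    history = []
    for _ in range(epochs):
        out = amplify(initial, w, b)            # forward pass through the model
        history.append(rmse(out, reference))    # loss value for this round
        grad_w, grad_b = [0.0] * dims, [0.0] * dims
        for row_x, row_y, row_z in zip(initial, out, reference):
            for j in range(dims):
                diff = row_y[j] - row_z[j]
                grad_w[j] += 2.0 * diff * row_x[j] / n
                grad_b[j] += 2.0 * diff / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]  # parameter update
        b = [bj - lr * g for bj, g in zip(b, grad_b)]
    return w, b, history


# Toy L x D features: "initial" from the smaller database, "reference" from the larger one.
initial = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
reference = [[0.5, 2.5], [2.5, 0.5], [1.5, 1.5]]  # equals 2 * initial + 0.5
w, b, history = train(initial, reference)
```

The recorded `history` is the sequence of loss values that a convergence check would inspect from round to round.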
- the training device updates the model parameters in the sequence feature amplification model according to the loss function value.
- the training device can judge whether the model has converged according to the loss function value. If the sequence feature amplification model has converged, the training device can end the training and output the sequence feature amplification model to the prediction device, which then predicts the structural information of the protein.
- conversely, if the sequence feature amplification model has not converged, the training device may update the model parameters in the sequence feature amplification model according to the loss function value.
- the training device compares the above-mentioned loss function value with a preset loss function threshold. If the loss function value is less than the loss function threshold, the result output by the sequence feature amplification model is already close to the result obtained from the query in the second database, which indicates that the sequence feature amplification model achieves a good feature amplification effect, and the model is determined to have converged. Conversely, if the loss function value is not less than the loss function threshold, the result output by the sequence feature amplification model is still far from the result obtained from the query in the second database, which indicates that the sequence feature amplification model has not yet achieved a good feature amplification effect, and the model is determined not to have converged.
- the training device compares the above-mentioned loss function value with the loss function value obtained in the previous round of the update process. If the difference between the loss function value obtained this time and the loss function value obtained in the previous round is less than the difference threshold, the accuracy of the sequence feature amplification model has improved only slightly and continued training cannot bring a significant improvement, so the model is determined to have converged. Conversely, if the difference between the loss function value obtained this time and the loss function value obtained in the previous round is not less than the difference threshold, the accuracy of the sequence feature amplification model has improved considerably and continued training may still bring a significant improvement, so the model is determined not to have converged.
- the training device compares the above-mentioned loss function value with the loss function value obtained in the previous round of the update process, and at the same time compares the loss function value obtained this time with the loss function threshold. If the loss function value is less than the loss function threshold, and the difference between the loss function value obtained this time and the loss function value obtained in the previous round is less than the difference threshold, the model is determined to have converged.
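- the three convergence criteria described above can be combined in one helper, sketched below. The function name, the parameter names and the choice to treat a missing previous round as "not converged" are assumptions for illustration.

```python
def has_converged(loss, previous_loss=None, loss_threshold=None, diff_threshold=None):
    """Apply whichever convergence criteria have thresholds configured.

    loss_threshold: converged only if loss < loss_threshold.
    diff_threshold: converged only if |loss - previous_loss| < diff_threshold.
    When both thresholds are given, both conditions must hold.
    """
    if loss_threshold is None and diff_threshold is None:
        raise ValueError("at least one threshold is required")
    checks = []
    if loss_threshold is not None:
        checks.append(loss < loss_threshold)
    if diff_threshold is not None:
        # Without a previous round there is nothing to compare against yet.
        checks.append(previous_loss is not None
                      and abs(loss - previous_loss) < diff_threshold)
    return all(checks)
```

For example, `has_converged(0.05, loss_threshold=0.1)` applies only the absolute criterion, while passing both thresholds applies the combined criterion.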
- the prediction device can predict the structure information of the protein whose structure is unknown according to the sequence feature amplification model and the above-mentioned first database.
- the prediction process can refer to the subsequent steps.
- Step 405 The prediction device performs a sequence alignment query in the first database according to the amino acid sequence of the protein to obtain multiple sequence alignment data.
- the protein in this step may be a protein that requires structural information prediction.
- Step 406 The prediction device performs feature extraction on the multi-sequence alignment data to obtain initial sequence features.
- Step 407 The prediction device processes the initial sequence feature through the sequence feature amplification model to obtain the amplified sequence feature of the protein.
- for the process from step 405 to step 407, reference may be made to the description in the embodiment shown in FIG. 3, which will not be repeated here.
- Step 408 The prediction device predicts the structural information of the protein based on the amplified sequence features.
- the prediction device can process the amplified sequence feature through the protein structure information prediction model to obtain the structure information of the protein; the protein structure information prediction model is a model obtained by training based on the sequence features of protein samples and the structure information of the protein samples.
- the aforementioned protein structure information prediction model is an existing one, which is a machine learning model trained by other computer equipment.
- the protein structure information prediction model used to predict the structure information of the protein may also be a model obtained through machine learning.
- the training device can obtain several protein samples with known structural information and the amino acid sequence of each protein sample. The training device then performs a sequence alignment query in the third database according to the amino acid sequence of the protein sample to obtain multi-sequence alignment data, and performs feature extraction on the multi-sequence alignment data obtained by the query to obtain the sequence feature of the protein sample. It then trains the above-mentioned protein structure information prediction model with the sequence feature of the protein sample as input and the structure information of the protein sample as annotation information. After the protein structure information prediction model is trained, it can be applied to this step: the prediction device predicts the structure information of the protein according to the amplified sequence features of the protein to be predicted and the protein structure information prediction model.
- in order to improve the accuracy of predicting the structure information of the protein from the amplified sequence features of the protein to be predicted and the protein structure information prediction model, the above-mentioned second database can be used as the database used in the training process of the protein structure information prediction model (i.e., the third database); that is, the above-mentioned second database and the third database may be the same database.
- alternatively, the above-mentioned second database and the third database may be different databases.
- for example, the third database may be a database with a larger data scale than the second database, and the similarity of the data distributions of the second database and the third database is higher than the similarity threshold.
- for example, the second database may be the UniRef90 database, and the third database may be the UniRef100 database.
- FIG. 6 shows a schematic diagram of protein structure information prediction involved in an embodiment of the present application.
- the process of protein structure information prediction is as follows:
- the prediction device obtains an amino acid sequence, and performs a multi-sequence alignment data query operation of the amino acid sequence on the UniRef50 database to obtain a multi-sequence alignment data result.
- the prediction device performs feature extraction on the result of the multi-sequence alignment data to obtain the sequence feature before automatic amplification.
- the prediction device inputs the sequence feature before automatic amplification into the trained sequence feature amplification model.
- the sequence feature amplification model outputs the automatically amplified sequence features.
- the prediction device inputs the automatically amplified sequence features into the protein structure information prediction model.
- the protein structure information prediction model outputs the protein structure information prediction result corresponding to the amino acid sequence.
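- the prediction flow above is a chain of four stages, which the following sketch expresses as function composition. Every stage is passed in as a callable, and the placeholder stages at the bottom exist only so the sketch runs end to end; the names, the per-residue label string returned as "structure information" and the toy logic are all hypothetical and do not represent a real predictor.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def predict_structure(
    amino_acid_sequence: str,
    search_first_db: Callable[[str], Sequence[str]],
    extract_features: Callable[[Sequence[str]], Matrix],
    amplification_model: Callable[[Matrix], Matrix],
    structure_model: Callable[[Matrix], str],
) -> str:
    """Chain the four stages: query, feature extraction, amplification, prediction."""
    msa = search_first_db(amino_acid_sequence)   # alignment query on the smaller database
    initial = extract_features(msa)              # sequence feature before amplification
    amplified = amplification_model(initial)     # automatically amplified sequence feature
    return structure_model(amplified)            # structure information prediction result


# Placeholder stages (assumptions for illustration only).
def fake_search(seq):
    return [seq, seq]


def fake_features(msa):
    return [[float(len(msa))] for _ in msa[0]]


def fake_amplifier(features):
    return [[v * 2.0 for v in row] for row in features]


def fake_structure(features):
    return "".join("H" if row[0] > 3.0 else "C" for row in features)


result = predict_structure("MKV", fake_search, fake_features, fake_amplifier, fake_structure)
```

Passing the stages as arguments keeps the flow independent of which database, feature extractor or trained models are plugged in.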
- the training device and the prediction device may be the same computer device; that is, the computer device first trains to obtain the sequence feature amplification model, and then predicts the structural information of the protein according to the sequence feature amplification model.
- the training device and the prediction device may be different computer devices; that is, the training device first trains to obtain the sequence feature amplification model and provides the sequence feature amplification model to the prediction device, and the prediction device predicts the structural information of the protein according to the sequence feature amplification model.
- in summary, the sequence alignment query is performed on the amino acid sequence of the protein, the feature extraction is performed on the multi-sequence alignment data, the amplified sequence feature of the protein is obtained through the sequence feature amplification model, and the structural information of the protein is then predicted.
- with the sequence feature amplification model, it is only necessary to perform the sequence alignment query in the first database, which has a smaller data scale, to obtain a higher prediction accuracy.
- at the same time, the sequence alignment query in the smaller first database consumes less time. Therefore, the above solution can improve the prediction efficiency of protein structure information while ensuring the prediction accuracy of protein structure information.
- Fig. 7 is a block diagram showing the structure of an apparatus for predicting protein structure information according to an exemplary embodiment.
- the protein structure information prediction device can be implemented as all or part of a computer device in the form of hardware or a combination of software and hardware, so as to perform all or part of the steps of the method shown in the corresponding embodiment of FIG. 3 or FIG. 4.
- the protein structure information prediction device may include:
- the data acquisition module 710 is configured to perform sequence alignment query in the first database according to the amino acid sequence of the protein to obtain multiple sequence alignment data;
- the initial feature acquisition module 720 is configured to perform feature extraction on the multi-sequence alignment data to obtain initial sequence features
- the amplification feature acquisition module 730 is configured to process the initial sequence feature through a sequence feature amplification model to obtain the amplified sequence feature of the protein; the sequence feature amplification model is a machine learning model obtained by training based on initial sequence feature samples and amplified sequence feature samples; the initial sequence feature sample is obtained by performing a sequence alignment query in the first database based on an amino acid sequence sample, and the amplified sequence feature sample is obtained by performing a sequence alignment query in the second database based on the amino acid sequence sample; the data size of the second database is larger than the data size of the first database;
- the structure information prediction module 740 is configured to predict the structure information of the protein based on the amplified sequence features.
- the data distribution similarity between the first database and the second database is higher than a similarity threshold.
- the first database is a database obtained after randomly removing a specified proportion of data on the basis of the second database.
- the sequence feature amplification model is a fully convolutional neural network for one-dimensional sequence data, a recurrent neural network model composed of multi-layer long short-term memory (LSTM) units, or a recurrent neural network composed of bidirectional LSTM units.
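- to illustrate why a fully convolutional network suits one-dimensional sequence data, the sketch below implements a single 1D convolution layer that slides along the sequence axis and preserves the sequence length, so an L × D input yields an L-row output for any L. The function name, the zero padding and the averaging kernel are assumptions for this example, and a single layer is far simpler than the model the application refers to.

```python
def conv1d_same(features, kernels, bias):
    """1D convolution along the sequence axis with zero padding ("same" length).

    features: L x D_in matrix; kernels: D_out x K x D_in weights; bias: D_out values.
    Returns an L x D_out matrix.
    """
    length, d_in = len(features), len(features[0])
    k = len(kernels[0])
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel width")
    half = k // 2
    out = []
    for pos in range(length):
        row = []
        for kernel, b in zip(kernels, bias):
            total = b
            for offset in range(k):
                src = pos + offset - half
                if 0 <= src < length:  # positions outside the sequence count as zeros
                    total += sum(kernel[offset][c] * features[src][c] for c in range(d_in))
            row.append(total)
        out.append(row)
    return out


features = [[1.0], [2.0], [3.0], [4.0]]          # L = 4, D_in = 1
averaging = [[[1.0 / 3], [1.0 / 3], [1.0 / 3]]]  # one output channel, kernel width 3
smoothed = conv1d_same(features, averaging, bias=[0.0])
```

Because the same kernel weights are reused at every position, the layer has no dependence on the sequence length, which lets one trained model handle proteins of different lengths.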
- the initial sequence feature and the amplified sequence feature are a position-specific scoring matrix.
- the device further includes:
- An amplified sample acquisition module configured to process the initial sequence feature sample through the sequence feature amplification model to obtain an amplified initial sequence feature sample
- the model update module is used to update the sequence feature amplification model according to the amplified initial sequence feature sample and the amplified sequence feature sample.
- the model update module includes:
- a loss function acquisition sub-module configured to perform loss function calculation according to the amplified initial sequence feature sample and the amplified sequence feature sample to obtain a loss function value
- the parameter update sub-module is used to update the model parameters in the sequence feature amplification model according to the loss function value.
- the loss function acquisition sub-module includes:
- An error calculation unit configured to calculate the reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample
- the loss function acquiring unit is configured to acquire the reconstruction error as the loss function value.
- the error calculation unit calculates a root mean square reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample.
- the model update module is used to update the model parameters in the sequence feature amplification model according to the loss function value.
- the structure information prediction module 740 includes:
- the structure information acquisition sub-module is used to process the amplified sequence feature through a protein structure information prediction model to obtain the structure information of the protein;
- the protein structure information prediction model is a model obtained by training based on the sequence characteristics of the protein sample and the structure information of the protein sample.
- in summary, the sequence alignment query is performed on the amino acid sequence of the protein, the feature extraction is performed on the multi-sequence alignment data, the amplified sequence feature of the protein is obtained through the sequence feature amplification model, and the structural information of the protein is then predicted.
- with the sequence feature amplification model, it is only necessary to perform the sequence alignment query in the first database, which has a smaller data scale, to obtain a higher prediction accuracy.
- at the same time, the sequence alignment query in the smaller first database consumes less time. Therefore, the above solution can improve the prediction efficiency of protein structure information while ensuring the prediction accuracy of protein structure information.
- Fig. 8 is a schematic structural diagram of a computer device according to an exemplary embodiment.
- the computer device may be implemented as a training device or a prediction device in each of the foregoing embodiments, or may also be implemented as a combination of a training device and a prediction device.
- the computer device 800 includes a central processing unit (CPU) 801, a system memory 804 including a random access memory (RAM) 802 and a read only memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801 .
- the computer device 800 also includes a basic input/output system (I/O system) 806 that helps transfer information between various devices in the computer, and a mass storage device 807 for storing the operating system 813, application programs 814, and other program modules 815.
- the basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse and a keyboard for the user to input information.
- the display 808 and the input device 809 are both connected to the central processing unit 801 through the input and output controller 810 connected to the system bus 805.
- the basic input/output system 806 may also include an input and output controller 810 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
- the input and output controller 810 also provides output to a display screen, a printer, or other types of output devices.
- the mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805.
- the mass storage device 807 and its associated computer-readable medium provide non-volatile storage for the computer device 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
- the computer-readable media may include computer storage media and communication media.
- Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media include RAM, ROM, EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices.
- the computer device 800 may be connected to the Internet or other network devices through the network interface unit 811 connected to the system bus 805.
- the memory also includes one or more programs, the one or more programs are stored in the memory, and the central processing unit 801 executes the one or more programs to implement the steps performed by the computer device in the protein structure information prediction method shown in FIG. 3 or FIG. 4.
- This application also provides a computer program product which, when run on a computer, causes the computer to execute the methods provided in the foregoing method embodiments.
- FIG. 9 shows a structural block diagram of a terminal 900 provided by an exemplary embodiment of the present application.
- the terminal 900 can be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer.
- the terminal 900 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
- the foregoing terminal may be implemented as the prediction device in each of the foregoing method embodiments. For example, it can be implemented as the prediction device 120 in FIG. 1.
- the terminal 900 includes a processor 901 and a memory 902.
- the processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
- the processor 901 can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
- the processor 901 may also include a main processor and a coprocessor.
- the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
- the processor 901 may be integrated with a GPU (Graphics Processing Unit), and the GPU is used to render and draw content that needs to be displayed on the display screen.
- the processor 901 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
- the memory 902 may include one or more computer-readable storage media, which may be non-transitory.
- the memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
- the non-transitory computer-readable storage medium in the memory 902 is used to store at least one instruction, and the at least one instruction is executed by the processor 901 to implement the protein structure information prediction method provided in the method embodiments of the present application.
- the terminal 900 may optionally further include: a peripheral device interface 903 and at least one peripheral device.
- the processor 901, the memory 902, and the peripheral device interface 903 may be connected by a bus or a signal line.
- Each peripheral device can be connected to the peripheral device interface 903 through a bus, a signal line, or a circuit board.
- the peripheral device includes: at least one of a radio frequency circuit 904, a touch display screen 905, a camera 906, an audio circuit 907, a positioning component 908, and a power supply 909.
- the peripheral device interface 903 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 901 and the memory 902.
- in some embodiments, the processor 901, the memory 902, and the peripheral device interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902, and the peripheral device interface 903 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
- the radio frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals.
- the radio frequency circuit 904 communicates with a communication network and other communication devices through electromagnetic signals.
- the radio frequency circuit 904 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
- the radio frequency circuit 904 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
- the radio frequency circuit 904 can communicate with other terminals through at least one wireless communication protocol.
- the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area network and/or WiFi (Wireless Fidelity) network.
- the radio frequency circuit 904 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
- the display screen 905 is used to display a UI (User Interface).
- the UI can include graphics, text, icons, videos, and any combination thereof.
- the display screen 905 also has the ability to collect touch signals on or above the surface of the display screen 905.
- the touch signal can be input to the processor 901 as a control signal for processing.
- the display screen 905 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
- the display screen 905 may be a flexible display screen, which is disposed on the curved surface or the folding surface of the terminal 900.
- the display screen 905 can also be configured as a non-rectangular irregular pattern, that is, a special-shaped screen.
- the display screen 905 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
- the camera assembly 906 is used to capture images or videos.
- the camera assembly 906 includes a front camera and a rear camera.
- the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
- the camera assembly 906 may also include a flash.
- the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, which can be used for light compensation under different color temperatures.
- the audio circuit 907 may include a microphone and a speaker.
- the microphone is used to collect sound waves of the user and the environment, convert the sound waves into electrical signals, and input the signals to the processor 901 for processing or to the radio frequency circuit 904 to implement voice communication.
- the microphone can also be an array microphone or an omnidirectional collection microphone.
- the speaker is used to convert the electrical signal from the processor 901 or the radio frequency circuit 904 into sound waves.
- the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
- when the speaker is a piezoelectric ceramic speaker, it can not only convert the electrical signal into sound waves audible to humans, but also convert the electrical signal into sound waves inaudible to humans for distance measurement and other purposes.
- the audio circuit 907 may also include a headphone jack.
- the positioning component 908 is used to locate the current geographic location of the terminal 900 to implement navigation or LBS (Location Based Service).
- the positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
- the power supply 909 is used to supply power to various components in the terminal 900.
- the power source 909 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
- the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
- a wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil.
- the rechargeable battery can also be used to support fast charging technology.
- the terminal 900 further includes one or more sensors 910.
- the one or more sensors 910 include, but are not limited to: an acceleration sensor 911, a gyroscope sensor 912, a pressure sensor 913, a fingerprint sensor 914, an optical sensor 915, and a proximity sensor 916.
- the acceleration sensor 911 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 900.
- the acceleration sensor 911 may be used to detect the components of gravitational acceleration on three coordinate axes.
- the processor 901 may control the touch screen 905 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 911.
- the acceleration sensor 911 may also be used for game or user motion data collection.
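As an illustration of the view selection driven by the gravity acceleration signal, the following minimal Python sketch chooses between a horizontal and a vertical view. The function name `choose_view` and the axis convention (x along the short edge of the terminal, y along the long edge) are assumptions made for this example and are not taken from the application.

```python
def choose_view(gravity_x, gravity_y):
    """Pick a view from the gravity components along two device axes.

    Assumed convention (hypothetical): x runs along the short edge of the
    terminal and y along the long edge, both in m/s^2.
    """
    # Gravity mostly along the short edge means the terminal lies on its side.
    if abs(gravity_x) > abs(gravity_y):
        return "horizontal"
    return "vertical"


print(choose_view(0.2, 9.8))   # terminal held upright
print(choose_view(9.7, 0.5))   # terminal turned on its side
```

A real terminal would typically filter the signal and add hysteresis so the view does not flip near the diagonal; the sketch omits both.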
- the gyroscope sensor 912 can detect the body direction and the rotation angle of the terminal 900, and the gyroscope sensor 912 can cooperate with the acceleration sensor 911 to collect the user's 3D actions on the terminal 900.
- the processor 901 can implement the following functions according to the data collected by the gyroscope sensor 912: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
- the pressure sensor 913 may be provided on the side frame of the terminal 900 and/or the lower layer of the touch screen 905.
- the processor 901 performs left and right hand recognition or quick operation according to the holding signal collected by the pressure sensor 913.
- the processor 901 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 905.
- the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
- the fingerprint sensor 914 is used to collect the user's fingerprint, and the processor 901 can identify the user's identity according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 can identify the user's identity according to the collected fingerprint. When it is recognized that the user's identity is a trusted identity, the processor 901 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
- the fingerprint sensor 914 may be provided on the front, back, or side of the terminal 900. When a physical button or a manufacturer logo is provided on the terminal 900, the fingerprint sensor 914 may be integrated with the physical button or the manufacturer logo.
- the optical sensor 915 is used to collect the ambient light intensity.
- the processor 901 may control the display brightness of the touch screen 905 according to the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 905 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 905 is decreased.
- the processor 901 may also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
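The brightness rule described for the optical sensor (brighter display under strong ambient light, dimmer under weak ambient light) can be sketched as a clamped linear mapping. The function name `display_brightness`, the linear shape, and the parameters `min_level`, `max_level`, and `full_scale_lux` are all assumptions for illustration; the application only states the direction of the adjustment.

```python
def display_brightness(ambient_lux, min_level=0.1, max_level=1.0,
                       full_scale_lux=1000.0):
    """Map ambient light intensity to a display brightness level.

    The level rises with ambient light and is clamped to
    [min_level, max_level]. All parameter values are hypothetical.
    """
    ratio = ambient_lux / full_scale_lux
    ratio = min(max(ratio, 0.0), 1.0)  # clamp to the supported range
    return min_level + (max_level - min_level) * ratio


for lux in (0.0, 100.0, 800.0, 5000.0):
    print(lux, round(display_brightness(lux), 3))
```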
- the proximity sensor 916, also called a distance sensor, is usually provided on the front panel of the terminal 900.
- the proximity sensor 916 is used to collect the distance between the user and the front of the terminal 900.
- when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 gradually decreases, the processor 901 controls the touch screen 905 to switch from the on-screen state to the off-screen state; when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 gradually increases, the processor 901 controls the touch display screen 905 to switch from the off-screen state to the on-screen state.
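The screen switching driven by the proximity sensor can be sketched as a small state update over successive distance readings. The function name `next_screen_state`, the state labels, and the sample readings are hypothetical; comparing only two consecutive readings is a simplification of "gradually" decreasing or increasing distance.

```python
def next_screen_state(state, previous_distance, current_distance):
    """Return the screen state after one pair of proximity readings."""
    if current_distance < previous_distance:
        return "off"   # user is approaching the front of the terminal
    if current_distance > previous_distance:
        return "on"    # user is moving away from the terminal
    return state       # distance unchanged: keep the current state


state = "on"
readings = [10.0, 6.0, 2.0, 5.0, 12.0]  # hypothetical distances in centimeters
history = []
for previous, current in zip(readings, readings[1:]):
    state = next_screen_state(state, previous, current)
    history.append(state)
print(history)
```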
- the structure shown in FIG. 9 does not constitute a limitation on the terminal 900, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
- the program can be stored in a computer-readable storage medium.
- the medium may be a computer-readable storage medium included in the memory in the foregoing embodiment; or may be a computer-readable storage medium that exists alone and is not assembled into the terminal.
- the computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the protein structure information prediction method described in FIG. 3 or FIG. 4.
- the computer-readable storage medium may include: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), solid state drive (SSD, Solid State Drives), optical disks, and the like.
- random access memory may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory).
- a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the protein structure information prediction method provided in the various alternative implementations of the foregoing aspects.
- the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Physiology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Data Mining & Analysis (AREA)
- Genetics & Genomics (AREA)
- Epidemiology (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (24)
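- A method for predicting structure information of a protein, characterized in that the method is performed by a computer device and comprises: performing a sequence alignment query in a first database according to the amino acid sequence of the protein to obtain multiple sequence alignment data; performing feature extraction on the multiple sequence alignment data to obtain an initial sequence feature; processing the initial sequence feature through a sequence feature amplification model to obtain an amplified sequence feature of the protein, the sequence feature amplification model being a machine learning model trained with an initial sequence feature sample and an amplified sequence feature sample, the initial sequence feature sample being obtained by performing a sequence alignment query in the first database according to an amino acid sequence sample, the amplified sequence feature sample being obtained by performing a sequence alignment query in a second database according to the amino acid sequence sample, and the data scale of the second database being larger than the data scale of the first database; and predicting the structure information of the protein through the amplified sequence feature.
- The method according to claim 1, characterized in that the data distribution similarity between the first database and the second database is higher than a similarity threshold.
- The method according to claim 2, characterized in that the first database is a database obtained by randomly removing a specified proportion of data from the second database.
- The method according to claim 1, characterized in that the sequence feature amplification model is a fully convolutional neural network for one-dimensional sequence data, a recurrent neural network model composed of multiple layers of long short-term memory (LSTM) units, or a recurrent neural network composed of bidirectional LSTM units.
- The method according to claim 1, characterized in that the initial sequence feature and the amplified sequence feature are position-specific scoring matrices.
- The method according to any one of claims 1 to 5, characterized in that, after performing the sequence alignment query in the first database according to the amino acid sequence of the protein to obtain the multiple sequence alignment data, the method further comprises: processing the initial sequence feature sample through the sequence feature amplification model to obtain an amplified initial sequence feature sample; and updating the sequence feature amplification model according to the amplified initial sequence feature sample and the amplified sequence feature sample.
- The method according to claim 6, characterized in that updating the sequence feature amplification model according to the amplified initial sequence feature sample and the amplified sequence feature sample comprises: performing a loss function calculation according to the amplified initial sequence feature sample and the amplified sequence feature sample to obtain a loss function value; and updating the model parameters of the sequence feature amplification model according to the loss function value.
- The method according to claim 7, characterized in that performing the loss function calculation according to the amplified initial sequence feature sample and the amplified sequence feature sample to obtain the loss function value comprises: calculating a reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample; and taking the reconstruction error as the loss function value.
- The method according to claim 8, characterized in that calculating the reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample comprises: calculating a root-mean-square reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample.
- The method according to claim 7, characterized in that updating the model parameters of the sequence feature amplification model according to the loss function value comprises: when it is determined according to the loss function value that the sequence feature amplification model has not converged, updating the model parameters of the sequence feature amplification model according to the loss function value.
- The method according to any one of claims 1 to 5, characterized in that predicting the structure information of the protein through the amplified sequence feature comprises: performing prediction on the amplified sequence feature through a protein structure information prediction model to obtain the structure information of the protein; wherein the protein structure information prediction model is a model trained according to sequence features of protein samples and structure information of the protein samples.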
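- An apparatus for predicting structure information of a protein, characterized in that the apparatus is used in a computer device and comprises: a data acquisition module, configured to perform a sequence alignment query in a first database according to the amino acid sequence of the protein to obtain multiple sequence alignment data; an initial feature acquisition module, configured to perform feature extraction on the multiple sequence alignment data to obtain an initial sequence feature; an amplified feature acquisition module, configured to process the initial sequence feature through a sequence feature amplification model to obtain an amplified sequence feature of the protein, the sequence feature amplification model being a machine learning model trained with an initial sequence feature sample and an amplified sequence feature sample, the initial sequence feature sample being obtained by performing a sequence alignment query in the first database according to an amino acid sequence sample, the amplified sequence feature sample being obtained by performing a sequence alignment query in a second database according to the amino acid sequence sample, and the data scale of the second database being larger than the data scale of the first database; and a structure information prediction module, configured to predict the structure information of the protein through the amplified sequence feature.
- The apparatus according to claim 12, characterized in that the data distribution similarity between the first database and the second database is higher than a similarity threshold.
- The apparatus according to claim 13, characterized in that the first database is a database obtained by randomly removing a specified proportion of data from the second database.
- The apparatus according to claim 12, characterized in that the sequence feature amplification model is a fully convolutional neural network for one-dimensional sequence data, a recurrent neural network model composed of multiple layers of long short-term memory (LSTM) units, or a recurrent neural network composed of bidirectional LSTM units.
- The apparatus according to claim 12, characterized in that the initial sequence feature and the amplified sequence feature are position-specific scoring matrices.
- The apparatus according to any one of claims 12 to 16, characterized in that the apparatus further comprises: an amplified sample acquisition module, configured to process the initial sequence feature sample through the sequence feature amplification model to obtain an amplified initial sequence feature sample; and a model update module, configured to update the sequence feature amplification model according to the amplified initial sequence feature sample and the amplified sequence feature sample.
- The apparatus according to claim 17, characterized in that the model update module comprises: a loss function acquisition submodule, configured to perform a loss function calculation according to the amplified initial sequence feature sample and the amplified sequence feature sample to obtain a loss function value; and a parameter update submodule, configured to update the model parameters of the sequence feature amplification model according to the loss function value.
- The apparatus according to claim 18, characterized in that the loss function acquisition submodule comprises: an error calculation unit, configured to calculate a reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample; and a loss function acquisition unit, configured to take the reconstruction error as the loss function value.
- The apparatus according to claim 19, characterized in that the error calculation unit calculates a root-mean-square reconstruction error between the amplified initial sequence feature sample and the amplified sequence feature sample.
- The apparatus according to claim 18, characterized in that the model update module is configured to, when it is determined according to the loss function value that the sequence feature amplification model has not converged, update the model parameters of the sequence feature amplification model according to the loss function value.
- The apparatus according to any one of claims 12 to 16, characterized in that the structure information prediction module comprises: a structure information acquisition submodule, configured to perform prediction on the amplified sequence feature through a protein structure information prediction model to obtain the structure information of the protein; wherein the protein structure information prediction model is a model trained according to sequence features of protein samples and structure information of the protein samples.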
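- A computer device, characterized in that the computer device comprises a processor and a memory, the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the protein structure information prediction method according to any one of claims 1 to 11.
- A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the protein structure information prediction method according to any one of claims 1 to 11.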
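The training procedure recited in the method claims (features derived from a smaller and a larger database, a root-mean-square reconstruction error as the loss, and parameter updates while the model has not converged) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption for illustration: the toy pre-aligned sequences, the simplified position-specific scoring matrix with a uniform background and pseudocounts, the learning rate and tolerance, and in particular the two-parameter affine model, which merely stands in for the convolutional or LSTM networks named in the claims. No real alignment search is performed.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def subsample(database, drop_ratio, seed=0):
    """Build a toy 'first database' by randomly dropping a share of the 'second'."""
    rng = random.Random(seed)
    keep = len(database) - int(len(database) * drop_ratio)
    return rng.sample(database, keep)


def pssm(msa, pseudocount=1.0):
    """Simplified PSSM: per-column log-odds vs. a uniform background, flattened."""
    scores = []
    for pos in range(len(msa[0])):
        column = [seq[pos] for seq in msa]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores.append(math.log(freq * len(AMINO_ACIDS)))
    return scores


def rmse(pred, target):
    """Root-mean-square reconstruction error between two feature vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def train(initial, amplified, lr=0.01, tol=1e-6, max_steps=2000):
    """Fit amplified ~ w * initial + b by gradient descent on the RMSE loss."""
    w, b = 1.0, 0.0
    n = len(initial)
    loss = rmse([w * x + b for x in initial], amplified)
    for _ in range(max_steps):
        if loss == 0.0:
            break  # perfect reconstruction, nothing left to update
        residual = [w * x + b - y for x, y in zip(initial, amplified)]
        # Gradients of sqrt(mean(residual^2)) with respect to w and b.
        grad_w = sum(r * x for r, x in zip(residual, initial)) / (n * loss)
        grad_b = sum(residual) / (n * loss)
        w, b = w - lr * grad_w, b - lr * grad_b
        new_loss = rmse([w * x + b for x in initial], amplified)
        converged = abs(loss - new_loss) < tol
        loss = new_loss
        if converged:
            break  # stop updating once the loss no longer changes
    return w, b, loss


# Toy, already-aligned sequences standing in for the larger "second database".
second_db = ["MKVLA", "MKVLG", "MRVLA", "MKILA", "MKVLS", "MKVIA", "LKVLA", "MKVLA"]
first_db = subsample(second_db, drop_ratio=0.5)

initial_feature = pssm(first_db)      # initial sequence feature sample
amplified_feature = pssm(second_db)   # amplified sequence feature sample
initial_loss = rmse(initial_feature, amplified_feature)
w, b, final_loss = train(initial_feature, amplified_feature)
print(initial_loss, final_loss)
```

With real data, the features would come from an alignment tool run against each database, and the output of the trained model would be passed to a separate structure information prediction model, which this sketch does not cover.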
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022514493A JP7291853B2 (ja) | 2019-10-30 | 2020-09-10 | Protein structure information prediction method and apparatus, computer device, and computer program |
| EP20882879.8A EP4009328B1 (en) | 2019-10-30 | 2020-09-10 | Method, device and apparatus for predicting protein structure information, and storage medium |
| US17/539,946 US12288599B2 (en) | 2019-10-30 | 2021-12-01 | Protein structure information prediction method and apparatus, device, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911042649.9A CN110706738B (zh) | 2019-10-30 | 2019-10-30 | Protein structure information prediction method, apparatus, device, and storage medium |
| CN201911042649.9 | 2019-10-30 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/539,946 Continuation US12288599B2 (en) | 2019-10-30 | 2021-12-01 | Protein structure information prediction method and apparatus, device, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021082753A1 (zh) | 2021-05-06 |
Family
ID=69203871
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2020/114386 Ceased WO2021082753A1 (zh) | 2019-10-30 | 2020-09-10 | Protein structure information prediction method, apparatus, device, and storage medium |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US12288599B2 (zh) |
| EP (1) | EP4009328B1 (zh) |
| JP (1) | JP7291853B2 (zh) |
| CN (1) | CN110706738B (zh) |
| WO (1) | WO2021082753A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
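| CN114300038A (zh) * | 2021-12-27 | 2022-04-08 | 山东师范大学 | Multiple sequence alignment method and system based on an improved biogeography-based optimization algorithm |
| CN119811500A (zh) * | 2024-12-18 | 2025-04-11 | 上海交通大学 | General protein-RNA binding prediction method based on multi-molecular modality fusion |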
Families Citing this family (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
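| CN110706738B (zh) | 2019-10-30 | 2020-11-20 | 腾讯科技(深圳)有限公司 | Protein structure information prediction method, apparatus, device, and storage medium |
| CN111243668B (zh) * | 2020-04-09 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Molecular binding site detection method and apparatus, electronic device, and storage medium |
| CN111755065B (zh) * | 2020-06-15 | 2024-05-17 | 重庆邮电大学 | Protein conformation prediction acceleration method based on virtual network mapping and cloud parallel computing |
| US20240153577A1 (en) * | 2020-11-28 | 2024-05-09 | Deepmind Technologies Limited | Predicting symmetrical protein structures using symmetrical expansion transformations |
| CN112289370B (zh) * | 2020-12-28 | 2021-03-23 | 武汉金开瑞生物工程有限公司 | Protein structure prediction method and apparatus |
| CN112837204B (zh) * | 2021-02-26 | 2024-07-23 | 北京小米移动软件有限公司 | Sequence processing method, sequence processing apparatus, and storage medium |
| CN113255770B (zh) * | 2021-05-26 | 2023-10-27 | 北京百度网讯科技有限公司 | Compound property prediction model training method and compound property prediction method |
| CN113837036B (zh) * | 2021-09-09 | 2024-08-02 | 成都齐碳科技有限公司 | Biopolymer characterization method, apparatus, device, and computer storage medium |
| CN115881211B (zh) * | 2021-12-23 | 2024-02-20 | 上海智峪生物科技有限公司 | Protein sequence alignment method and apparatus, computer device, and storage medium |
| CN116564412A (zh) * | 2022-01-27 | 2023-08-08 | 京东方科技集团股份有限公司 | Protein-protein interaction prediction method and apparatus, and electronic device |
| CN114613427B (zh) * | 2022-03-15 | 2023-01-31 | 水木未来(北京)科技有限公司 | Protein three-dimensional structure prediction method and apparatus, electronic device, and storage medium |
| CN114974437A (zh) * | 2022-04-26 | 2022-08-30 | 北京理工大学 | Method for analyzing structural changes of protein steady-state ensembles and key amino acids |
| CN115101122A (zh) * | 2022-05-23 | 2022-09-23 | 清华大学 | Protein processing method, device, storage medium, and computer program product |
| CN115116559B (zh) * | 2022-06-21 | 2023-04-18 | 北京百度网讯科技有限公司 | Method, apparatus, device, and medium for determining atomic coordinates in amino acids and for training |
| CN115240044B (zh) * | 2022-07-22 | 2023-06-06 | 水木未来(北京)科技有限公司 | Protein electron density map processing method and apparatus, electronic device, and storage medium |
| CN117292743A (zh) * | 2022-09-05 | 2023-12-26 | 北京分子之心科技有限公司 | Method, device, medium, and program product for predicting protein complex structure |
| CN117253541A (zh) * | 2022-10-24 | 2023-12-19 | 腾讯科技(深圳)有限公司 | Protein processing method and apparatus, computer device, and storage medium |
| CN115831219B (zh) * | 2022-12-22 | 2024-05-28 | 郑州思昆生物工程有限公司 | Quality prediction method, apparatus, device, and storage medium |
| CN118398079A (zh) * | 2024-06-25 | 2024-07-26 | 中国人民解放军军事科学院军事医学研究院 | Computer apparatus, method, and application for predicting amino acid mutation effects or for designing and engineering proteins |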
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100057419A1 (en) * | 2008-08-29 | 2010-03-04 | Laboratory of Computational Biology, Center for DNA Fingerprinting and Diagnostics | Fold-wise classification of proteins |
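| CN106951736A (zh) * | 2017-03-14 | 2017-07-14 | 齐鲁工业大学 | Protein secondary structure prediction method based on multiple evolutionary matrices |
| CN108197427A (zh) * | 2018-01-02 | 2018-06-22 | 山东师范大学 | Protein subcellular localization method and apparatus based on a deep convolutional neural network |
| CN109411018A (zh) * | 2019-01-23 | 2019-03-01 | 上海宝藤生物医药科技股份有限公司 | Method, apparatus, device, and medium for classifying samples according to gene mutation information |
| CN110706738A (zh) * | 2019-10-30 | 2020-01-17 | 腾讯科技(深圳)有限公司 | Protein structure information prediction method, apparatus, device, and storage medium |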
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7157266B2 (en) * | 1999-01-25 | 2007-01-02 | Brookhaven Science Associates Llc | Structure of adenovirus bound to cellular receptor car |
| AU2002250066A1 (en) * | 2001-02-12 | 2002-08-28 | Rosetta Inpharmatics, Inc. | Confirming the exon content of rna transcripts by pcr using primers complementary to each respective exon |
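| CN103175873B (zh) * | 2013-01-27 | 2015-11-18 | 福州市第二医院 | DNA electrochemical sensor based on self-enhanced signal amplification of target DNA repeat sequences |
| CN104615911B (zh) * | 2015-01-12 | 2017-07-18 | 上海交通大学 | Method for predicting membrane protein beta-barrel transmembrane regions based on sparse coding and chain learning |
| CN105574359B (zh) * | 2015-12-15 | 2018-09-14 | 上海珍岛信息技术有限公司 | Method and apparatus for expanding a protein template library |
| CN107563150B (zh) * | 2017-08-31 | 2021-03-19 | 深圳大学 | Protein binding site prediction method, apparatus, device, and storage medium |
| CN109147868B (zh) * | 2018-07-18 | 2022-03-22 | 深圳大学 | Protein function prediction method, apparatus, device, and storage medium |
| CN109300501B (zh) * | 2018-09-20 | 2021-02-02 | 国家卫生健康委科学技术研究所 | Protein three-dimensional structure prediction method and prediction cloud platform built with it |
| CN109255339B (zh) * | 2018-10-19 | 2021-04-06 | 西安电子科技大学 | Classification method based on adaptive deep forest and human gait energy images |
| CN110097130B (zh) * | 2019-05-07 | 2022-12-13 | 深圳市腾讯计算机系统有限公司 | Training method, apparatus, device, and storage medium for a classification task model |
| CN115136246B (zh) * | 2019-08-02 | 2025-09-09 | 旗舰开拓创新六世公司 | Machine learning-guided polypeptide design |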
2019
- 2019-10-30: CN application CN201911042649.9A, patent CN110706738B (zh), status: Active
2020
- 2020-09-10: EP application EP20882879.8A, patent EP4009328B1 (en), status: Active
- 2020-09-10: JP application JP2022514493A, patent JP7291853B2 (ja), status: Active
- 2020-09-10: WO application PCT/CN2020/114386, publication WO2021082753A1 (zh), status: Ceased
2021
- 2021-12-01: US application US17/539,946, patent US12288599B2 (en), status: Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4009328A4 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
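| CN114300038A (zh) * | 2021-12-27 | 2022-04-08 | 山东师范大学 | Multiple sequence alignment method and system based on an improved biogeography-based optimization algorithm |
| CN114300038B (zh) * | 2021-12-27 | 2023-09-29 | 山东师范大学 | Multiple sequence alignment method and system based on an improved biogeography-based optimization algorithm |
| CN119811500A (zh) * | 2024-12-18 | 2025-04-11 | 上海交通大学 | General protein-RNA binding prediction method based on multi-molecular modality fusion |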
Also Published As
| Publication number | Publication date |
|---|---|
| CN110706738A (zh) | 2020-01-17 |
| CN110706738B (zh) | 2020-11-20 |
| EP4009328A4 (en) | 2022-09-14 |
| JP2022547041A (ja) | 2022-11-10 |
| JP7291853B2 (ja) | 2023-06-15 |
| EP4009328B1 (en) | 2025-06-25 |
| EP4009328A1 (en) | 2022-06-08 |
| US20220093213A1 (en) | 2022-03-24 |
| US12288599B2 (en) | 2025-04-29 |
Similar Documents
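| Publication | Publication Date | Title |
|---|---|---|
| CN110706738B (zh) | | Protein structure information prediction method, apparatus, device, and storage medium |
| CN111476306B (zh) | | Artificial intelligence-based object detection method, apparatus, device, and storage medium |
| CN111866607B (zh) | | Video clip localization method and apparatus, computer device, and storage medium |
| CN110121118B (zh) | | Video clip localization method and apparatus, computer device, and storage medium |
| CN111243668B (zh) | | Molecular binding site detection method and apparatus, electronic device, and storage medium |
| CN110134804B (zh) | | Image retrieval method and apparatus, and storage medium |
| CN111192262A (zh) | | Artificial intelligence-based product defect classification method, apparatus, device, and medium |
| WO2022127919A1 (zh) | | Surface defect detection method, apparatus, system, storage medium, and program product |
| CN113516665B (zh) | | Image segmentation model training method, image segmentation method, apparatus, and device |
| CN110555839A (zh) | | Defect detection and recognition method and apparatus, computer device, and storage medium |
| CN111739035A (zh) | | Artificial intelligence-based image processing method, apparatus, device, and storage medium |
| CN109086709A (zh) | | Feature extraction model training method and apparatus, and storage medium |
| CN113724189A (zh) | | Image processing method, apparatus, device, and storage medium |
| CN111680697B (zh) | | Method and apparatus for implementing domain adaptation, electronic device, and medium |
| CN113918767B (zh) | | Video clip localization method, apparatus, device, and storage medium |
| CN110175653A (zh) | | Image recognition method, apparatus, device, and storage medium |
| CN111833689A (zh) | | Scoring method and apparatus for electrical experiments |
| WO2021000956A1 (zh) | | Method and apparatus for upgrading an intelligent model |
| CN113822791A (zh) | | Image registration method, registration network training method, apparatus, device, and medium |
| CN113505256B (zh) | | Feature extraction network training method, image processing method, and apparatus |
| CN111898535A (zh) | | Target recognition method and apparatus, and storage medium |
| CN111611414A (zh) | | Vehicle retrieval method and apparatus, and storage medium |
| CN114328948A (zh) | | Text normalization model training method, text normalization method, and apparatus |
| CN110853704B (zh) | | Protein data acquisition method and apparatus, computer device, and storage medium |
| CN113052240A (zh) | | Method, apparatus, device, and storage medium for determining an image processing model |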
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20882879; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 20882879.8; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022514493; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2020882879; Country of ref document: EP; Effective date: 20220301 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | Wipo information: grant in national office | Ref document number: 2020882879; Country of ref document: EP |
