WO2023179609A1 - Data processing method and device - Google Patents
Data processing method and device
- Publication number
- WO2023179609A1 (PCT/CN2023/082786)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hyperparameter
- samples
- neural
- predictor
- prediction
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a data processing method and device.
- Black-box optimization, also known as hyperparameter optimization, is an important technology in scientific research and industrial production.
- A better parameter combination can be found by trying different parameter combinations and observing the system's output. Each attempt is expensive: obtaining the output takes a long time or consumes a large amount of some resource.
- Black-box optimization can be used to address this problem.
- Neural predictors are used for prediction.
- Several sets of prediction indicators corresponding to hyperparameters are obtained in advance, before the hyperparameter search.
- Neural predictors require a relatively large amount of training data to achieve good generalization.
- In black-box optimization scenarios, a single evaluation is generally costly, so little training data is available; the trained neural predictor generalizes poorly, which leads to poor search results.
- Embodiments of the present application provide a data processing method and device that use fewer training samples to obtain a neural predictor with better generalization performance.
- Embodiments of the present application provide a data processing method, including: receiving hyperparameter information sent by a user device, where the hyperparameter information indicates a hyperparameter search space corresponding to a user task; sampling multiple hyperparameter combinations from the hyperparameter search space; using a first hyperparameter combination, multiple samples included in a training set, and the evaluation indicators of the multiple samples as the input of a neural predictor, and determining, through the neural predictor, the prediction indicator corresponding to the first hyperparameter combination, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, so as to obtain multiple prediction indicators corresponding to the multiple hyperparameter combinations; and sending K hyperparameter combinations to the user device, where K is a positive integer and the K prediction indicators corresponding to the K hyperparameter combinations are the K highest among the multiple prediction indicators.
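The selection step of this method can be sketched as follows. This is a minimal illustration, not the patented implementation: `propose_top_k` and `toy_predictor` are hypothetical names, a hyperparameter combination is reduced to a dictionary with a single `lr` entry, and the distance-weighted average merely stands in for the neural predictor.

```python
def toy_predictor(candidate, support):
    """Hypothetical stand-in for the neural predictor.

    Returns an average of the known evaluation indicators, weighted by how
    close each reference sample is to the candidate.
    """
    weights = [1.0 / (1e-6 + abs(candidate["lr"] - sample["lr"]))
               for sample, _ in support]
    total = sum(weights)
    return sum(w * metric for w, (_, metric) in zip(weights, support)) / total


def propose_top_k(candidates, support, predictor, k):
    """Score every sampled combination and return the k with the highest
    prediction indicators."""
    scored = [(predictor(c, support), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [combo for _, combo in scored[:k]]


# Training set: (hyperparameter sample, evaluation indicator) pairs.
support = [({"lr": 0.1}, 0.9), ({"lr": 0.5}, 0.2)]
candidates = [{"lr": 0.12}, {"lr": 0.45}, {"lr": 0.3}]
best = propose_top_k(candidates, support, toy_predictor, k=2)
```

The predictor receives both the candidate and the training-set samples with their evaluation indicators, which is the point that distinguishes this method from a predictor that sees the candidate alone.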
- User tasks can be molecular design tasks, materials science tasks, factory debugging tasks, chip design tasks, neural network structure design tasks, neural network training and tuning tasks, etc.
- Taking a neural network structure design task as an example, the user task needs to optimize the design parameters of the neural network structure, such as the number of convolution layers, the convolution kernel size, and the expansion size.
- Users can perform specific user tasks based on the received hyperparameter combinations, such as video classification, text recognition, image beautification, and speech recognition.
- Hyperparameters can be understood as operating parameters of a system, product or process. Hyperparameter information can be understood as containing the value ranges or value conditions of some hyperparameters.
- In a neural network, hyperparameters are parameters whose initial values are set by the user before the learning process starts. They cannot be learned through the training process of the neural network itself.
- These hyperparameters include: convolution kernel size, number of neural network layers, activation function, loss function, type of optimizer used, learning rate, batch size (batch_size), number of training epochs (epoch), etc.
- The hyperparameter search space includes some of the hyperparameters required by the user's task.
- The value of each hyperparameter can be continuous or discrete. For example:
- wd: numeric type (0.02, 0.4, 0.01), the weight decay;
- dropout: numeric type (0.0, 0.3, 0.025), the dropout probability;
- drop_conn_rate: numeric type (0.0, 0.4, 0.025), the drop-connection probability;
- mixup: numeric type (0.0, 1.0, 0.05), the distribution parameter of mixup;
- color: numeric type (0.0, 0.5, 0.025), the intensity of color data augmentation;
- re_prob: numeric type (0.0, 0.4, 0.025), the probability of random erasing.
- The hyperparameter space above is just an example. In practice, any hyperparameters that need to be optimized can be defined.
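One way to represent the example search space and sample hyperparameter combinations from it is sketched below. The text does not say what the three numbers in each triple mean; reading them as (lower bound, upper bound, step) is an assumption made only for this sketch, and `SEARCH_SPACE`, `sample_combination` and `sample_combinations` are hypothetical names.

```python
import random

# ASSUMPTION: each triple is (low, high, step). The source does not state this.
SEARCH_SPACE = {
    "wd": (0.02, 0.4, 0.01),
    "dropout": (0.0, 0.3, 0.025),
    "drop_conn_rate": (0.0, 0.4, 0.025),
    "mixup": (0.0, 1.0, 0.05),
    "color": (0.0, 0.5, 0.025),
    "re_prob": (0.0, 0.4, 0.025),
}


def sample_combination(space, rng):
    """Draw one value per hyperparameter, uniformly over its grid points."""
    combo = {}
    for name, (low, high, step) in space.items():
        n_steps = round((high - low) / step)   # number of grid intervals
        index = rng.randint(0, n_steps)        # inclusive on both ends
        combo[name] = round(low + index * step, 6)
    return combo


def sample_combinations(space, n, seed=0):
    """Sample n hyperparameter combinations from the search space."""
    rng = random.Random(seed)
    return [sample_combination(space, rng) for _ in range(n)]


combos = sample_combinations(SEARCH_SPACE, n=5)
```

Uniform sampling on a grid is only one option; the method itself requires nothing more than that multiple combinations be sampled from the space.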
- The input to the neural predictor provided in this application includes more than just hyperparameters. It also includes samples in the training set (also called hyperparameter samples) and their corresponding evaluation indicators. The hyperparameter samples and their evaluation indicators assist in predicting the hyperparameter combinations sampled from the hyperparameter search space. Because the predictor's input includes hyperparameter samples that already have evaluation indicators, together with those indicators, the prediction for a hyperparameter combination can draw on them, which improves prediction accuracy.
- By contrast, the input of existing neural predictors includes only the target sample to be evaluated, with no reference samples or evaluation indicators. Evaluation indicators of many real samples must be obtained in advance to train such a predictor.
- In this application, the input of the neural predictor includes hyperparameter samples that already have evaluation indicators, together with those indicators.
- When the prediction indicator of the target sample is predicted, the evaluation indicators of the hyperparameter samples are taken into account, which improves the accuracy of that prediction. Adjusting the weights of the neural predictor based on a more accurate prediction indicator is in turn more accurate, which reduces the number of training rounds and therefore the number of training samples used. A neural predictor with better generalization can thus be obtained from fewer training samples.
- The K evaluation indicators corresponding to the K hyperparameter combinations, sent by the user equipment, are received; the K hyperparameter combinations are used as K samples, and the K samples and their corresponding K evaluation indicators are added to the training set.
- The training set is continuously updated, and the prediction for a hyperparameter combination is made in combination with the updated training set. That is, the samples participating in the auxiliary prediction have better evaluation indicators, so prediction accuracy can be improved.
- The neural predictor is trained in the following manner: selecting multiple samples and the evaluation indicators corresponding to the multiple samples from the training set, and selecting a target sample from the training set; using the multiple samples, the evaluation indicators corresponding to the multiple samples, and the target sample as inputs to the neural predictor, and determining the prediction indicator corresponding to the target sample through the neural predictor; and adjusting the network parameters of the neural predictor according to the result of comparing the prediction indicator of the target sample with the evaluation indicator corresponding to the target sample.
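A single training step of this kind can be sketched as follows. The sketch makes several simplifying assumptions that are not in the source: each sample is a single number, the "network" has one learnable parameter `theta` that controls how sharply similarity decays with distance, the comparison is a squared error, and a central-difference gradient stands in for backpropagation. All function names are hypothetical.

```python
import math
import random


def predict(theta, target_x, support):
    """Similarity-weighted average of the reference samples' indicators."""
    weights = [math.exp(-theta * (target_x - x) ** 2) for x, _ in support]
    return sum(w * y for w, (_, y) in zip(weights, support)) / sum(weights)


def train_step(theta, training_set, num_support, rng, lr=0.1, eps=1e-4):
    """Select a target and reference samples, predict, compare, adjust."""
    picked = rng.sample(training_set, num_support + 1)
    (target_x, target_y), support = picked[0], picked[1:]

    def loss(t):
        # Compare the prediction indicator with the real evaluation indicator.
        return (predict(t, target_x, support) - target_y) ** 2

    # Numerical gradient; a real implementation would use backpropagation.
    grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
    return theta - lr * grad, loss(theta)


training_set = [(i / 10, (i / 10) ** 2) for i in range(11)]
new_theta, step_loss = train_step(1.0, training_set, num_support=3,
                                  rng=random.Random(0))
```

The target sample is held out of its own reference set, so the predictor is trained on the same kind of input it sees during search: an unevaluated combination plus evaluated neighbours.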
- Before each round in which the neural predictor is used to determine the prediction indicators corresponding to multiple hyperparameter combinations, the training set can be used to train the neural predictor.
- Because the training set is updated, the generalization of the trained neural predictor becomes increasingly better.
- Using the first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor, and determining the prediction indicator corresponding to the first hyperparameter combination through the neural predictor, includes: inputting the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples into the neural predictor; the neural predictor determines the prediction indicator corresponding to the first hyperparameter combination based on the first hyperparameter combination, the multiple samples, the evaluation indicators of the multiple samples, and two anchor point features; where the two anchor point features are used to calibrate the coding feature of the lowest prediction indicator and the coding feature of the highest prediction indicator of the user task.
- Two anchor point features thus participate in the prediction for a hyperparameter combination.
- Because the two anchor point features calibrate the coding features of the lowest and highest prediction indicators of the user task, they prevent the prediction result from deviating from the prediction range, further improving prediction accuracy.
- The neural predictor determines T+2 weights, which include the weights of the T samples and the weights of the two anchor point features; the neural predictor weights T+2 evaluation indicators according to the T+2 weights to obtain the prediction indicator of the first hyperparameter combination; where the T+2 evaluation indicators include the evaluation indicators of the T samples and the evaluation indicators corresponding to the two anchor point features.
- Computing similarities lets the samples participate in the prediction for the hyperparameter combination, and the prediction indicator of the hyperparameter combination is obtained by weighting evaluation indicators, which can improve the accuracy of the prediction result.
- The two anchor point features are network parameters of the neural predictor.
- As network parameters, the two anchor point features are learnable: they can be updated while the neural predictor is trained.
- The number of input samples supported by the neural predictor is T. Using the first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor, and determining the prediction indicator corresponding to the first hyperparameter combination through the neural predictor, includes: the neural predictor encodes the T input samples to obtain T auxiliary features, and encodes the first hyperparameter combination to obtain a target feature; the neural predictor determines the similarity between the target feature and each of the T auxiliary features; the neural predictor determines the weights corresponding to the T samples based on those similarities; and the neural predictor weights the evaluation indicators corresponding to the T samples according to the weights corresponding to the T samples, to obtain the prediction indicator of the first hyperparameter combination.
- The neural predictor determining the similarity between the target feature and the T auxiliary features and the similarity between the target feature and the two anchor point features includes: the neural predictor performs inner product processing on the target feature and each of the T auxiliary features to obtain the similarities between the target feature and the T auxiliary features, and performs inner product processing on the target feature and each of the two anchor point features to obtain the similarities between the target feature and the two anchor point features.
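The inner-product similarity and the T+2 weighting can be sketched as follows. This is a simplified illustration under stated assumptions: the encoder is omitted and features are passed in directly, the similarities are turned into weights with a softmax (the source says only that weights are determined from the similarities), and the anchor point features are assigned the indicators 0.0 and 1.0 as the lowest and highest values. The function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Turn similarities into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_anchors(target_feat, aux_feats, aux_metrics,
                         anchor_low, anchor_high,
                         metric_low=0.0, metric_high=1.0):
    """Weight T sample indicators plus 2 anchor indicators by similarity."""
    feats = aux_feats + [anchor_low, anchor_high]      # T + 2 features
    metrics = aux_metrics + [metric_low, metric_high]  # T + 2 indicators
    sims = [dot(target_feat, f) for f in feats]        # T + 2 similarities
    weights = softmax(sims)                            # T + 2 weights
    return sum(w * m for w, m in zip(weights, metrics))


prediction = predict_with_anchors(
    target_feat=[1.0, 0.0],
    aux_feats=[[1.0, 0.0], [0.0, 1.0]],
    aux_metrics=[0.8, 0.2],
    anchor_low=[0.0, 0.0],
    anchor_high=[0.0, 0.0],
)
```

Because the weights form a convex combination and every indicator lies between the two anchor values, the prediction in this sketch cannot leave the range they span, which matches the stated purpose of the anchor point features.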
- The number of hyperparameter samples supported by the neural predictor input is T. Using the first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor, and determining the prediction indicator corresponding to the first hyperparameter combination through the neural predictor, includes: inputting T+1 pieces of connection parameter information into the neural predictor, where the T+1 pieces include T pieces obtained by connecting each of T samples with its corresponding evaluation indicator, and one piece obtained by connecting the first hyperparameter combination with a target prediction indicator mask, the mask being used to characterize the unknown prediction indicator corresponding to the first hyperparameter combination; the neural predictor performs similarity matching on every two pieces among the T+1 input pieces to obtain the similarity between every two pieces; and the neural predictor determines the prediction indicator of the first hyperparameter combination based on those similarities.
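The T+1 variant can be sketched as follows. Everything beyond the overall shape is an assumption made for illustration: the mask is represented as the value 0.0 plus an extra flag entry, pairwise similarity is an inner product, and the prediction is read out as a softmax-weighted average over the row of the masked piece. `MASK`, `build_tokens`, `pairwise_similarity` and `predict_masked` are hypothetical names.

```python
import math

MASK = 0.0  # hypothetical placeholder for the unknown prediction indicator


def build_tokens(samples, metrics, query):
    """Connect each sample with its indicator, and the query with the mask.

    The final entry of each token is a flag: 0.0 for known, 1.0 for masked.
    """
    tokens = [list(x) + [m, 0.0] for x, m in zip(samples, metrics)]
    tokens.append(list(query) + [MASK, 1.0])
    return tokens


def pairwise_similarity(tokens):
    """Inner-product similarity between every two of the T+1 tokens."""
    return [[sum(a * b for a, b in zip(u, v)) for v in tokens] for u in tokens]


def predict_masked(samples, metrics, query):
    """Read the prediction off the masked token's row of the similarity matrix."""
    sim = pairwise_similarity(build_tokens(samples, metrics, query))
    row = sim[-1][:-1]  # masked token versus the T known tokens
    top = max(row)
    exps = [math.exp(s - top) for s in row]
    total = sum(exps)
    return sum(e / total * m for e, m in zip(exps, metrics))


samples = [[1.0, 0.0], [0.0, 1.0]]
metrics = [0.9, 0.1]
tokens = build_tokens(samples, metrics, query=[1.0, 0.0])
sim = pairwise_similarity(tokens)
prediction = predict_masked(samples, metrics, query=[1.0, 0.0])
```

The full pairwise matrix is computed even though this readout uses only one row; a stacked, attention-style model would use the remaining rows to refine the representation of every token before the final readout.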
- Embodiments of the present application also provide a data processing device, including: a receiving unit, configured to receive hyperparameter information sent by the user equipment, where the hyperparameter information is used to indicate the hyperparameter search space corresponding to the user task; a processing unit, configured to sample multiple hyperparameter combinations from the hyperparameter search space, use the first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor, and determine, through the neural predictor, the prediction indicator corresponding to the first hyperparameter combination, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, so as to obtain multiple prediction indicators corresponding to the multiple hyperparameter combinations; and a sending unit, configured to send K hyperparameter combinations to the user equipment, where K is a positive integer and the K prediction indicators corresponding to the K hyperparameter combinations are the K highest among the multiple prediction indicators.
- The receiving unit is also configured to receive the K evaluation indicators corresponding to the K hyperparameter combinations sent by the user equipment; the processing unit is also configured to use the K hyperparameter combinations as K samples, and add the K samples and their corresponding K evaluation indicators to the training set.
- The processing unit is further configured to train the neural predictor in the following manner: select multiple samples and the evaluation indicators corresponding to the multiple samples from the training set, and select a target sample from the training set; use the multiple samples, the evaluation indicators corresponding to the multiple samples, and the target sample as inputs to the neural predictor, and determine the prediction indicator corresponding to the target sample through the neural predictor; and adjust the network parameters of the neural predictor according to the result of comparing the prediction indicator of the target sample with the evaluation indicator corresponding to the target sample.
- The processing unit is specifically configured to: input the first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples into the neural predictor; the neural predictor determines the prediction indicator corresponding to the first hyperparameter combination based on the first hyperparameter combination, the multiple samples, the evaluation indicators of the multiple samples, and two anchor point features; where the two anchor point features are used to calibrate the coding feature of the lowest prediction indicator and the coding feature of the highest prediction indicator of the user task.
- The number of input samples supported by the neural predictor is T, where T is a positive integer. The processing unit is specifically configured so that: the neural predictor encodes the T input samples to obtain T auxiliary features, and encodes the first hyperparameter combination to obtain a target feature; the neural predictor determines the similarity between the target feature and the T auxiliary features and the similarity between the target feature and the two anchor point features; the neural predictor determines T+2 weights based on the similarities between the target feature and the T auxiliary features and the two anchor point features, where the T+2 weights include the weights of the T samples and the weights of the two anchor point features; and the neural predictor weights T+2 evaluation indicators according to the T+2 weights to obtain the prediction indicator of the first hyperparameter combination, where the T+2 evaluation indicators include the evaluation indicators of the T samples and the evaluation indicators corresponding to the two anchor point features.
- The two anchor point features are network parameters of the neural predictor.
- The number of input samples supported by the neural predictor is T, where T is a positive integer. The processing unit is specifically configured to: encode the T input samples through the neural predictor to obtain T auxiliary features, and encode the first hyperparameter combination to obtain a target feature; determine, through the neural predictor, the similarity between the target feature and each of the T auxiliary features; determine, through the neural predictor, the weights corresponding to the T samples according to those similarities; and weight, through the neural predictor, the evaluation indicators corresponding to the T samples according to the weights corresponding to the T samples, to obtain the prediction indicator of the first hyperparameter combination.
- The number of hyperparameter samples supported by the neural predictor is T, where T is a positive integer. The processing unit is specifically configured to: input T+1 pieces of connection parameter information into the neural predictor, where the T+1 pieces include T pieces obtained by connecting each of the T samples with its corresponding evaluation indicator, and one piece obtained by connecting the first hyperparameter combination with a target prediction indicator mask, the mask being used to characterize the unknown prediction indicator corresponding to the first hyperparameter combination; perform, through the neural predictor, similarity matching on every two pieces among the T+1 input pieces to obtain the similarity between every two pieces; and determine, through the neural predictor, the prediction indicator of the first hyperparameter combination according to the similarity between every two pieces among the T+1 pieces.
- Embodiments of the present application also provide a data processing system, including a user device and an execution device.
- The user device is used to send hyperparameter information to the execution device.
- The hyperparameter information indicates the hyperparameter search space corresponding to the user task.
- The execution device is used to receive the hyperparameter information sent by the user device and sample multiple hyperparameter combinations from the hyperparameter search space.
- The execution device uses the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples as the input of the neural predictor, and determines the prediction indicator corresponding to the first hyperparameter combination through the neural predictor.
- The first hyperparameter combination is any one of the multiple hyperparameter combinations, so that multiple prediction indicators corresponding to the multiple hyperparameter combinations are obtained.
- The execution device sends K hyperparameter combinations to the user device, where K is a positive integer and the K prediction indicators corresponding to the K hyperparameter combinations are the K highest among the multiple prediction indicators.
- The user device is also used to receive the K hyperparameter combinations sent by the execution device.
- The user device may evaluate the K hyperparameter combinations.
- The user device sends the K evaluation indicators corresponding to the K hyperparameter combinations to the execution device.
- The execution device is also configured to receive the K evaluation indicators corresponding to the K hyperparameter combinations sent by the user device, use the K hyperparameter combinations as K samples, and add the K samples and their corresponding K evaluation indicators to the training set.
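The exchange between the two devices over several rounds can be sketched as a loop. The sketch is illustrative only: a hyperparameter combination is reduced to one number in [0, 1], `evaluate` plays the role of the user device's real evaluation, and a nearest-neighbour lookup stands in for the neural predictor. All names are hypothetical.

```python
import random


def nearest_neighbor_predictor(candidate, training_set):
    """Hypothetical stand-in: indicator of the closest evaluated sample."""
    _, metric = min(training_set, key=lambda item: abs(item[0] - candidate))
    return metric


def run_rounds(evaluate, training_set, rounds, num_candidates, k, seed=0):
    """Alternate between proposing K combinations and absorbing their results."""
    rng = random.Random(seed)
    training_set = list(training_set)  # do not mutate the caller's list
    for _ in range(rounds):
        # Execution device: sample combinations and rank them by prediction.
        candidates = [rng.uniform(0.0, 1.0) for _ in range(num_candidates)]
        ranked = sorted(
            candidates,
            key=lambda c: nearest_neighbor_predictor(c, training_set),
            reverse=True,
        )
        top_k = ranked[:k]                       # sent to the user device
        metrics = [evaluate(c) for c in top_k]   # evaluated by the user device
        training_set.extend(zip(top_k, metrics)) # returned and added as samples
    return training_set


initial = [(0.0, 0.91), (1.0, 0.51)]
final_set = run_rounds(lambda x: 1.0 - (x - 0.3) ** 2, initial,
                       rounds=3, num_candidates=10, k=2)
```

Each round spends only K real evaluations, and every evaluated combination becomes a reference sample for the next round's predictions.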
- Embodiments of the present application provide a data processing device, including: a processor and a memory; the memory is used to store instructions, and when the device is running, the processor executes the instructions stored in the memory, so that the device executes the method provided by the first aspect or any design of the first aspect. It should be noted that the memory can be integrated into the processor or independent of the processor.
- Embodiments of the present application also provide a readable storage medium, which stores programs or instructions that, when run on a computer, cause any of the methods described in the above aspects to be executed.
- Embodiments of the present application also provide a computer program product containing a computer program or instructions which, when run on a computer, cause the computer to perform any of the methods described in the above aspects.
- The present application provides a chip system.
- The chip is connected to a memory and is used to read and execute the software program stored in the memory to implement the method of any design in any aspect.
- Figure 1 is a schematic diagram of an artificial intelligence main framework applied in this application.
- Figure 2 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application.
- Figure 3 is a schematic diagram of a system architecture 300 provided by an embodiment of the present application.
- Figure 4 is a schematic structural diagram of a neural predictor provided by an embodiment of the present application.
- Figure 5A is a schematic flow chart of a data processing method provided by an embodiment of the present application.
- Figure 5B is a schematic flowchart of another data processing method provided by an embodiment of the present application.
- Figure 6 is a schematic diagram of the training process of a neural predictor provided by an embodiment of the present application.
- Figure 7 is a schematic flow chart of another data processing method provided by an embodiment of the present application.
- Figure 8A is a schematic diagram of the processing flow of a neural predictor provided by an embodiment of the present application.
- Figure 8B is a schematic diagram of the processing flow of another neural predictor provided by an embodiment of the present application.
- Figure 9A is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
- Figure 9B is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
- Figure 10A is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
- Figure 10B is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
- Figure 11 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
- Figure 12 is a schematic structural diagram of another data processing device provided by an embodiment of the present application.
- Figure 1 shows a schematic diagram of an artificial intelligence main framework.
- the main framework describes the overall workflow of the artificial intelligence system and is suitable for general needs in the field of artificial intelligence.
- Intelligent information chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
- the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementation) to the systematic industrial ecological process.
- Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
- computing power is provided by smart chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA, etc.);
- the basic platform includes a distributed computing framework, networks, and other related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc.
- sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
- Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
- the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
- machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
- Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
- Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
- some general capabilities can be formed based on the results of further data processing, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
- Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, smart city, smart terminal, etc.
- the neural network used in the neural predictor involved in this application serves as an important node and is used to implement machine learning, deep learning, search, reasoning, decision-making, etc.
- the neural networks mentioned in this application can include various types, such as deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), residual network, neural network using transformer model or other neural networks, etc.
- the work of each layer in a deep neural network can be described by the mathematical expression $\vec{y} = a(W \cdot \vec{x} + b)$. At the physical level, the work of each layer can be understood as completing the transformation from the input space to the output space (that is, from the row space of the matrix to its column space) through five operations on the input space (a collection of input vectors): 1. raising/reducing dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are performed by $W \cdot \vec{x}$, operation 4 is performed by $+b$, and operation 5 is implemented by $a()$. The word "space" is used here because the object to be classified is not a single thing, but a class of things.
- W is a weight vector, and each value in the vector represents the weight value of a neuron in the neural network of this layer.
- This vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how to transform the space.
- the purpose of training a neural network is to finally obtain the weight matrix of all layers of the trained neural network (a weight matrix formed by the vector W of many layers). Therefore, the training process of neural network is essentially to learn how to control spatial transformation, and more specifically, to learn the weight matrix.
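- As a minimal sketch of the per-layer computation described above, assuming the common form y = a(W·x + b): the function names, the toy weights, and the choice of ReLU for a() are illustrative assumptions and are not taken from this application.

```python
def relu(v):
    """Elementwise ReLU, used here as one example of the nonlinearity a()."""
    return [max(0.0, x) for x in v]


def dense_layer(W, b, x, activation=relu):
    """Compute a(W x + b) for one layer.

    W is a list of rows (one row per output neuron), b is the bias vector,
    and x is the input vector.
    """
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    return activation(z)


# Hypothetical layer mapping 2 inputs to 3 outputs: W changes dimension,
# scales and rotates; b translates; ReLU "bends" the space.
W = [[1.0, 0.0], [0.0, -1.0], [0.5, 0.5]]
b = [0.0, 1.0, -2.0]
y = dense_layer(W, b, [2.0, 3.0])   # [2.0, 0.0, 0.5]
```

- Training, as described above, would adjust the entries of W (and b) rather than leaving them fixed as in this sketch.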
- the neural network using the transformer model in this application can include several encoders.
- Each encoder can include a self-attention layer and a feed-forward layer.
- the attention layer can use the Multi-Head self-Attention mechanism.
- the feedforward layer can use a feedforward neural network (FNN).
- The neurons in a feedforward neural network are arranged in layers, and each neuron is connected only to neurons of the previous layer: it receives the output of the previous layer and outputs to the next layer. There is no feedback between layers.
- the encoder is used to convert the input corpus into feature vectors.
- the multi-head self-attention layer uses calculations between three matrices to calculate the data input to the encoder.
- the three matrices include query matrix Q (query), key matrix K (key) and value matrix V (value).
- the multi-head self-attention layer attends to the various interdependencies between the word at the current position and the words at other positions in the sequence.
- the feedforward layer is a linear transformation layer that linearly transforms the representation of each word.
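- The calculation between the query, key and value matrices can be sketched as follows. This is a single-head, scaled dot-product version in plain Python, assuming the standard form softmax(Q·Kᵀ/√d)·V; the projection matrices `Wq`, `Wk`, `Wv` and the toy input are hypothetical, and a multi-head layer would run several such heads in parallel and combine their outputs.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]          # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_t)]
    weights = [softmax(row) for row in scores]    # each row sums to 1
    return matmul(weights, V), weights


# Hypothetical toy input: 3 positions with 2 features, identity projections.
I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(X, I2, I2, I2)
```

- Row i of `weights` indicates how strongly position i draws on every other position, which is one way to read the "interdependencies" mentioned above.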
- A convolutional neural network (CNN) is a deep learning architecture. A deep learning architecture refers to learning at multiple levels of abstraction by means of machine learning algorithms.
- CNN is a feed-forward artificial neural network. Each neuron in the feed-forward artificial neural network processes the data input into it.
- a convolutional neural network (CNN) 100 may include an input layer 110, a convolutional layer/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130.
- the convolution layer/pooling layer 120 may include layers 121-126 as examples.
- in one example, layer 121 is a convolution layer, layer 122 is a pooling layer, layer 123 is a convolution layer, and layer 124 is a pooling layer.
- in another example, layers 121 and 122 are convolution layers, layer 123 is a pooling layer, layers 124 and 125 are convolution layers, and layer 126 is a pooling layer.
- the output of the convolutional layer can be used as the input of the subsequent pooling layer, or can be used as the input of another convolutional layer to continue the convolution operation.
- the convolution layer 121 may include many convolution operators, and the convolution operators are also called convolution kernels.
- the convolution operator can essentially be a weight matrix, which is usually predefined. Taking image processing as an example, different weight matrices extract different features of the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and another weight matrix is used to blur unwanted noise in the image.
- weight values in these weight matrices require a lot of training in practical applications.
- Each weight matrix formed by the weight values obtained through training can extract information from the input data, thereby helping the convolutional neural network 100 to make correct predictions.
- the initial convolutional layer (for example, 121) often extracts more general features, while later convolutional layers extract increasingly complex features, such as high-level semantic features. Features with higher semantics are more suitable for the problem to be solved.
- multiple convolution layers can also be followed by one or more pooling layers.
- the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
- the average pooling operator can calculate the average value of pixel values in an image within a specific range.
- the max pooling operator can take the pixel with the largest value in a specific range as the result of max pooling.
- the operators in the pooling layer should also be related to the size of the image.
- the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer.
- Each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
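- The average and maximum pooling operators described above can be sketched as follows, assuming non-overlapping square windows; the function names and the 4x4 toy image are illustrative assumptions.

```python
def pool2d(image, size, op):
    """Apply a non-overlapping size x size pooling window to a 2-D image.

    `op` reduces the list of pixel values in each window to one value.
    Rows/columns that do not fill a complete window are dropped.
    """
    rows, cols = len(image) // size, len(image[0]) // size
    return [[op([image[r * size + i][c * size + j]
                 for i in range(size) for j in range(size)])
             for c in range(cols)]
            for r in range(rows)]


def average(values):
    """Mean of the pixel values in one window."""
    return sum(values) / len(values)


image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 10, 13, 14],
         [11, 12, 15, 16]]
max_pooled = pool2d(image, 2, max)       # [[4, 8], [12, 16]]
avg_pooled = pool2d(image, 2, average)   # [[2.5, 6.5], [10.5, 14.5]]
```

- Each output pixel summarizes one 2x2 sub-region of the input, so the 4x4 input becomes a 2x2 output.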
- After processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. In order to generate the final output information (required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate an output of one class or of a set of the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in Figure 2) and an output layer 140. The parameters included in the multiple hidden layers may be pre-trained based on training data related to a specific task type. For example, the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
- After the multiple hidden layers in the neural network layer 130, the last layer of the entire convolutional neural network 100 is the output layer 140.
- the output layer 140 has a loss function similar to classification cross entropy, specifically used to calculate the prediction error.
- the convolutional neural network 100 shown in Figure 2 is only an example of a convolutional neural network.
- the convolutional neural network can also exist in the form of other network models; for example, multiple convolutional layers/pooling layers may be arranged in parallel, with the extracted features all input to the neural network layer 130 for processing.
- black-box optimization is performed through neural predictors.
- Black-box optimization can be used to find optimal operating parameters for a system, product, or process whose performance can be measured or evaluated as a function of these parameters.
- Black box optimization can also be understood as hyperparameter optimization, which is used to optimize hyperparameters.
- Hyper-parameters are parameters whose values are set before starting the learning process and are parameters obtained without training. Hyperparameters can be understood as operating parameters of a system, product or process.
- hyperparameter tuning of neural networks can be understood as a black-box optimization problem. The various neural networks currently used are trained through data and a certain learning algorithm to obtain a model that can be used for prediction and estimation.
- parameters such as the learning rate in the algorithm or the number of samples in each batch, which are not obtained through training, are generally called hyperparameters.
- the array set can contain the values of all or part of the hyperparameters of the neural network.
- the weight of each neuron will be optimized with the value of the loss function to reduce the value of the loss function. In this way, the parameters can be optimized through algorithms to obtain the model.
- the hyperparameters are used to adjust the entire network training process, such as the number of hidden layers of the aforementioned convolutional neural network, the size or number of kernel functions, etc. Hyperparameters are not directly involved in the training process, but only serve as configuration variables.
- Bayesian optimization can be employed for black-box optimization.
- Bayesian optimization is based on Gaussian models. The unknown objective function is modeled based on known samples to obtain a mean function and the confidence of the mean function. For a given point, the larger the confidence range, the higher the uncertainty of the modeling at that point, that is, the greater the probability that the true value deviates from the predicted mean at that point. Bayesian optimization decides which point to try next based on the mean and the confidence. Bayesian optimization methods generally make a continuity assumption about the target problem, for example, that the larger the hyperparameter value, the larger the prediction result. If the target problem does not satisfy this assumption, the modeling effect of Bayesian optimization is poor and the sampling efficiency is reduced.
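- To illustrate how a mean and a confidence can drive the choice of the next point, the following sketch fits a zero-mean Gaussian process to a few one-dimensional samples and picks the candidate with the highest upper confidence bound. The squared-exponential kernel, the noise term, the `kappa` weight, and the sample values are all assumptions made for illustration; the text above does not specify them.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def next_point(xs, ys, candidates, kappa=2.0):
    """Pick the candidate maximizing mean + kappa * std."""
    def ucb(x):
        mean, std = gp_posterior(xs, ys, x)
        return mean + kappa * std
    return max(candidates, key=ucb)


xs, ys = [0.0, 1.0, 4.0], [0.1, 0.4, 0.2]       # already-evaluated samples
mean_near, std_near = gp_posterior(xs, ys, 1.0)  # at an evaluated point
_, std_far = gp_posterior(xs, ys, 2.5)           # between evaluated points
chosen = next_point(xs, ys, [0.5, 2.5, 3.5])
```

- The standard deviation is close to zero at an already-evaluated point and grows away from the known samples, which corresponds to the wider confidence range and higher uncertainty described above.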
- the neural predictor uses a neural network. Compared with Bayesian optimization, a neural network is used instead of the Gaussian model to model the target problem. However, the neural network approach requires more training data to train a neural predictor that generalizes well. In a black-box optimization scenario, the cost of a single evaluation is high, so few training samples can be obtained. As a result, the trained neural predictor generalizes poorly, and the hyperparameters found by the search are not the hyperparameters with the optimal evaluation results.
- embodiments of the present application provide a data processing method that uses training samples to assist in the prediction of the target hyperparameter combination. Since the evaluation results corresponding to the hyperparameter combinations in the training samples have been verified by users, their accuracy is high.
- the prediction of the target hyperparameter combination is assisted by user-verified training samples. Compared with a solution that is not assisted by user-verified training samples, the solution adopted in the embodiments of the present application predicts the target hyperparameter combination with higher accuracy. Furthermore, compared with a solution that is not assisted by user-verified training samples, the solution adopted in the embodiments of the present application can obtain a neural predictor with better generalization using fewer training samples.
- the embodiments of this application can be used for hyperparameter optimization of various complex systems. Scenarios where the embodiments of this application can be applied may include molecular design, materials science, factory debugging, chip design, neural network structure design, neural network training and tuning, etc.
- for example, the hyperparameters may be tunable parameters (e.g., compositions, types or quantities of ingredients, production sequence, production timing) of a physical product or of a process for producing a physical product, such as an alloy, a metamaterial, a concrete mixture, a process of pouring concrete, a pharmaceutical mixture, or a process of performing a treatment.
- as another example, the hyperparameters may be design parameters used to optimize the neural network structure, such as the number of convolution layers, the convolution kernel size, the expansion size, and the position of the rectified linear unit (ReLU).
- the data processing method provided by the embodiment of the present application can be executed by an execution device.
- An execution device may be implemented by one or more computing devices.
- Figure 3 shows a system architecture 300 provided by an embodiment of the present application. Included in the system architecture 300 is an execution device 210.
- Execution device 210 may be implemented by one or more computing devices. Execution device 210 may be arranged on one physical site, or distributed across multiple physical sites.
- System architecture 300 also includes data storage system 250.
- the execution device 210 cooperates with other computing devices, such as data storage, routers, load balancers and other devices.
- the execution device 210 can use the data in the data storage system 250, or call the program code in the data storage system 250 to implement the data processing method provided by this application.
- One or more computing devices can be deployed in a cloud network.
- the data processing method provided by the embodiment of the present application is deployed in one or more computing devices of the cloud network in the form of a service, and the user device accesses the cloud service through the network.
- the data processing method provided by the embodiment of the present application can be deployed on one or more local computing devices in the form of a software tool.
- Each local device can represent any computing device, such as a personal computer, computer workstation, smartphone or other type of cellular phone, tablet, smart camera, smart car, media consumption device, wearable device, set-top box, game console, etc.
- Each user's local device can interact with the execution device 210 through a communication network of any communication mechanism/communication standard.
- the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
- execution device 210 may be implemented by each local device; for example, local device 301 may provide local data or feed back evaluation results to execution device 210.
- the execution device 210 can also be implemented by local devices.
- the local device 301 implements the functions of the execution device 210 and provides services for its own users, or provides services for users of the local device 302 .
- FIG. 4 is a schematic structural diagram of a neural predictor provided by an embodiment of the present application.
- the inputs of the neural predictor are multiple samples in the training set (which can also be called hyperparameter samples or auxiliary hyperparameter samples), the evaluation indicators corresponding to the multiple samples, and the target hyperparameter combination that needs to be predicted.
- the output of the neural predictor is the prediction indicator of the target hyperparameter combination that needs to be predicted.
- Multiple auxiliary hyperparameter samples are used to assist the neural predictor in predicting the prediction indicator of the target hyperparameter combination.
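- The input/output contract described above can be sketched as follows. Only the contract (auxiliary samples, their evaluation indicators, and a target combination in; one prediction indicator out) is taken from the text. The function body is a hypothetical stand-in, a similarity-weighted average, and not the neural network of this application; `predict_indicator`, `temperature`, and the sample values are assumed names and data.

```python
import math
from typing import Sequence


def predict_indicator(support_x: Sequence[Sequence[float]],
                      support_y: Sequence[float],
                      target_x: Sequence[float],
                      temperature: float = 1.0) -> float:
    """Stand-in predictor: similarity-weighted average of support indicators.

    A trained neural predictor would replace this body; only the
    input/output contract follows the description.
    """
    # Negative squared distance as a similarity score for each support sample.
    scores = [-sum((a - b) ** 2 for a, b in zip(x, target_x)) / temperature
              for x in support_x]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, support_y)) / total


support_x = [[0.1, 2.0], [0.01, 3.0], [0.001, 6.0]]   # auxiliary hyperparameter samples
support_y = [0.70, 0.85, 0.60]                        # their evaluation indicators
prediction = predict_indicator(support_x, support_y, [0.01, 3.0])
```

- Because the target coincides with the second auxiliary sample, the stand-in's prediction is pulled toward that sample's evaluation indicator while still being influenced by the others.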
- FIG. 5A is a schematic flow chart of a data processing method provided by this application.
- the method may be executed by an execution device, such as the execution device 210 in FIG. 3 .
- the hyperparameter search space includes the hyperparameters required for the user's task. Hyperparameters can be sampled from the hyperparameter search space to obtain multiple hyperparameter values as a hyperparameter combination. It should be understood that a hyperparameter combination may include one or more hyperparameter values.
- to obtain multiple hyperparameter combinations, the execution device may receive hyperparameter information from a user device.
- Hyperparameter information is used to indicate the hyperparameter search space corresponding to the user task.
- multiple hyperparameter combinations can be obtained by sampling from the hyperparameter search space.
- the user device may send the hyperparameter information to the execution device 210 by calling a service.
- the hyperparameter search space may include a variety of hyperparameters required for user tasks, and the value of each hyperparameter may be a continuously distributed value or a discrete distributed value.
- the hyperparameter search space may include the value range of hyperparameter A as [1, 20], and the value range of hyperparameter B may include: 2, 3, 6, 7, etc. Therefore, when sampling in the hyperparameter search space, you can take any value from a continuous distribution or a value from a discrete distribution to get a set of hyperparameter combinations.
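- Sampling from such a search space can be sketched as follows, using the example ranges given above (hyperparameter A continuous on [1, 20], hyperparameter B taking discrete values). The dictionary layout, the function name, and the uniform sampling are assumptions for illustration.

```python
import random


# Hypothetical encoding of the search space from the example in the text.
search_space = {
    "A": ("continuous", 1.0, 20.0),
    "B": ("discrete", [2, 3, 6, 7]),
}


def sample_combination(space, rng):
    """Draw one hyperparameter combination from the search space."""
    combo = {}
    for name, spec in space.items():
        if spec[0] == "continuous":
            combo[name] = rng.uniform(spec[1], spec[2])
        else:
            combo[name] = rng.choice(spec[1])
    return combo


rng = random.Random(0)   # fixed seed so the sketch is repeatable
combinations = [sample_combination(search_space, rng) for _ in range(5)]
```

- Each element of `combinations` is one hyperparameter combination containing one value per hyperparameter.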
- the following steps take the prediction for one hyperparameter combination as an example. For example, taking the prediction of the first hyperparameter combination as an example, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, the prediction of every other hyperparameter combination can follow the same procedure as the prediction of the first hyperparameter combination.
- Use the first hyperparameter combination, multiple hyperparameter samples in the training set, and the evaluation indicators of the multiple hyperparameter samples as inputs to the neural predictor, and determine the prediction indicator corresponding to the first hyperparameter combination through the neural predictor.
- multiple hyperparameter samples are called auxiliary hyperparameter samples.
- auxiliary hyperparameter samples are used to assist the neural predictor in predicting the prediction index of the first hyperparameter combination.
- the auxiliary hyperparameter samples used to assist the neural predictor in predicting the first hyperparameter combination can also be called "support samples" or other names, which are not specifically limited in the embodiments of this application.
- auxiliary hyperparameter sample can be understood as a hyperparameter combination.
- the hyperparameter combination corresponding to an auxiliary hyperparameter sample of the training set is called an auxiliary hyperparameter combination.
- This auxiliary hyperparameter combination is also sampled from the hyperparameter search space.
- Multiple hyperparameter combinations are not the same as multiple auxiliary hyperparameter combinations.
- the evaluation index corresponding to the auxiliary hyperparameter combination can be obtained by evaluating the auxiliary hyperparameter combination through user tasks.
- the evaluation indicators corresponding to each hyperparameter sample included in the training set can also be evaluated in other ways.
- the index results of the hyperparameter combination predicted by the neural predictor are called prediction indicators, and the index results of the hyperparameter combination obtained by user task evaluation are called evaluation indicators.
- the execution device may perform steps 203 and 204 .
- the execution device can determine K hyperparameter combinations from the multiple hyperparameter combinations. Specifically, the K hyperparameter combinations with the optimal (or highest) prediction indicators are obtained from the multiple hyperparameter combinations. It can be understood that the prediction indicators corresponding to the K hyperparameter combinations are all higher than the prediction indicator corresponding to any of the multiple hyperparameter combinations other than the K hyperparameter combinations, where K is a positive integer.
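- Selecting the K combinations with the highest prediction indicators can be sketched as follows, assuming that a higher indicator is better; the function name and the sample data are hypothetical.

```python
import heapq


def select_top_k(combinations, prediction_indicators, k):
    """Return the k combinations with the highest prediction indicators."""
    # Pair each indicator with its index so the combinations themselves
    # never need to be compared.
    ranked = heapq.nlargest(k, zip(prediction_indicators, range(len(combinations))))
    return [combinations[i] for _, i in ranked]


combos = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}, {"lr": 0.0001}]
predicted = [0.62, 0.91, 0.88, 0.40]
top2 = select_top_k(combos, predicted, k=2)   # [{"lr": 0.01}, {"lr": 0.001}]
```

- If a lower indicator were better for a given task (for example, an error rate), `heapq.nsmallest` could be used instead.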
- the execution device can send K hyperparameter combinations to the user device.
- the embodiment of the present application can also update the training set. Determining which hyperparameter combinations to update into the training set can be based on the results of multiple iterative evaluations for multiple hyperparameter combinations.
- the user device can trigger the user task.
- Users can perform specific user tasks based on the received hyperparameter combinations, such as video classification, text recognition, image beautification, speech recognition and other tasks.
- User tasks can be evaluated separately for the K hyperparameter combinations, and the evaluation indicators of the K hyperparameter combinations are sent to the execution device.
- the execution device adds K hyperparameter combinations and corresponding evaluation indicators as auxiliary hyperparameter samples to the training set.
- The above description takes as an example the case where the function of the execution device is implemented by one or more computing devices deployed on the cloud network. That is, the above actions of sending K hyperparameter combinations and receiving evaluation results take place between one or more computing devices in the cloud network and user devices. In some scenarios, when the data processing method is deployed on one or more local computing devices, the above actions of sending K hyperparameter combinations and receiving evaluation results can take place between different components of a computing device or between different computing devices, or the evaluation results of the K hyperparameter combinations can be obtained from the storage space of the computing device when the software program of the computing device is executed.
- a component in a local computing device for executing an iterative sampling process sends K hyperparameter combinations to a component for evaluating hyperparameter combinations.
- the component that evaluates the hyperparameter combination then performs the evaluation operation and sends the evaluated evaluation metrics to the component that performs the iterative sampling process.
- steps 201 to 204 are called iterative sampling processes.
- the execution device may execute multiple rounds of the iterative sampling process including steps 201-204.
- After the execution device adds the K hyperparameter combinations and the corresponding evaluation indicators to the training set as auxiliary hyperparameter samples, the updated training set can be used for the next round of the iterative sampling process.
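- One possible shape of the multi-round iterative sampling process is sketched below: sample candidates, predict them with the current training set as auxiliary samples, keep the top K, evaluate those K through the user task, and add them to the training set. The callables `sample_fn`, `predict_fn` and `evaluate_fn` are hypothetical stand-ins (a uniform sampler, a nearest-neighbour predictor, and a made-up user task) and are not the components of this application.

```python
import random


def iterative_sampling(sample_fn, predict_fn, evaluate_fn, training_set,
                       rounds, num_candidates, k):
    """Run `rounds` rounds of: sample, predict, select top-k, evaluate, update."""
    for _ in range(rounds):
        candidates = [sample_fn() for _ in range(num_candidates)]
        # Predict each candidate using the current training set as auxiliary samples.
        scored = [(predict_fn(training_set, c), i) for i, c in enumerate(candidates)]
        best = sorted(scored, reverse=True)[:k]
        for _, i in best:
            combo = candidates[i]
            # User-task evaluation; the result joins the training set.
            training_set.append((combo, evaluate_fn(combo)))
    return training_set


rng = random.Random(0)


def sample_fn():
    """Hypothetical one-dimensional search space on [1, 20]."""
    return rng.uniform(1.0, 20.0)


def evaluate_fn(x):
    """Pretend user task whose evaluation indicator peaks at x = 7."""
    return -(x - 7.0) ** 2


def predict_fn(training_set, x):
    """Nearest-neighbour stand-in for the neural predictor."""
    if not training_set:
        return 0.0
    return min(training_set, key=lambda s: abs(s[0] - x))[1]


history = iterative_sampling(sample_fn, predict_fn, evaluate_fn, [],
                             rounds=3, num_candidates=10, k=2)
```

- After three rounds with K = 2, `history` holds six evaluated samples, each of which can serve as an auxiliary hyperparameter sample in later rounds.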
- the iterative sampling stop condition may include at least one of the following:
- (1) The number of rounds of iterative sampling reaches N, where N is an integer greater than 1. After N rounds of iterative sampling are performed, the iterative sampling process is stopped.
- (2) The optimal evaluation index in the training set has not changed during M consecutive rounds of iterative sampling.
- the optimal evaluation index included in the training set can be recorded. For example, after the i-th round of iterative sampling, the optimal evaluation index included in the training set is A. After the (i+1)-th round, the optimal evaluation index included in the training set is still A, and so on; after the (i+M-1)-th round, it is still A. From the i-th round to the (i+M-1)-th round of iterative sampling, the optimal evaluation index included in the training set has not changed, so the next round of the iterative sampling process is not executed. Optionally, when the number of rounds of iterative sampling reaches the set maximum number of sampling rounds and condition (2) has still not been met, the next round of the iterative sampling process is likewise not executed.
- (3) The optimal evaluation index in the training set reaches the set index. Specifically, after multiple rounds of the iterative sampling process, the optimal evaluation index in the training set reaches the set index, so the next round of the iterative sampling process is no longer performed. Optionally, when the number of iterative sampling rounds reaches the set maximum number of sampling rounds and the optimal evaluation index in the training set has not yet reached the set index, the next round of the iterative sampling process is likewise not executed.
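- The three stop conditions can be checked with a small helper such as the following sketch. It assumes that a higher evaluation index is better and that the best index in the training set is recorded after every round; the function name and the parameter names `max_rounds`, `patience` and `target` (standing for N, M and the set index) are hypothetical.

```python
def should_stop(best_per_round, max_rounds=None, patience=None, target=None):
    """Return True if any configured stop condition holds.

    best_per_round[i] is the optimal evaluation index in the training set
    after round i + 1 (higher is assumed to be better).
    """
    rounds_done = len(best_per_round)
    if max_rounds is not None and rounds_done >= max_rounds:
        return True                      # N rounds have been performed
    if patience is not None and rounds_done >= patience:
        recent = best_per_round[-patience:]
        if all(v == recent[0] for v in recent):
            return True                  # unchanged over M consecutive rounds
    if target is not None and best_per_round and best_per_round[-1] >= target:
        return True                      # the set index has been reached
    return False
```

- For example, `should_stop([0.5, 0.7, 0.7, 0.7], patience=3)` returns `True`, because the best index stayed at 0.7 over the last three rounds.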
- the neural predictor in the embodiment of this application is trained using the training set. As shown in Figure 6, in each iteration of the training process, a sample is selected from the training set as the target sample.
- the target sample can be understood as a target hyperparameter combination in the training set.
- the training set also includes the evaluation index corresponding to the target hyperparameter combination. For ease of distinction, the target hyperparameter combination and its corresponding evaluation index are called the target sample combination {target hyperparameter combination, evaluation index}.
- multiple hyperparameter samples also need to be selected from the training set as auxiliary hyperparameter samples for input to the neural predictor.
- the training set also includes evaluation indicators corresponding to auxiliary hyperparameter samples.
- for ease of distinction, an auxiliary hyperparameter sample (i.e., an auxiliary hyperparameter combination) and its corresponding evaluation index are called an auxiliary sample combination {auxiliary hyperparameter combination, evaluation index}.
- the multiple auxiliary hyperparameter samples selected from the training set are different from the target samples.
- take as an example the case where the number of input auxiliary hyperparameter samples is T.
- the T auxiliary sample combinations and the target hyperparameter combination are input to the neural predictor, and the output of the neural predictor is the prediction indicator of the target hyperparameter combination.
- the prediction index of the target hyperparameter combination is compared with the corresponding evaluation index of the target hyperparameter combination, the loss value is calculated, and then the weight of the neural predictor is updated based on the loss value.
- a loss function can be used when calculating the loss value.
- the loss function is also the objective function in the weight optimization process. Generally, the smaller the value of the loss function, the more accurate the output of the neural predictor.
- the training process of neural predictors can be understood as the process of minimizing the loss function. Commonly used loss functions can include logarithmic loss function, square loss function, exponential loss function, etc.
- optimization algorithms such as the gradient descent algorithm, the stochastic gradient descent algorithm, the momentum gradient descent algorithm, or adaptive moment estimation (Adam) can be used to optimize the weights.
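- The training iteration described above (select a target sample, select T different auxiliary samples, predict, compute the loss against the target's evaluation index, update the weights) can be sketched as follows. The predictor here is a deliberately tiny one-weight toy model that scales the mean auxiliary indicator and ignores the target combination itself; it stands in for the neural predictor only to make the loss and update steps concrete. All names, the square loss, the plain gradient-descent step, and the sample data are assumptions.

```python
import random


def make_episode(training_set, num_aux, rng):
    """Pick one target sample and `num_aux` different auxiliary samples."""
    target_index = rng.randrange(len(training_set))
    others = [s for i, s in enumerate(training_set) if i != target_index]
    return training_set[target_index], rng.sample(others, num_aux)


def train_toy_predictor(training_set, num_aux, steps, lr, rng):
    """Train the single weight w of the toy predictor w * mean(aux indicators)."""
    w = 1.0
    losses = []
    for _ in range(steps):
        (_, evaluation), auxiliary = make_episode(training_set, num_aux, rng)
        base = sum(y for _, y in auxiliary) / len(auxiliary)
        prediction = w * base
        losses.append((prediction - evaluation) ** 2)      # square loss
        w -= lr * 2.0 * (prediction - evaluation) * base   # gradient step on w
    return w, losses


rng = random.Random(0)
training_set = [((0.1, 32), 0.70), ((0.01, 64), 0.85),
                ((0.001, 64), 0.80), ((0.01, 128), 0.90)]
target, aux = make_episode(training_set, num_aux=2, rng=rng)
w, losses = train_toy_predictor(training_set, num_aux=2, steps=50, lr=0.1, rng=rng)
```

- In a real predictor the target hyperparameter combination would also be an input, and the single weight `w` would be replaced by the full set of network weights updated by one of the optimizers listed above.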
- the training of the neural predictor can be performed in advance, or can be performed in the iterative sampling process of each round.
- for example, the neural predictor is first trained over multiple training iterations; alternatively, the neural predictor is trained each time the training set is updated, and the neural predictor trained in the current round of sampling is then used to predict multiple hyperparameter combinations.
- the iterative evaluation process is interleaved with the iterative training process.
- FIG. 7 is a schematic flow chart of another data processing method provided by an embodiment of the present application.
- Figure 7 takes as an example the case where the iterative evaluation process and the iterative training process are interleaved in each round of iterative sampling. In this example, K hyperparameter samples are added to the training set in each round of sampling, and the maximum number of sampling rounds is N.
- the initial training set is empty.
- User task execution evaluation can be triggered to obtain evaluation indicators.
- the evaluation process and evaluation results of user tasks are not specifically limited in the embodiments of this application.
- the user task can be evaluated manually or by the user equipment.
- the above takes deployment of the data processing method on one or more computing devices of a cloud network as an example. That is, the above actions of sending K hyperparameter combinations and receiving evaluation results take place between the one or more computing devices in the cloud network and the user equipment. In some scenarios, when the data processing method is deployed on one or more local computing devices, these actions can take place between different components of a computing device, or between different computing devices. Alternatively, the evaluation indicators of the K hyperparameter combinations may be obtained from the storage space of the computing device when the software program is executed.
- for example, step 403 can be replaced by: the component that executes the iterative sampling process in the local computing device sends K hyperparameter combinations to the component that evaluates hyperparameter combinations. Step 404 can then be replaced by: the component that evaluates hyperparameter combinations performs the evaluation operation and sends the resulting evaluation indicators to the component that executes the iterative sampling process.
- L is greater than K.
- as an example, the iterative sampling stop condition is that the number of rounds of iterative sampling reaches the maximum number of sampling rounds.
- the above-mentioned training of the neural predictor and the prediction of L hyperparameter combinations are performed in each round of iterative sampling process.
- the number of rounds in which the neural predictor is trained may be smaller than the number of rounds of iterative sampling.
- for example, both the training of the neural predictor and the prediction of L hyperparameter combinations are performed in the first a rounds of the N rounds of iterative sampling.
- in the remaining rounds, the training of the neural predictor is no longer performed, and only the prediction of L hyperparameter combinations is performed.
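The interleaved flow described for Figure 7 can be sketched as a loop. This is only an outline under stated assumptions: the four callables (`sample_fn`, `evaluate_fn`, `train_fn`, `predict_fn`) are hypothetical hooks standing in for sampling, user-task evaluation, predictor training, and prediction, and `train_rounds` plays the role of "a" in the text.

```python
import random


def run_sampling(sample_fn, evaluate_fn, train_fn, predict_fn,
                 n_rounds, k, l, train_rounds):
    """Iterative sampling with evaluation and training interleaved."""
    training_set = []                      # the initial training set is empty
    batch = [sample_fn() for _ in range(k)]
    for round_idx in range(n_rounds):
        # Evaluate the K combinations and add them to the training set.
        training_set.extend((hp, evaluate_fn(hp)) for hp in batch)
        if round_idx == n_rounds - 1:      # stop condition: max rounds reached
            break
        if round_idx < train_rounds:       # train only in the first `a` rounds
            train_fn(training_set)
        # Predict L > K candidates and keep the K with the highest prediction.
        candidates = [sample_fn() for _ in range(l)]
        candidates.sort(key=lambda hp: predict_fn(hp, training_set),
                        reverse=True)
        batch = candidates[:k]
    return training_set


# Toy usage: a one-dimensional "hyperparameter" whose best value is 0.5.
random.seed(0)
train_calls = []
result = run_sampling(
    sample_fn=random.random,
    evaluate_fn=lambda hp: 1.0 - abs(hp - 0.5),
    train_fn=lambda ts: train_calls.append(len(ts)),
    predict_fn=lambda hp, ts: 1.0 - abs(hp - 0.5),
    n_rounds=4, k=3, l=10, train_rounds=2)
```

With these toy settings the training set grows by K entries per round, and the training hook is called only in the first two rounds.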
- FIG. 8A is a schematic diagram of the processing flow of a neural predictor provided by an embodiment of the present application.
- Select multiple hyperparameter samples from the training set as auxiliary hyperparameter samples.
- the training set also includes evaluation indicators corresponding to auxiliary hyperparameter samples.
- the input of the neural predictor includes T auxiliary hyperparameter samples, the evaluation indicators corresponding to the T auxiliary hyperparameter samples, and the target hyperparameter combination.
- Figure 8A refers to the T auxiliary hyperparameter samples as auxiliary hyperparameter samples 1 to T, and the corresponding evaluation indicators as evaluation indicators 1 to T; that is, {auxiliary hyperparameter sample 1, evaluation index 1}, ..., {auxiliary hyperparameter sample T, evaluation index T}.
- the neural predictor jointly encodes the T auxiliary hyperparameter samples and the target hyperparameter combination to obtain T+1 features.
- the encoded features corresponding to the T auxiliary hyperparameter samples are called auxiliary features
- the encoded features corresponding to the target hyperparameter combination are called target features.
- the neural predictor includes at least one encoder layer.
- two coding layers are taken as an example.
- the encoding layer can use the encoding module in the transformer structure.
- the encoding layer is composed of an attention layer (Attention Layer) and a feed-forward layer (Feed-forward layer).
- the attention layer computes pairwise similarities among the T+1 hyperparameter combinations (the hyperparameter combinations corresponding to the T auxiliary hyperparameter samples, plus the target hyperparameter combination) to obtain a similarity matrix, and then weights the T+1 hyperparameter combinations according to the similarity matrix to obtain T+1 features.
- the T+1 features are sent to the feed-forward layer for feature transformation, and the encoding layer finally outputs T+1 encoded features. In this way, the T+1 hyperparameter combinations are jointly encoded by the at least one encoding layer.
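A heavily simplified sketch of one encoding layer follows. It assumes scaled dot-product similarity and a single linear-plus-ReLU feed-forward step, and it omits the learned query/key/value projections, residual connections, and normalization that a standard Transformer encoder layer would contain. All names and the toy weights are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(vectors):
    """Pairwise similarity matrix, then similarity-weighted mixing."""
    dim = len(vectors[0])
    mixed = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)  # one row of the similarity matrix
        mixed.append([sum(w * vec[j] for w, vec in zip(weights, vectors))
                      for j in range(dim)])
    return mixed


def feed_forward(vectors, weight, bias):
    """Per-vector linear map followed by ReLU."""
    return [[max(0.0, sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j])
             for j in range(len(bias))]
            for x in vectors]


# T = 2 auxiliary combinations plus 1 target combination, 2 hyperparameters each.
inputs = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.2]]
identity = [[1.0, 0.0], [0.0, 1.0]]
encoded = feed_forward(attention_layer(inputs), identity, [0.0, 0.0])
```

Each of the T+1 output features is a mixture of all T+1 inputs, which is what allows the target combination to be encoded in the context of the auxiliary samples.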
- the neural predictor determines the similarity between the target feature and the T auxiliary features, and then determines the weights corresponding to the T auxiliary hyperparameter samples based on the similarities between the target feature and the T auxiliary features.
- the neural predictor weights the evaluation indicators included in the T auxiliary hyperparameter samples according to the corresponding weights of the T auxiliary hyperparameter samples to obtain the prediction index of the target hyperparameter combination.
- the neural predictor performs inner product processing on the target feature and the T auxiliary features to obtain the similarities corresponding to the target feature and the T auxiliary features. Then, the neural predictor uses the softmax function to convert the similarities corresponding to the target feature and the T auxiliary features into weights corresponding to the T auxiliary hyperparameter samples.
- the processing flow of the neural predictor shown in Figures 8A-8B is applicable to both the training flow and the evaluation flow. In the training flow, the target hyperparameter combination also comes from the training set, and the result of comparing the prediction index output by the neural predictor for a target hyperparameter combination with the evaluation index of that combination in the training set is used to adjust the weights of the neural predictor.
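The prediction step of Figures 8A-8B (inner product, softmax, weighted sum of evaluation indicators) can be sketched as follows. The function name and the toy feature vectors are assumptions; in the described scheme the features would come from the encoding layers rather than being given directly.

```python
import math


def predict_index(target_feature, aux_features, aux_scores):
    """Weight the auxiliary evaluation indicators by similarity to the target."""
    # Inner-product similarity between the target feature and each auxiliary feature.
    sims = [sum(a * b for a, b in zip(target_feature, feat))
            for feat in aux_features]
    # Softmax turns the T similarities into T weights that sum to 1.
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    prediction = sum(w * s for w, s in zip(weights, aux_scores))
    return prediction, weights


# Toy example with T = 2 auxiliary samples.
prediction, weights = predict_index(
    target_feature=[1.0, 0.0],
    aux_features=[[1.0, 0.0], [0.0, 1.0]],
    aux_scores=[0.9, 0.1])
```

Because the weights are non-negative and sum to 1, the prediction in this variant always lies between the smallest and largest auxiliary evaluation indicator.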
- FIG. 9A is a schematic diagram of the processing flow of another neural predictor provided by an embodiment of the present application.
- the input of the neural predictor includes T auxiliary hyperparameter samples, the evaluation indicators corresponding to the T auxiliary hyperparameter samples, and the target hyperparameter combination.
- Figure 9A refers to the T auxiliary hyperparameter samples as auxiliary hyperparameter samples 1 to T, and the corresponding evaluation indicators as evaluation indicators 1 to T; that is, {auxiliary hyperparameter sample 1, evaluation index 1}, ..., {auxiliary hyperparameter sample T, evaluation index T}.
- the neural predictor determines the prediction index corresponding to the target hyperparameter combination based on the target hyperparameter combination, the auxiliary hyperparameter samples 1 to T, the evaluation indicators 1 to T corresponding to those samples, and the two anchor features.
- the two anchor features are used to calibrate the coding feature of the lowest prediction index and the coding feature of the highest prediction index of the target task.
- when determining the prediction index corresponding to the target hyperparameter combination, the neural predictor can jointly encode the T auxiliary hyperparameter samples and the target hyperparameter combination to obtain T+1 features.
- the encoded features corresponding to the T auxiliary hyperparameter samples are called auxiliary features
- the encoded features corresponding to the target hyperparameter combination are called target features.
- the neural predictor includes at least one encoding layer (encoder layer). Through the encoding layer, the neural predictor jointly encodes the T+1 hyperparameter combinations (formed by combining the T auxiliary hyperparameter samples and the target hyperparameter combination). For the specific method, see Figure 8B; it is not described again here.
- the neural predictor determines the similarity between the target feature and the T auxiliary features and the similarity between the target feature and the two anchor features respectively.
- the anchor feature for the coding feature with the lowest prediction index is called anchor feature 1.
- the anchor feature for the coding feature with the highest prediction index is called anchor feature 2.
- the inner product method can be used when determining the similarity.
- the neural predictor determines the weights corresponding to the T auxiliary hyperparameter samples and the two anchor point features based on the similarity between the target feature and the T auxiliary features and the two anchor point features.
- the evaluation indicators included in the T auxiliary hyperparameter samples and the prediction indicators corresponding to the two anchor features are weighted according to the weights corresponding to the T auxiliary hyperparameter samples and the two anchor features, to obtain the prediction index of the target hyperparameter combination output by the neural predictor. For example, as shown in FIG. 9B, the prediction index corresponding to anchor feature 1 can be configured as 0, and the prediction index corresponding to anchor feature 2 can be configured as 1.
- the softmax function can be used when determining the weights.
- the two anchor features are learnable and can be understood as learnable parameters of the neural predictor.
- the two anchor features can be updated simultaneously every time the weights of the neural predictor are updated.
- the processing flow of the neural predictor shown in Figures 9A-9B is applicable to both the training flow and the evaluation flow. In the training flow, the target hyperparameter combination also comes from the training set, and the result of comparing the prediction index output by the neural predictor for a target hyperparameter combination with the evaluation index of that combination in the hyperparameter sample is used to adjust the weights of the neural predictor and the two anchor features.
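The anchor-based variant of Figures 9A-9B can be sketched by appending the two anchor features, with configured prediction indices 0 and 1, to the auxiliary features before the softmax weighting. The function name and the toy vectors are assumptions; in the described scheme the anchors would be learned together with the predictor's weights.

```python
import math


def predict_with_anchors(target_feature, aux_features, aux_scores,
                         anchor_low, anchor_high):
    """Softmax-weighted prediction over T auxiliary samples plus two anchors."""
    features = list(aux_features) + [anchor_low, anchor_high]
    # Anchor 1 is configured as 0 and anchor 2 as 1, as in FIG. 9B.
    scores = list(aux_scores) + [0.0, 1.0]
    sims = [sum(a * b for a, b in zip(target_feature, feat))
            for feat in features]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]  # T + 2 weights
    return sum(w * s for w, s in zip(weights, scores)), weights


# One auxiliary sample scored 0.5; the target is close to the "highest" anchor.
prediction, weights = predict_with_anchors(
    target_feature=[0.0, 5.0],
    aux_features=[[1.0, 0.0]],
    aux_scores=[0.5],
    anchor_low=[0.0, -1.0],
    anchor_high=[0.0, 1.0])
```

In this sketch the prediction can exceed the best auxiliary evaluation indicator, which the anchor-free weighted average cannot do.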
- FIG. 10A is a schematic diagram of the processing flow of another neural predictor provided by an embodiment of the present application.
- the input of the neural predictor includes T auxiliary hyperparameter samples, the evaluation indicators corresponding to the T auxiliary hyperparameter samples, the target hyperparameter combination, and the target prediction index mask corresponding to the target hyperparameter combination.
- Figure 10A refers to the T auxiliary hyperparameter samples as auxiliary hyperparameter samples 1 to T, and the corresponding evaluation indicators as evaluation indicators 1 to T; that is, {auxiliary hyperparameter sample 1, evaluation index 1}, ..., {auxiliary hyperparameter sample T, evaluation index T}.
- the auxiliary hyperparameter sample and the corresponding evaluation index can be connected to obtain the connection parameter information, and then the connection parameter information is input into the neural predictor.
- for example, for {auxiliary hyperparameter sample 1, evaluation index 1}, connection parameter information 1 is obtained by connecting auxiliary hyperparameter sample 1 and evaluation index 1.
- the target connection parameter information is obtained by connecting the target hyperparameter combination and the target prediction index mask.
- the target prediction index mask is used to characterize the unknown prediction index corresponding to the target hyperparameter combination.
- the target prediction index mask is learnable. After initial configuration, it can be updated each time the weights of the neural predictor are updated during training.
- the neural predictor performs similarity matching on every pair of the input T+1 pieces of connection parameter information to obtain the similarity between each pair. The neural predictor then determines the prediction index corresponding to the target hyperparameter combination based on these pairwise similarities.
- the neural predictor includes multiple coding layers.
- two coding layers are taken as an example.
- the neural predictor also includes an FC/sigmoid layer, that is, a fully connected layer followed by a sigmoid function.
- the coding layer can be the standard coding layer in the Transformer structure, which is composed of an attention layer (Attention Layer) and a feed-forward layer (Feed-forward layer).
- the attention layer is used to calculate the similarity of the input T+1 connection parameter information in pairs to obtain a similarity matrix, and then weight the T+1 connection information according to the similarity matrix to obtain T+1 features.
- the T+1 features are sent to the feed-forward layer, and the feed-forward layer performs feature transformation on the T+1 features.
- in this way, the T+1 pieces of connection parameter information are fused to comprehensively predict the target prediction index.
- the features corresponding to the target connection parameter information in the T+1 features output by the coding layer are input to the FC/sigmoid layer.
- the neural predictor reduces the dimensionality of the features corresponding to the target connection parameter information through the FC/sigmoid layer to obtain 1-dimensional features. This feature is normalized by the Sigmoid function to a value between 0 and 1, which is the target prediction index corresponding to the predicted target hyperparameter combination.
- the target prediction index mask is learnable and can be understood as a learnable parameter of the neural predictor.
- the target prediction index mask can be updated together with each update of the neural predictor's weights.
- the processing flow of the neural predictor shown in Figures 10A-10B is applicable to both the training flow and the evaluation flow. In the training flow, the target hyperparameter combination also comes from the training set, and the result of comparing the prediction index output by the neural predictor for a target hyperparameter combination with the evaluation index of that combination in the hyperparameter sample is used to adjust the weights of the neural predictor and the target prediction index mask.
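The input construction and output head of the Figure 10A-10B variant can be sketched as follows. Several things are assumptions for illustration: the mask is modeled as a single scalar, the encoding layers are skipped, and the FC weights are fixed toy values rather than learned parameters.

```python
import math


def build_connection_info(aux_samples, aux_scores, target, mask_value):
    """Connect each sample with its evaluation index; the target gets the mask."""
    rows = [list(hp) + [score] for hp, score in zip(aux_samples, aux_scores)]
    rows.append(list(target) + [mask_value])  # target connection information
    return rows


def fc_sigmoid(feature, weights, bias):
    """Reduce a feature vector to one value and squash it into (0, 1)."""
    z = sum(f * w for f, w in zip(feature, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


rows = build_connection_info(
    aux_samples=[[0.1, 0.2], [0.3, 0.4]],
    aux_scores=[0.7, 0.6],
    target=[0.2, 0.3],
    mask_value=-1.0)  # hypothetical initial value of the learnable mask

# The encoding layers would transform `rows` here; this sketch feeds the
# target row straight into the FC/sigmoid head.
prediction = fc_sigmoid(rows[-1], weights=[0.5, 0.5, 0.1], bias=0.0)
```

The mask occupies the position where an evaluation index would otherwise sit, so all T+1 rows have the same width and can pass through the same attention layers.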
- the hyperparameter search space is defined as follows, where the three values in parentheses are the minimum value, the maximum value, and the step size of the hyperparameter, respectively.
- wd: numeric type (0.02, 0.4, 0.01), the weight decay
- dropout: numeric type (0.0, 0.3, 0.025), the dropout probability
- drop_conn_rate: numeric type (0.0, 0.4, 0.025), the drop-connection probability
- mixup: numeric type (0.0, 1.0, 0.05), the distribution parameter of mixup
- color: numeric type (0.0, 0.5, 0.025), the intensity of color data augmentation
- re_prob: numeric type (0.0, 0.4, 0.025), the probability of random erasing
- the above hyperparameter search space is just an example. In actual applications, any hyperparameters that need to be optimized can be defined.
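The search space listed above can be written down as data and sampled on its grid. The dictionary layout, the function name, and the choice of uniform sampling over grid points are assumptions for illustration; the source does not state how combinations are drawn.

```python
import random

# (minimum, maximum, step) for each hyperparameter, as listed in the text.
SEARCH_SPACE = {
    "wd": (0.02, 0.4, 0.01),
    "dropout": (0.0, 0.3, 0.025),
    "drop_conn_rate": (0.0, 0.4, 0.025),
    "mixup": (0.0, 1.0, 0.05),
    "color": (0.0, 0.5, 0.025),
    "re_prob": (0.0, 0.4, 0.025),
}


def sample_combination(space, rng):
    """Draw one hyperparameter combination uniformly from the grid."""
    combo = {}
    for name, (low, high, step) in space.items():
        n_steps = round((high - low) / step)   # number of grid intervals
        value = low + rng.randint(0, n_steps) * step
        combo[name] = round(value, 6)          # remove floating-point noise
    return combo


rng = random.Random(0)
combo = sample_combination(SEARCH_SPACE, rng)
```

Each call returns a dictionary with one value per hyperparameter, which can serve as a single hyperparameter combination in the sampling steps.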
- the optimizer refers to the algorithm used to optimize the parameters of the machine learning model, such as the network weights. Optimization algorithms such as gradient descent, stochastic gradient descent, momentum gradient descent, or adaptive moment estimation (Adam) can be used for parameter optimization.
- the learning rate refers to the magnitude of updating parameters in each iteration of the optimization algorithm, also called the step size. When the step size is too large, the algorithm will not converge and the objective function of the model will be in a state of oscillation, while when the step size is too small, the convergence speed of the model will be too slow.
- A1: initialize the neural predictor, then execute A2. It can be understood that the initial training set is empty.
- A2: sample 16 hyperparameter combinations from the hyperparameter search space, then execute A3.
- A4: add the 16 hyperparameter combinations and their corresponding evaluation indicators to the training set as hyperparameter samples.
- A5: perform multiple iterations of training on the neural predictor based on the training set to obtain the neural predictor of the i-th round of sampling.
- the number of rounds of iterative training is not specifically limited in the embodiment of this application.
- for the training process, refer to the description of the embodiment corresponding to Figure 3; it is not described again here.
- A6: sample 1000 hyperparameter combinations from the hyperparameter search space. It can be understood that these 1000 hyperparameter combinations are different from the previously sampled hyperparameter combinations.
- as an example, the iterative sampling stop condition is that the number of rounds of iterative sampling reaches the maximum number of sampling rounds.
- A9: select the 16 hyperparameter combinations with the best prediction indicators from the 1000 hyperparameter combinations, then continue with A3.
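The selection in step A9 can be sketched with a top-k pick over predicted indices. The sketch assumes that a higher prediction index is better, which is consistent with the claims' wording about the K highest prediction indicators. The candidate list and the predicted values are synthetic placeholders.

```python
import heapq


def select_top_k(candidates, predicted, k=16):
    """Return the k candidates with the highest prediction index."""
    ranked = heapq.nlargest(k, zip(predicted, range(len(candidates))))
    return [candidates[i] for _, i in ranked]


# 1000 placeholder candidates with distinct synthetic prediction indices.
candidates = list(range(1000))
predicted = [(c * 37 % 1000) / 1000 for c in candidates]
top = select_top_k(candidates, predicted, k=16)
```

Only these 16 combinations are sent for real evaluation, so the expensive user-task evaluation runs on a small fraction of the 1000 sampled candidates.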
- when the solution provided by the embodiment of the present application reaches the level of Bayesian optimization, the number of samples it requires is lower than the number of samples required by Bayesian optimization (that is, the number of manually confirmed prediction indicators).
- the solution provided by the embodiments of this application uses training samples to assist in predicting the target hyperparameter combination. Since the evaluation results corresponding to the hyperparameter combinations in the training samples have been verified by users, their accuracy is high.
- because the prediction for the target hyperparameter combination is assisted by user-verified training samples, the prediction results of the scheme adopted in the embodiment of the present application are more accurate than those of existing ordinary predictors, whose input includes only the target hyperparameter combination.
- the input of existing neural predictors only includes target evaluation samples, and there are no other reference samples and evaluation indicators. It is necessary to obtain evaluation indicators of many real samples in advance to train the neural predictor.
- in the embodiments of this application, the input of the neural predictor includes hyperparameter samples that already have evaluation indicators, together with those evaluation indicators.
- because the evaluation indicators of the hyperparameter samples are referenced, the accuracy of the prediction index predicted for the target sample is improved. This in turn improves the accuracy with which the weights of the neural predictor are adjusted based on the prediction index, which reduces the number of training rounds and thus the number of training samples used.
- the solution adopted in the embodiment of this application therefore uses fewer training samples to obtain a neural predictor with better generalization.
- the embodiment of the present application also provides a data processing device.
- the device may be a processor, a chip or a chip system in the execution device, or a module in the execution device.
- the device may include a receiving unit 1101, a processing unit 1102 and a sending unit 1103.
- the receiving unit 1101, the processing unit 1102, and the sending unit 1103 are configured to perform the method steps shown in the embodiments corresponding to FIG. 5A and FIG. 7 .
- the receiving unit 1101 is configured to receive hyperparameter information sent by the user equipment, where the hyperparameter information is used to indicate the hyperparameter search space corresponding to the user task.
- the processing unit 1102 is configured to sample multiple hyperparameter combinations from the hyperparameter search space; use a first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor; and determine, through the neural predictor, the prediction indicator corresponding to the first hyperparameter combination, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, so as to obtain multiple prediction indicators corresponding to the multiple hyperparameter combinations.
- the sending unit 1103 is configured to send K hyperparameter combinations to the user equipment, where K is a positive integer; where the K prediction indicators corresponding to the K hyperparameter combinations are the K highest ones among the multiple prediction indicators.
- the receiving unit 1101 is also configured to receive K evaluation indicators corresponding to the K hyperparameter combinations sent by the user equipment.
- the processing unit 1102 is also configured to use the K hyperparameter combinations as K samples, and add the K samples and the corresponding K evaluation indicators to the training set.
- the processing unit 1102 is also configured to train the neural predictor in the following manner: select, from the training set, multiple samples and the evaluation indicators corresponding to the multiple samples, and select a target sample from the training set; use the multiple samples, the evaluation indicators corresponding to the multiple samples, and the target sample as inputs to the neural predictor, and determine the prediction indicator corresponding to the target sample through the neural predictor; and adjust the network parameters of the neural predictor according to the result of comparing the prediction indicator of the target sample with the evaluation indicator corresponding to the target sample.
- the processing unit 1102 is specifically configured to input the first hyperparameter combination, multiple samples included in the training set, and evaluation indicators of the multiple samples into the neural predictor.
- the neural predictor determines the prediction index corresponding to the first hyperparameter combination based on the first hyperparameter combination, multiple samples, evaluation indicators of the multiple samples, and two anchor point features.
- two anchor point features are used to calibrate the coding features of the lowest predictive index and the coding features of the highest predictive index of the user task.
- the number of input samples supported by the neural predictor is T, where T is a positive integer; the processing unit 1102 is specifically configured so that: the neural predictor encodes the input T samples to obtain T auxiliary features, and encodes the first hyperparameter combination to obtain the target feature; the neural predictor determines the similarity between the target feature and the T auxiliary features.
- the neural predictor determines T+2 weights based on the similarities between the target feature and the T auxiliary features and the two anchor features, where the T+2 weights include the weights of the T samples and the weights of the two anchor features; the neural predictor weights T+2 evaluation indicators according to the T+2 weights to obtain the prediction indicator of the first hyperparameter combination, where the T+2 evaluation indicators include the evaluation indicators of the T samples and the evaluation indicators corresponding to the two anchor features.
- the two anchor features belong to the network parameters of the neural predictor.
- the number of input samples supported by the neural predictor is T; the processing unit 1102 is specifically configured to: encode the input T samples through the neural predictor to obtain T auxiliary features, and encode the first hyperparameter combination to obtain the target feature.
- the similarity between the target feature and T auxiliary features is determined through the neural predictor.
- the neural predictor is used to determine the weights corresponding to the T samples based on the similarities between the target feature and the T auxiliary features.
- the neural predictor weights the evaluation indicators corresponding to the T samples according to the weights corresponding to the T samples to obtain the prediction indicators of the first hyperparameter combination.
- the number of hyperparameter samples supported by the neural predictor is T; the processing unit 1102 is specifically configured to: input T+1 pieces of connection parameter information into the neural predictor, where the T+1 pieces of connection parameter information include the connection parameter information obtained by connecting each of the T samples with its corresponding evaluation indicator, and the connection parameter information obtained by connecting the first hyperparameter combination with the target prediction index mask.
- the target prediction index mask is used to characterize the unknown prediction indicator corresponding to the first hyperparameter combination.
- the neural predictor performs similarity matching on every pair of the input T+1 pieces of connection parameter information to obtain the similarity between each pair.
- the neural predictor determines the prediction indicator of the first hyperparameter combination based on the pairwise similarities among the T+1 pieces of connection parameter information.
- the device 1200 may include a communication interface 1210 and a processor 1220.
- the device 1200 may also include a memory 1230.
- the memory 1230 may be provided inside the device or outside the device.
- in one implementation, the receiving unit 1101, the processing unit 1102 and the sending unit 1103 shown in FIG. 11 can all be implemented by the processor 1220.
- in another implementation, the functions of the receiving unit 1101 and the sending unit 1103 are implemented by the communication interface 1210.
- in that case, the functions of the processing unit 1102 are implemented by the processor 1220.
- the processor 1220 receives the hyperparameter information and sends the hyperparameter combinations through the communication interface 1210, and is configured to implement the methods described in FIG. 5A and FIG. 7.
- in implementation, each step of the methods described in Figures 5A and 7 can be completed by an integrated logic circuit of hardware in the processor 1220 or by instructions in the form of software.
- the communication interface 1210 may be a circuit, a bus, a transceiver, or any other device that can be used for information exchange.
- the other device may be a device connected to the device 1200.
- the processor 1220 can be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of this application.
- a general-purpose processor may be a microprocessor or any conventional processor, etc.
- the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware processor for execution, or can be executed by a combination of hardware and software units in the processor.
- the program code executed by the processor 1220 to implement the above method may be stored in the memory 1230 . Memory 1230 and processor 1220 are coupled.
- the coupling in the embodiment of this application is an indirect coupling or communication connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between devices, units or modules.
- the processor 1220 may cooperate with the memory 1230.
- the memory 1230 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
- the memory 1230 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the embodiment of the present application does not limit the specific connection medium between the communication interface 1210, the processor 1220, and the memory 1230.
- the memory 1230, the processor 1220 and the communication interface 1210 are connected through a bus in Figure 12.
- the bus is represented by a thick line in Figure 12.
- the connection methods between other components are merely illustrative and are not limiting.
- the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 12, but it does not mean that there is only one bus or one type of bus.
- embodiments of the present application also provide a computer storage medium, which stores a software program.
- the software program can implement the methods provided by any one or more of the above embodiments.
- the computer storage medium may include: U disk, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other various media that can store program codes.
- embodiments of the present application also provide a chip, which includes a processor and is used to implement the functions involved in any one or more of the above embodiments, such as obtaining or processing the information involved in the above methods.
- the chip further includes a memory, and the memory is used to store the program instructions and data necessary for the processor to execute.
- the chip may consist of a single chip, or may include a chip and other discrete devices.
- One embodiment of the present application provides a computer-readable medium for storing a computer program.
- the computer program includes instructions for executing the method steps in the method embodiment corresponding to FIG. 4 .
- embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, etc.) having computer-usable program code embodied therein.
- This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It will be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Neurology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Claims (20)
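- A data processing method, comprising: receiving hyperparameter information sent by a user device, wherein the hyperparameter information indicates a hyperparameter search space corresponding to a user task; sampling a plurality of hyperparameter combinations from the hyperparameter search space; using a first hyperparameter combination, a plurality of samples included in a training set, and evaluation metrics of the plurality of samples as inputs to a neural predictor, and determining, by the neural predictor, a prediction metric corresponding to the first hyperparameter combination, wherein the first hyperparameter combination is any one of the plurality of hyperparameter combinations, so as to obtain a plurality of prediction metrics corresponding to the plurality of hyperparameter combinations; and sending K hyperparameter combinations to the user device, K being a positive integer, wherein the K prediction metrics corresponding to the K hyperparameter combinations are the K highest among the plurality of prediction metrics.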
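- The method according to claim 1, further comprising: receiving K evaluation metrics, sent by the user device, that correspond to the K hyperparameter combinations; and using the K hyperparameter combinations as K samples, and adding the K samples and the corresponding K evaluation metrics to the training set.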
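- The method according to claim 1 or 2, wherein the neural predictor is trained as follows: selecting, from the training set, a plurality of samples and the evaluation metrics corresponding to the plurality of samples, and selecting one target sample from the training set; using the plurality of samples, the evaluation metrics corresponding to the plurality of samples, and the target sample as inputs to the neural predictor, and determining, by the neural predictor, a prediction metric corresponding to the target sample; and adjusting network parameters of the neural predictor according to a result of comparing the prediction metric of the target sample with the evaluation metric corresponding to the target sample.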
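- The method according to any one of claims 1 to 3, wherein using the first hyperparameter combination, the plurality of samples included in the training set, and the evaluation metrics of the plurality of samples as inputs to the neural predictor, and determining, by the neural predictor, the prediction metric corresponding to the first hyperparameter combination comprises: inputting the first hyperparameter combination, the plurality of samples included in the training set, and the evaluation metrics of the plurality of samples into the neural predictor; and determining, by the neural predictor, the prediction metric corresponding to the first hyperparameter combination based on the first hyperparameter combination, the plurality of samples, the evaluation metrics of the plurality of samples, and two anchor features; wherein the two anchor features are used to calibrate an encoded feature of a lowest prediction metric and an encoded feature of a highest prediction metric of the user task.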
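- The method according to claim 4, wherein the number of input samples supported by the neural predictor is T, T being a positive integer; and determining, by the neural predictor, the prediction metric corresponding to the first hyperparameter combination based on the first hyperparameter combination, the plurality of samples, the evaluation metrics of the plurality of samples, and the two anchor features comprises: encoding, by the neural predictor, the T input samples to obtain T auxiliary features, and encoding the first hyperparameter combination to obtain a target feature; determining, by the neural predictor, similarities between the target feature and the T auxiliary features and similarities between the target feature and the two anchor features; determining, by the neural predictor, T+2 weights according to the similarities between the target feature and, respectively, the T auxiliary features and the two anchor features, wherein the T+2 weights comprise weights of the T samples and weights of the two anchor features; and weighting, by the neural predictor, T+2 evaluation metrics according to the T+2 weights to obtain the prediction metric of the first hyperparameter combination; wherein the T+2 evaluation metrics comprise the evaluation metrics of the T samples and the evaluation metrics corresponding to the two anchor features.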
- The method according to claim 4 or 5, wherein the two anchor features are network parameters of the neural predictor.
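- The method according to any one of claims 1 to 3, wherein the number of input samples supported by the neural predictor is T, T being a positive integer; and using the first hyperparameter combination, the plurality of samples included in the training set, and the evaluation metrics of the plurality of samples as inputs to the neural predictor, and determining, by the neural predictor, the prediction metric corresponding to the first hyperparameter combination comprises: encoding, by the neural predictor, the T input samples to obtain T auxiliary features, and encoding the first hyperparameter combination to obtain a target feature; determining, by the neural predictor, a similarity between the target feature and each of the T auxiliary features; determining, by the neural predictor, a weight for each of the T samples according to the similarities between the target feature and the respective T auxiliary features; and weighting, by the neural predictor, the evaluation metrics corresponding to the T samples according to the weights of the respective T samples to obtain the prediction metric of the first hyperparameter combination.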
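- The method according to any one of claims 1 to 4, wherein the number of input hyperparameter samples supported by the neural predictor is T, T being a positive integer; and using the first hyperparameter combination, the plurality of samples included in the training set, and the evaluation metrics of the plurality of samples as inputs to the neural predictor, and determining, by the neural predictor, the prediction metric corresponding to the first hyperparameter combination comprises: inputting T+1 pieces of concatenated parameter information into the neural predictor, wherein the T+1 pieces of concatenated parameter information comprise T pieces obtained by concatenating each of T samples with its corresponding evaluation metric, and one piece obtained by concatenating the first hyperparameter combination with a target prediction metric mask, the target prediction metric mask representing the unknown prediction metric corresponding to the first hyperparameter combination; performing, by the neural predictor, similarity matching on every two of the T+1 input pieces of concatenated parameter information to obtain a similarity between every two pieces; and determining, by the neural predictor, the prediction metric of the first hyperparameter combination according to the similarity between every two of the T+1 pieces of concatenated parameter information.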
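- A data processing apparatus, comprising: a receiving unit, configured to receive hyperparameter information sent by a user device, wherein the hyperparameter information indicates a hyperparameter search space corresponding to a user task; a processing unit, configured to sample a plurality of hyperparameter combinations from the hyperparameter search space, use a first hyperparameter combination, a plurality of samples included in a training set, and evaluation metrics of the plurality of samples as inputs to a neural predictor, and determine, by using the neural predictor, a prediction metric corresponding to the first hyperparameter combination, wherein the first hyperparameter combination is any one of the plurality of hyperparameter combinations, so as to obtain a plurality of prediction metrics corresponding to the plurality of hyperparameter combinations; and a sending unit, configured to send K hyperparameter combinations to the user device, K being a positive integer, wherein the K prediction metrics corresponding to the K hyperparameter combinations are the K highest among the plurality of prediction metrics.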
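- The apparatus according to claim 9, wherein the receiving unit is further configured to receive K evaluation metrics, sent by the user device, that correspond to the K hyperparameter combinations; and the processing unit is further configured to use the K hyperparameter combinations as K samples, and add the K samples and the corresponding K evaluation metrics to the training set.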
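- The apparatus according to claim 9 or 10, wherein the processing unit is further configured to train the neural predictor as follows: selecting, from the training set, a plurality of samples and the evaluation metrics corresponding to the plurality of samples, and selecting one target sample from the training set; using the plurality of samples, the evaluation metrics corresponding to the plurality of samples, and the target sample as inputs to the neural predictor, and determining, by using the neural predictor, a prediction metric corresponding to the target sample; and adjusting network parameters of the neural predictor according to a result of comparing the prediction metric of the target sample with the evaluation metric corresponding to the target sample.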
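- The apparatus according to any one of claims 9 to 11, wherein the processing unit is specifically configured to: input the first hyperparameter combination, the plurality of samples included in the training set, and the evaluation metrics of the plurality of samples into the neural predictor, wherein the neural predictor determines the prediction metric corresponding to the first hyperparameter combination based on the first hyperparameter combination, the plurality of samples, the evaluation metrics of the plurality of samples, and two anchor features; wherein the two anchor features are used to calibrate an encoded feature of a lowest prediction metric and an encoded feature of a highest prediction metric of the user task.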
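- The apparatus according to claim 12, wherein the number of input samples supported by the neural predictor is T, T being a positive integer; and the processing unit is specifically configured such that: the neural predictor encodes the T input samples to obtain T auxiliary features, and encodes the first hyperparameter combination to obtain a target feature; the neural predictor determines similarities between the target feature and the T auxiliary features and similarities between the target feature and the two anchor features; the neural predictor determines T+2 weights according to the similarities between the target feature and, respectively, the T auxiliary features and the two anchor features, wherein the T+2 weights comprise weights of the T samples and weights of the two anchor features; and the neural predictor weights T+2 evaluation metrics according to the T+2 weights to obtain the prediction metric of the first hyperparameter combination; wherein the T+2 evaluation metrics comprise the evaluation metrics of the T samples and the evaluation metrics corresponding to the two anchor features.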
- The apparatus according to claim 12 or 13, wherein the two anchor features are network parameters of the neural predictor.
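- The apparatus according to any one of claims 9 to 11, wherein the number of input samples supported by the neural predictor is T, T being a positive integer; and the processing unit is specifically configured to: encode, by using the neural predictor, the T input samples to obtain T auxiliary features, and encode the first hyperparameter combination to obtain a target feature; determine, by using the neural predictor, a similarity between the target feature and each of the T auxiliary features; determine, by using the neural predictor, a weight for each of the T samples according to the similarities between the target feature and the respective T auxiliary features; and weight, by using the neural predictor, the evaluation metrics corresponding to the T samples according to the weights of the respective T samples to obtain the prediction metric of the first hyperparameter combination.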
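- The apparatus according to any one of claims 9 to 11, wherein the number of input hyperparameter samples supported by the neural predictor is T, T being a positive integer; and the processing unit is specifically configured to: input T+1 pieces of concatenated parameter information into the neural predictor, wherein the T+1 pieces of concatenated parameter information comprise T pieces obtained by concatenating each of T samples with its corresponding evaluation metric, and one piece obtained by concatenating the first hyperparameter combination with a target prediction metric mask, the target prediction metric mask representing the unknown prediction metric corresponding to the first hyperparameter combination; perform, by using the neural predictor, similarity matching on every two of the T+1 input pieces of concatenated parameter information to obtain a similarity between every two pieces; and determine, by using the neural predictor, the prediction metric of the first hyperparameter combination according to the similarity between every two of the T+1 pieces of concatenated parameter information.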
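- A data processing apparatus, comprising at least one processor and a memory, wherein the memory is configured to store a computer program or instructions, and the at least one processor is configured to execute the computer program or instructions, so that the method according to any one of claims 1 to 8 is performed.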
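- A chip system, comprising a processor, wherein the processor is connected to a memory and is configured to read and execute a software program stored in the memory, to implement the method according to any one of claims 1 to 8.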
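- A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions are executed by a computer, the method according to any one of claims 1 to 8 is performed.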
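- A computer program product comprising a computer program or instructions, wherein when the computer program product runs on a computer, the method according to any one of claims 1 to 8 is performed.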
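The similarity-weighted prediction of claims 5 and 7, together with the top-K selection and feedback steps of claims 1 and 2, can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The identity `encode`, the negative squared-distance `similarity`, the softmax weighting, and every function name are assumptions introduced here for illustration; the claims only require an encoder, a similarity, and weights derived from the similarities. Anchors are passed in as hypothetical (feature, metric) pairs, whereas claim 6 treats the anchor features as learned network parameters. Training of the predictor (claim 3) is not shown.

```python
import math


def encode(x):
    """Stand-in encoder (identity); the claims assume a learned encoder."""
    return [float(v) for v in x]


def similarity(a, b):
    """Assumed similarity measure: negative squared Euclidean distance."""
    return -sum((p - q) ** 2 for p, q in zip(a, b))


def softmax(scores):
    """Map similarities to non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(candidate, samples, metrics, anchors=None):
    """Predict a metric for `candidate` from T evaluated samples.

    Without anchors this weights T metrics (claim 7); with two
    (feature, metric) anchors it weights T+2 metrics (claim 5).
    """
    target = encode(candidate)                # target feature
    feats = [encode(s) for s in samples]      # T auxiliary features
    values = list(metrics)
    for feat, value in (anchors or []):       # hypothetical anchor pairs
        feats.append(list(feat))
        values.append(value)
    weights = softmax([similarity(target, f) for f in feats])
    return sum(w * v for w, v in zip(weights, values))


def top_k(candidates, samples, metrics, k, anchors=None):
    """Return the k candidates with the highest predicted metric (claim 1)."""
    ranked = sorted(
        candidates,
        key=lambda c: predict(c, samples, metrics, anchors),
        reverse=True,
    )
    return ranked[:k]


def add_feedback(samples, metrics, combos, evaluated):
    """Add user-evaluated combinations to the training set (claim 2)."""
    samples.extend(combos)
    metrics.extend(evaluated)


if __name__ == "__main__":
    # Toy training set: two hyperparameter combinations and their metrics.
    samples = [(0.0, 0.0), (1.0, 1.0)]
    metrics = [0.2, 0.8]
    candidates = [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
    best = top_k(candidates, samples, metrics, k=1)
    print(best)
    # Pretend the user device trained with `best` and reported a metric.
    add_feedback(samples, metrics, best, [0.75])
    print(len(samples))
```

Because the weights are non-negative and sum to 1, the prediction in this sketch is a convex combination of the supplied metrics, so it always lies between the smallest and largest metric given. That is one way to read the role of the two anchors: they extend the reachable range to the lowest and highest metric of the task.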
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23773856.2A EP4481645A4 (en) | 2022-03-24 | 2023-03-21 | DATA PROCESSING METHOD AND DEVICE |
| US18/894,506 US20250013877A1 (en) | 2022-03-24 | 2024-09-24 | Data Processing Method and Apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210303118.6A CN116861962A (zh) | 2022-03-24 | 2022-03-24 | Data processing method and apparatus |
| CN202210303118.6 | 2022-03-24 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/894,506 Continuation US20250013877A1 (en) | 2022-03-24 | 2024-09-24 | Data Processing Method and Apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023179609A1 (zh) | 2023-09-28 |
Family
ID=88099991
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/082786 Ceased WO2023179609A1 (zh) | 2022-03-24 | 2023-03-21 | Data processing method and apparatus |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250013877A1 (zh) |
| EP (1) | EP4481645A4 (zh) |
| CN (1) | CN116861962A (zh) |
| WO (1) | WO2023179609A1 (zh) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
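| CN117764631A (zh) * | 2024-02-22 | 2024-03-26 | 山东中翰软件有限公司 | Data governance optimization method and system based on source-side static data modeling |
| CN118353779A (zh) * | 2024-05-20 | 2024-07-16 | 广州楚晨网络科技有限公司 | Internet of Things network determination method with an optimization strategy |
| CN119202558A (zh) * | 2024-11-24 | 2024-12-27 | 西南石油大学 | Method for predicting the internal corrosion rate of natural gas pipelines |
| CN120297787A (zh) * | 2025-03-24 | 2025-07-11 | 山东省鲁南地质工程勘察院(山东省地质矿产勘查开发局第二地质大队) | Cycle quality data evaluation method, system, and device for carbon cycling |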
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
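| CN119810017A (zh) * | 2024-11-25 | 2025-04-11 | 深圳市华汉伟业科技有限公司 | Defect detection method and apparatus, and hyperparameter optimization method and apparatus |
| CN119229853B (zh) * | 2024-11-29 | 2025-03-14 | 苏州元脑智能科技有限公司 | Speech processing method, computer program product, device, and storage medium |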
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109657805A (zh) * | 2018-12-07 | 2019-04-19 | 泰康保险集团股份有限公司 | Hyperparameter determination method and apparatus, electronic device, and computer-readable medium |
| CN109740113A (zh) * | 2018-12-03 | 2019-05-10 | 东软集团股份有限公司 | Method and apparatus for determining a hyperparameter threshold range, storage medium, and electronic device |
| KR20190118937A (ko) * | 2018-04-11 | 2019-10-21 | 삼성에스디에스 주식회사 | System and method for optimizing hyperparameters |
| WO2020048722A1 (en) * | 2018-09-04 | 2020-03-12 | Siemens Aktiengesellschaft | Transfer learning of a machine-learning model using a hyperparameter response model |
2022
- 2022-03-24 CN CN202210303118.6A patent/CN116861962A/zh active Pending
2023
- 2023-03-21 EP EP23773856.2A patent/EP4481645A4/en active Pending
- 2023-03-21 WO PCT/CN2023/082786 patent/WO2023179609A1/zh not_active Ceased
2024
- 2024-09-24 US US18/894,506 patent/US20250013877A1/en active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20190118937A (ko) * | 2018-04-11 | 2019-10-21 | 삼성에스디에스 주식회사 | System and method for optimizing hyperparameters |
| WO2020048722A1 (en) * | 2018-09-04 | 2020-03-12 | Siemens Aktiengesellschaft | Transfer learning of a machine-learning model using a hyperparameter response model |
| CN109740113A (zh) * | 2018-12-03 | 2019-05-10 | 东软集团股份有限公司 | Method and apparatus for determining a hyperparameter threshold range, storage medium, and electronic device |
| CN109657805A (zh) * | 2018-12-07 | 2019-04-19 | 泰康保险集团股份有限公司 | Hyperparameter determination method and apparatus, electronic device, and computer-readable medium |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4481645A4 * |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
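| CN117764631A (zh) * | 2024-02-22 | 2024-03-26 | 山东中翰软件有限公司 | Data governance optimization method and system based on source-side static data modeling |
| CN118353779A (zh) * | 2024-05-20 | 2024-07-16 | 广州楚晨网络科技有限公司 | Internet of Things network determination method with an optimization strategy |
| CN118353779B (zh) * | 2024-05-20 | 2025-01-24 | 广州楚晨网络科技有限公司 | Internet of Things network determination method with an optimization strategy |
| CN119202558A (zh) * | 2024-11-24 | 2024-12-27 | 西南石油大学 | Method for predicting the internal corrosion rate of natural gas pipelines |
| CN120297787A (zh) * | 2025-03-24 | 2025-07-11 | 山东省鲁南地质工程勘察院(山东省地质矿产勘查开发局第二地质大队) | Cycle quality data evaluation method, system, and device for carbon cycling |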
Also Published As
| Publication number | Publication date |
|---|---|
| CN116861962A (zh) | 2023-10-10 |
| EP4481645A1 (en) | 2024-12-25 |
| US20250013877A1 (en) | 2025-01-09 |
| EP4481645A4 (en) | 2025-06-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2023179609A1 (zh) | | Data processing method and apparatus |
| CN111819580B (zh) | | Neural architecture search for dense image prediction tasks |
| CN113570029B (zh) | | Method for obtaining a neural network model, image processing method, and apparatus |
| US20220319154A1 (en) | | Neural network model update method, image processing method, and apparatus |
| US12327400B2 (en) | | Neural network optimization method and apparatus |
| CN112115352B (zh) | | Session recommendation method and system based on user interest |
| US11585918B2 (en) | | Generative adversarial network-based target identification |
| CN113505883B (zh) | | Neural network training method and apparatus |
| CN109891897B (zh) | | Method for analyzing media content |
| WO2019155064A1 (en) | | Data compression using jointly trained encoder, decoder, and prior neural networks |
| CN111727441A (zh) | | Neural network system implementing conditional neural processes for efficient learning |
| WO2020112189A1 (en) | | Computer architecture for artificial image generation using auto-encoder |
| CN114004383A (zh) | | Training method for a time series prediction model, and time series prediction method and apparatus |
| US20240046067A1 (en) | | Data processing method and related device |
| WO2024032096A1 (zh) | | Reactant molecule prediction method, training method, apparatus, and electronic device |
| US20240249115A1 (en) | | Neural network model optimization method and related device |
| US12462552B2 (en) | | System and method for prompt searching |
| JPWO2019229931A1 (ja) | | Information processing device, control method, and program |
| EP3888008A1 (en) | | Computer architecture for artificial image generation |
| WO2023174064A1 (zh) | | Automatic search method, and method and apparatus for training a performance prediction model for automatic search |
| WO2021057690A1 (zh) | | Method and apparatus for constructing a neural network, and image processing method and apparatus |
| US20260004490A1 (en) | | Feedback Predictions for Machine-Learned Generative Models |
| CN116361643A (zh) | | Model training method for object recommendation, object recommendation method, and related apparatus |
| GB2700328A (en) | | Match reference image structure using diffusion models |
| CN119810748A (zh) | | Obstacle detection method, model acquisition method therefor, apparatus, and smart home device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23773856; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023773856; Country of ref document: EP. Ref document number: 23773856.2; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023773856; Country of ref document: EP; Effective date: 20240919 |
| | NENP | Non-entry into the national phase | Ref country code: DE |