WO2022057433A1 - Method for training a machine learning model and related device - Google Patents
Method for training a machine learning model and related device
- Publication number
- WO2022057433A1 (application PCT/CN2021/107391)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- data
- machine learning
- training
- learning model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/95—Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/34—Network arrangements or protocols for supporting network services or applications involving the movement of software or configuration parameters
Definitions
- The present application relates to the field of artificial intelligence, and in particular to a method for training a machine learning model and a related device.
- Artificial intelligence is the use of computers or computer-controlled machines to simulate, extend and expand human intelligence. It includes the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. At present, as users become increasingly intent on protecting their private personal data, data cannot be exchanged between data owners, which creates "data silos" of various sizes. These data silos pose new challenges to artificial intelligence that relies on massive data.
- In response to data silos, federated learning has been proposed: different clients use locally stored training data to train the same neural network and send the trained neural network to a server, which aggregates the updates.
- However, the training data stored in different clients has different data characteristics, so the optimization objectives of different clients are inconsistent. In addition, the clients selected for each round of training are not exactly the same, so the optimization objective may also differ from round to round. As a result, the training process of the neural network is prone to oscillation and the training effect is poor.
- The embodiments of the present application provide a method for training a machine learning model and a related device, which allocate different neural networks to training data with different data characteristics, thereby achieving personalized matching between neural networks and data characteristics. For each client, the neural network is allocated and trained according to the data characteristics of the training data set stored in that client, so that the same neural network is trained with training data that has the same data characteristics, which helps improve the accuracy of the trained neural network.
- The embodiments of the present application provide a method for training a machine learning model, which can be used in the field of artificial intelligence.
- The method is applied to a first client. A plurality of clients are communicatively connected to a server, a plurality of modules are stored in the server, the plurality of modules are used to construct a machine learning model, and the first client is any one of the plurality of clients.
- The machine learning model may be a neural network, a linear model or another type of machine learning model.
- Correspondingly, the modules that make up the machine learning model may be neural network modules, linear model modules or modules of another type of machine learning model.
- The training of the machine learning model includes multiple rounds of iteration, and one round includes the following. The first client obtains at least one first machine learning model, which is selected according to the data characteristics of a first data set. Specifically, the first client may receive the plurality of modules sent by the server and select the at least one first machine learning model from at least two second machine learning models, or the first client may receive the at least one first machine learning model sent by the server.
- The first client uses the first data set to perform a training operation on the at least one first machine learning model to obtain at least one trained first machine learning model. The first client then sends at least one updated module included in the at least one trained first machine learning model to the server, and the server uses the updated module to update the weight parameters of the stored modules.
- In this implementation, a plurality of neural network modules are stored in the server, and the plurality of neural network modules can form at least two different second neural networks. At least one first neural network that matches the data characteristics of the first data set stored by the first client is determined, and after the at least one first neural network is trained using the training data of the first client, the server aggregates the updated parameters. In this way, different neural networks can be allocated to training data with different data characteristics, that is, personalized matching between neural networks and data characteristics is achieved. In addition, since the first client is any one of the plurality of clients, for each client the neural network is allocated and trained according to the data characteristics of the training data set stored by that client, so that the same neural network is trained with training data having the same data characteristics and different neural networks are trained with training data having different data characteristics. This not only achieves personalized matching between neural networks and data characteristics but also helps improve the accuracy of the trained neural network.
- The plurality of modules are used to construct at least two second machine learning models, and the at least one first machine learning model is selected from the at least two second machine learning models; alternatively, the modules used to construct the at least one first machine learning model are selected from the plurality of modules.
- Two selection manners are thus provided, which improves the implementation flexibility of this solution.
- The machine learning model is a neural network, the plurality of modules stored in the server are neural network modules, and the first client stores a first adaptation relationship. The first adaptation relationship includes a plurality of adaptation values, each of which represents the degree of adaptation between the first data set and a second neural network.
- The method further includes: the first client receives the plurality of neural network modules sent by the server.
- The first client obtaining the at least one first machine learning model includes: the first client selects at least one first neural network from the at least two second neural networks according to the first adaptation relationship, where the at least one first neural network includes a first neural network with a high adaptation value to the first data set.
- The at least one first neural network with a high adaptation value may be the N first neural networks with the highest adaptation values, where N is an integer greater than or equal to 1; for example, N may be 1, 2, 3, 4, 5, 6 or another value, which is not limited here.
- Alternatively, the at least one first neural network with a high adaptation value may be the first neural networks whose adaptation values are greater than a fourth threshold, and the value of the fourth threshold may be determined in combination with factors such as how the adaptation value is generated and its value range.
- The at least one first neural network may further include a neural network randomly selected by the first client from the at least two second neural networks.
- In this implementation, a first adaptation relationship is preconfigured on the first client, and at least one first neural network with a high adaptation value to the first data set is selected from the at least two second neural networks according to the first adaptation relationship. This ensures that the selected neural network is adapted to the data characteristics of the first data set and enables personalized customization of the neural networks of different clients. In addition, selecting a neural network adapted to the data characteristics of the first data set helps improve the accuracy of the trained neural network.
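The two selection rules described above (the N highest adaptation values, or every value above a threshold) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_networks`, the dictionary layout and the network names are hypothetical.

```python
def select_networks(adaptation, top_n=None, threshold=None):
    """Pick candidate networks by adaptation value.

    adaptation: dict mapping a (hypothetical) network name to its adaptation value.
    top_n:      if given, keep the N networks with the highest values.
    threshold:  if given, keep every network whose value exceeds it.
    """
    if (top_n is None) == (threshold is None):
        raise ValueError("pass exactly one of top_n or threshold")
    # Rank candidates from the highest adaptation value to the lowest.
    ranked = sorted(adaptation, key=adaptation.get, reverse=True)
    if top_n is not None:
        return ranked[:top_n]
    return [name for name in ranked if adaptation[name] > threshold]


# Hypothetical adaptation values between one data set and three candidates.
adaptation_values = {"net_a": 0.9, "net_b": 0.2, "net_c": 0.6}
top_two = select_networks(adaptation_values, top_n=2)
above_half = select_networks(adaptation_values, threshold=0.5)
```

With the sample values, both rules keep `net_a` and `net_c`; in general the two rules can return different sets.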
- The first client may obtain a first adaptation matrix according to the first adaptation relationship, where each element in the first adaptation matrix represents an adaptation value.
- If the first adaptation relationship contains null values, it can be completed by means of matrix decomposition. The completed first adaptation relationship no longer includes null values, so at least one first neural network with a high adaptation value to the first data set can be selected according to the completed first adaptation relationship.
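One way to complete an adaptation matrix that has null entries is a low-rank factorization fitted to the known entries. The sketch below is an assumption about how such a completion could be done; the text names matrix decomposition but does not specify the algorithm, and `complete_matrix`, its hyperparameters and the sample matrix are hypothetical.

```python
import random


def complete_matrix(matrix, rank=2, steps=2000, lr=0.01, reg=0.01, seed=0):
    """Fill None entries of `matrix` using a rank-`rank` factorization P * Q^T.

    The factors are fitted by stochastic gradient descent on the known
    entries only; known entries are returned unchanged.
    """
    rng = random.Random(seed)
    rows, cols = len(matrix), len(matrix[0])
    p = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(rows)]
    q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(cols)]
    known = [(i, j, matrix[i][j])
             for i in range(rows) for j in range(cols)
             if matrix[i][j] is not None]
    for _ in range(steps):
        for i, j, value in known:
            pred = sum(p[i][k] * q[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                p_ik, q_jk = p[i][k], q[j][k]
                # Gradient step with L2 regularization on both factors.
                p[i][k] += lr * (err * q_jk - reg * p_ik)
                q[j][k] += lr * (err * p_ik - reg * q_jk)
    return [[matrix[i][j] if matrix[i][j] is not None
             else sum(p[i][k] * q[j][k] for k in range(rank))
             for j in range(cols)]
            for i in range(rows)]


# Hypothetical adaptation matrix; None marks a missing adaptation value.
sparse = [[0.9, None, 0.3],
          [0.8, 0.2, None],
          [None, 0.1, 0.4]]
completed = complete_matrix(sparse)
```

The completed matrix can then be passed to whatever selection rule is in use, since every entry now holds a number.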
- The adaptation value between the first data set and a second neural network corresponds to the function value of a first loss function: the smaller the function value of the first loss function, the larger the adaptation value between the first data set and that second neural network.
- The first loss function indicates the similarity between the prediction result of first data and the correct result of the first data. The prediction result of the first data is obtained through the second neural network, and the first data and its correct result are obtained based on the first data set.
- The first data may be any data in the first data set. Alternatively, a clustering operation may be performed on the first data set to obtain at least two data subsets, and the first data is the cluster center of any one of the at least two data subsets.
- In this implementation, the loss function is used to calculate the adaptation value between the first data set and a first neural network; the solution is simple, easy to implement and accurate.
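The relationship "smaller loss, larger adaptation value" can be illustrated with a short sketch. Cross entropy as the first loss function and `exp(-loss)` as the mapping are both assumptions chosen for illustration; the text only requires that the adaptation value decrease as the loss grows. The predictor functions are hypothetical stand-ins for candidate networks.

```python
import math


def cross_entropy(probs, label):
    """Cross entropy of one prediction against the index of the correct class."""
    return -math.log(max(probs[label], 1e-12))


def adaptation_from_loss(predict, samples):
    """Map the mean loss over `samples` to an adaptation value in (0, 1].

    predict: callable returning class probabilities for one input.
    samples: list of (input, correct_label) pairs drawn from the data set.
    """
    mean_loss = sum(cross_entropy(predict(x), y) for x, y in samples) / len(samples)
    return math.exp(-mean_loss)  # assumed mapping: decreasing in the loss


def well_matched_net(x):
    # Hypothetical network that fits the data well.
    return [0.9, 0.1] if x == 0 else [0.1, 0.9]


def poorly_matched_net(x):
    # Hypothetical network that cannot tell the classes apart.
    return [0.5, 0.5]


samples = [(0, 0), (1, 1)]
high_value = adaptation_from_loss(well_matched_net, samples)
low_value = adaptation_from_loss(poorly_matched_net, samples)
```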
- Alternatively, the adaptation value between the first data set and a second neural network corresponds to a first similarity: the greater the first similarity, the larger the adaptation value between the first data set and that second neural network.
- The first similarity is the similarity between the second neural network and a third neural network, where the third neural network is the neural network that output prediction results with the highest accuracy in the previous round of iteration.
- Because the third neural network had the highest prediction accuracy in the previous round and has been trained using the first data set, the degree of adaptation between the third neural network and the first data set is high. If the similarity between a first neural network and the third neural network is high, the degree of adaptation between that first neural network and the first data set is also high, and the adaptation value is large. This provides another way of calculating the adaptation value, which improves the implementation flexibility of this solution.
- The similarity between a second neural network and the third neural network is determined in either of the following ways. In the first way, the first client inputs the same data into the second neural network and the third neural network and compares the output data of the second neural network with the output data of the third neural network. In the second way, the first client calculates the similarity between the weight parameter matrix of the second neural network and the weight parameter matrix of the third neural network.
- In either case, the similarity can be obtained by calculating the Euclidean distance, Mahalanobis distance, cosine distance, cross entropy or another measure between the two.
- The similarity between the output data of a first neural network and the output data of the third neural network may refer to the similarity between the output data of the entire first neural network and the output data of the entire third neural network. It may also be computed per module: the similarity between the output data of each module in the first neural network and the output data of the corresponding module in the third neural network is calculated, and the product of these per-module similarities is taken as the similarity between the output data of the entire first neural network and the output data of the entire third neural network.
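The per-module variant can be sketched as below, using cosine similarity as the measure (one of the options the text lists). The function names and the sample vectors are hypothetical; the same `cosine_similarity` helper could equally be applied to flattened weight parameter matrices.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def network_similarity(module_outputs_a, module_outputs_b):
    """Product of per-module output similarities between two networks.

    Each argument is a list with one output vector per module, obtained by
    feeding the same input data to both networks.
    """
    similarity = 1.0
    for out_a, out_b in zip(module_outputs_a, module_outputs_b):
        similarity *= cosine_similarity(out_a, out_b)
    return similarity


# Hypothetical module outputs for a candidate network and the reference network.
candidate = [[1.0, 2.0], [0.5, 0.5]]
reference = [[1.0, 2.0], [0.5, 0.5]]
same = network_similarity(candidate, reference)
different = network_similarity(candidate, [[1.0, 2.0], [0.5, -0.5]])
```

Because the result is a product, a single module with dissimilar output pulls the overall similarity down sharply, which is the behavior the per-module formulation implies.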
- The machine learning model is a neural network, and the method further includes: the first client receives a selector sent by the server, where the selector is a neural network used to select, from the plurality of neural network modules, at least one neural network module that matches the data characteristics of the first data set.
- The first client inputs training data into the selector according to the first data set and obtains indication information output by the selector.
- The indication information includes the probability that each of the plurality of neural network modules is selected and is used to indicate the neural network modules for constructing the at least one first neural network. Further, if the plurality of neural network modules include Z neural network modules, the indication information may be represented as a vector of Z elements, each of which indicates the probability that one neural network module is selected.
- The first client receives, from the server, the neural network modules for constructing the at least one first neural network.
- In this implementation, the training data is input into the selector, the indication information output by the selector is obtained, and the neural network modules for constructing the first neural network are selected according to the indication information. Because the selector is a neural network that selects, from the plurality of neural network modules, the modules matching the data characteristics of the first data set, this provides another way of selecting the neural network modules for constructing the first neural network, which improves the implementation flexibility of this solution. Selecting through a neural network also helps improve the accuracy of the module selection process.
- The process of inputting training data into the selector can take several forms.
- The first client may input each piece of first training data in the first data set into the selector once to obtain the indication information corresponding to each piece of first training data.
- Alternatively, the first client may perform a clustering operation on the first data set and input each of the resulting cluster centers (an example of training data) into the selector to obtain the indication information corresponding to each cluster center.
- Alternatively, the first client may perform a clustering operation on the first data set, sample several pieces of first training data from each of the resulting data subsets, and input the sampled first training data (an example of training data) into the selector to obtain the indication information corresponding to each sampled piece of first training data.
- The first client may initialize an array that records the number of times each neural network module is selected, with every initial value set to 0; the array may also take the form of a table, a matrix or another structure.
- After the first client obtains at least one piece of indication information, then for each piece of indication information and for each neural network module whose selection probability is greater than a fifth threshold, the count corresponding to that neural network module in the array is increased by one. After traversing all the indication information, the first client uses the array to find the at least one neural network module whose selection count is greater than a sixth threshold and determines it as the neural network module for constructing the at least one first neural network.
- Alternatively, the first client may average the plurality of pieces of indication information to obtain a vector of Z elements, each of which indicates the probability that one neural network module is selected, then take the H elements with the largest average values from the Z elements and determine the H neural network modules that these elements point to as the neural network modules for constructing the at least one first neural network, where Z is an integer greater than 1 and H is an integer greater than or equal to 1.
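Both ways of turning several indication vectors into a module choice (threshold counting and averaging) can be sketched as follows. The function names, threshold values and sample vectors are hypothetical; modules are identified by their index in the Z-element vector.

```python
def modules_by_count(indications, prob_threshold, count_threshold):
    """Counting rule: tally modules whose probability exceeds `prob_threshold`
    (the fifth threshold), then keep modules whose tally exceeds
    `count_threshold` (the sixth threshold)."""
    counts = [0] * len(indications[0])  # one counter per module, starting at 0
    for probs in indications:
        for index, prob in enumerate(probs):
            if prob > prob_threshold:
                counts[index] += 1
    return [index for index, count in enumerate(counts) if count > count_threshold]


def modules_by_average(indications, h):
    """Averaging rule: average the vectors element-wise and keep the H modules
    with the largest mean probability, returned in index order."""
    z = len(indications[0])
    means = [sum(probs[i] for probs in indications) / len(indications)
             for i in range(z)]
    best = sorted(range(z), key=lambda i: means[i], reverse=True)[:h]
    return sorted(best)


# Hypothetical selector outputs for three inputs and Z = 3 modules.
indications = [[0.9, 0.1, 0.6],
               [0.8, 0.2, 0.4],
               [0.7, 0.3, 0.7]]
counted = modules_by_count(indications, prob_threshold=0.5, count_threshold=1)
averaged = modules_by_average(indications, h=2)
```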
- The machine learning model is a neural network, the plurality of modules stored in the server are neural network modules, and the method further includes: the first client calculates an adaptation value between the first data set and each of the at least one first neural network.
- The first data set includes a plurality of pieces of first training data. The higher the adaptation value between a piece of first training data and a first neural network, the greater the degree to which the weight parameters of the first neural network are modified when that first training data is used to train the first neural network once.
- Ways of adjusting the degree of modification to the weight parameters of the first neural network include adjusting the learning rate, adjusting the coefficient of the penalty term, or other methods.
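As an illustration of the learning-rate option, the sketch below scales a base learning rate in proportion to the adaptation value before a plain gradient step. The proportional scaling, the function names and the numbers are assumptions; the text only states that a higher adaptation value leads to a larger modification of the weights.

```python
def scaled_learning_rate(base_lr, adaptation_value, max_adaptation=1.0):
    """Scale the base learning rate by the adaptation value, clipped to [0, 1]."""
    ratio = max(0.0, min(adaptation_value / max_adaptation, 1.0))
    return base_lr * ratio


def sgd_step(weights, gradients, learning_rate):
    """One plain gradient-descent update on a flat list of weights."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


weights = [1.0, -2.0]
gradients = [0.5, -0.5]
# Same gradient, different adaptation values (hypothetical 0.9 and 0.2).
well_matched = sgd_step(weights, gradients, scaled_learning_rate(0.1, 0.9))
poorly_matched = sgd_step(weights, gradients, scaled_learning_rate(0.1, 0.2))
```

The training data with the higher adaptation value moves the weights further for the same gradient.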
- The first client calculating the adaptation value between the first data set and each of the at least one first neural network includes: the first client clusters the first data set to obtain at least two data subsets, where a first data subset is a subset of the first data set and is any one of the at least two data subsets; and the first client generates, according to the first data subset and a first loss function, an adaptation value between the first data subset and a first neural network. The smaller the function value of the first loss function, the larger the adaptation value between the first data subset and the first neural network.
- The first loss function indicates the similarity between the prediction result of first data and the correct result of the first data.
- The prediction result of the first data is obtained through the first neural network, and the first data refers to any data in the first data subset or to the cluster center of the first data subset.
- The first data and its correct result are obtained based on the first data subset, and the adaptation value between the first data subset and a first neural network is taken as the adaptation value between each piece of data in the first data subset and that first neural network.
- In this implementation, the first data set is clustered to obtain at least two data subsets, and different training data in the same data subset have the same adaptation value with respect to a first neural network; that is, training data of the same type has the same ability to modify a first neural network. This handles the case where one client holds at least two data subsets with different data characteristics, further improves the personalized customization of the neural network, and helps improve the accuracy of the trained neural network.
- The machine learning model is a neural network, and the plurality of modules stored in the server are neural network modules.
- The first client performing a training operation on the at least one first machine learning model using the first data set includes: the first client performs a training operation on the first neural network using the first data set according to a second loss function.
- The second loss function includes a first term and a second term. The first term indicates the similarity between a first prediction result and the correct result of the first training data, and the second term indicates the similarity between the first prediction result and a second prediction result; the second term may be called a penalty term or a constraint term.
- The first prediction result is the prediction result of the first training data output by the first neural network after the first training data is input into the first neural network. The second prediction result is the prediction result of the first training data output by a fourth neural network after the first training data is input into the fourth neural network.
- The fourth neural network is the first neural network before the training operation is performed; that is, the initial state of the fourth neural network is the same as that of the first neural network, but the weight parameters of the fourth neural network are not updated while the first neural network is trained.
- In this implementation, because the second loss function also indicates the similarity between the first prediction result and the second prediction result, the first neural network is prevented from being changed too much during training.
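A two-term loss of this shape can be sketched as follows. Cross entropy for the first term, squared distance for the penalty term and the coefficient value are all assumptions made for illustration; the text specifies only what each term compares. `second_prediction` stands for the output of the frozen copy of the network.

```python
import math


def cross_entropy(probs, label):
    """First term: prediction against the index of the correct class."""
    return -math.log(max(probs[label], 1e-12))


def squared_distance(p, q):
    """Assumed measure for the penalty term: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def second_loss(first_prediction, second_prediction, label, penalty_coeff=0.5):
    """Task loss plus a penalty for drifting away from the frozen network.

    first_prediction:  output of the network being trained.
    second_prediction: output of the frozen, untrained copy (not updated).
    """
    task_term = cross_entropy(first_prediction, label)
    penalty_term = penalty_coeff * squared_distance(first_prediction, second_prediction)
    return task_term + penalty_term


frozen_output = [0.6, 0.4]  # hypothetical output of the frozen copy
loss_unchanged = second_loss(frozen_output, frozen_output, label=0)
loss_drifted = second_loss([0.7, 0.3], frozen_output, label=0)
```

When the trained network's output equals the frozen copy's output, the penalty vanishes and only the task term remains.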
- The first data set includes a plurality of pieces of first training data and the correct result of each piece of first training data, and the method further includes: the first client receives a selector sent by the server, where the selector is a neural network used to select, from the plurality of neural network modules, at least one neural network module that matches the data characteristics of the first data set.
- The first client performing a training operation on the at least one first machine learning model using the first data set includes: the first client inputs the first training data into the selector and obtains indication information output by the selector, where the indication information includes the probability that each of the plurality of neural network modules is selected and is used to indicate the neural network modules for constructing the first neural network; the first client obtains, according to the plurality of neural network modules, the indication information and the first training data, the prediction result of the first training data output by the first neural network; and the first client performs a training operation on the first neural network and the selector according to a third loss function, where the third loss function indicates the similarity between the prediction result of the first training data and the correct result and also indicates how discrete the indication information is.
- The method further includes: the first client sends the trained selector to the server.
- In this implementation, the selector is trained at the same time as the neural network modules for constructing the first neural network, which saves computing resources. The selector is trained while it processes the data to be processed, which helps improve the accuracy of the indication information output by the selector.
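The text does not define how the discreteness of the indication information is measured. As one possible reading, the sketch below adds a penalty of the form p(1 - p) per module, which is zero when every probability is exactly 0 or 1 and largest at 0.5. This penalty, the cross-entropy task term, the coefficient and the function names are all assumptions for illustration only.

```python
import math


def discreteness_penalty(indication):
    """Assumed measure: sum of p * (1 - p) over the selection probabilities.

    Returns 0.0 for a fully discrete vector (all entries 0 or 1).
    """
    return sum(p * (1.0 - p) for p in indication)


def third_loss(prediction, label, indication, penalty_coeff=0.1):
    """Task loss on the prediction plus a penalty on non-discrete indications."""
    task_term = -math.log(max(prediction[label], 1e-12))
    return task_term + penalty_coeff * discreteness_penalty(indication)


prediction = [0.8, 0.2]  # hypothetical output of the assembled network
loss_discrete = third_loss(prediction, 0, [1.0, 0.0, 1.0])
loss_uncertain = third_loss(prediction, 0, [0.5, 0.5, 0.5])
```

Minimizing such a loss with respect to both the network and the selector would push the selector toward clear select-or-skip decisions while keeping predictions accurate.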
- The embodiments of the present application provide a method for training a machine learning model, which can be used in the field of artificial intelligence.
- The method is applied to a server. The server is communicatively connected to a plurality of clients, a plurality of modules are stored in the server, and the plurality of modules are used to construct a machine learning model. The training of the machine learning model includes multiple rounds of iteration, and one round includes: the server obtains at least one first machine learning model corresponding to a first client, where the first client is any one of the plurality of clients and the at least one first machine learning model corresponds to the data characteristics of a first data set stored by the first client.
- The server sends the at least one first machine learning model to the first client, where the at least one first machine learning model instructs the first client to perform a training operation on it using the first data set to obtain at least one trained first machine learning model.
- The server receives from the first client at least one updated neural network module included in the at least one trained first machine learning model and updates the weight parameters of the stored neural network modules according to the at least one updated neural network module.
- In this implementation, different neural networks can be allocated to training data with different data characteristics, that is, personalized matching between neural networks and data characteristics is achieved. Since the first client is any one of the plurality of clients, for each client the neural network is allocated and trained according to the data characteristics of the training data set stored by that client, so that the same neural network is trained with training data having the same data characteristics and different neural networks are trained with training data having different data characteristics. This not only achieves personalized matching between neural networks and data characteristics but also helps improve the accuracy of the trained neural network. In addition, because the server selects the neural network suitable for each client, it avoids sending all the neural network modules to the client, which reduces the waste of the client's storage resources, and it avoids occupying the client's computing resources, which helps improve the user experience.
- The multiple modules are used to construct at least two second machine learning models, and the at least one first machine learning model is selected from the at least two second machine learning models; or, the modules for constructing the at least one first machine learning model are selected from the multiple modules.
- The server updating the stored weight parameters of the neural network modules according to the at least one updated neural network module may include: since the same neural network module may be present in the data sent by different clients, the server computes a weighted average of the weight parameters of that module as sent by the different clients and uses the result as the weight parameters of the module in the server. For neural network modules that do not overlap across clients, the parameters sent by the client are used directly as the weight parameters of the module in the server.
- the same neural network module means that the specific neural networks are the same and located in the same group.
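The aggregation rule described here can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the `(group, module_id)` keys, the flat-list weight format, and the use of each client's local sample count as the averaging weight are all assumptions (the text only says "weighted average").

```python
from collections import defaultdict


def aggregate_modules(server_modules, client_updates):
    """Merge client updates into the server's module store.

    server_modules: dict mapping (group, module_id) -> list of floats
    client_updates: list of (num_samples, {key: list of floats}) pairs,
                    where num_samples > 0 is the assumed averaging weight.
    """
    collected = defaultdict(list)
    for num_samples, modules in client_updates:
        for key, weights in modules.items():
            collected[key].append((num_samples, weights))

    updated = dict(server_modules)  # modules that no client trained stay unchanged
    for key, entries in collected.items():
        total = sum(n for n, _ in entries)
        dim = len(entries[0][1])
        # With a single entry this reduces to copying the client's parameters.
        updated[key] = [
            sum(n * w[i] for n, w in entries) / total for i in range(dim)
        ]
    return updated


server = {("g1", "a"): [0.0, 0.0], ("g1", "b"): [1.0, 1.0], ("g2", "c"): [5.0]}
updates = [
    (10, {("g1", "a"): [1.0, 2.0], ("g2", "c"): [4.0]}),
    (30, {("g1", "a"): [3.0, 6.0]}),
]
new_server = aggregate_modules(server, updates)
```

Module `("g1", "a")` is sent by both clients and is averaged; `("g2", "c")` is sent by only one client and is taken over directly; `("g1", "b")` is untouched.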
- The server updating the stored weight parameters of the neural network modules according to the at least one updated neural network module may also include: if training data is available in the server, model distillation may be used, with the multiple updated neural network modules sent by the multiple clients serving to update the weight parameters of the neural network modules stored by the server. That is, the training data stored in the server is used to retrain the neural network modules stored in the server, and the purpose of the training is to bring the output data of the neural network modules stored in the server closer to the output data of the updated neural network modules sent by the clients.
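As a toy illustration of the distillation idea, the sketch below treats a "module" as a one-parameter linear map y = w * x and fits the server's parameter so that its outputs on server-side data approach the mean output of the client-updated modules. The one-parameter module, the mean-of-clients target, the squared-error loss and the function name `distill` are all assumptions made for illustration; the text only requires that the outputs be brought closer together.

```python
def distill(server_w, client_ws, server_inputs, lr=0.1, steps=200):
    """Fit a one-parameter module y = w * x to the mean client output.

    server_w:      initial server-side parameter
    client_ws:     parameters of the updated modules received from clients
    server_inputs: data held by the server, used only as distillation inputs
    """
    w = server_w
    n = len(server_inputs)
    for _ in range(steps):
        grad = 0.0
        for x in server_inputs:
            # Teacher signal: average output of the client-updated modules.
            target = sum(cw * x for cw in client_ws) / len(client_ws)
            # Gradient of the squared error (w * x - target) ** 2 w.r.t. w.
            grad += 2.0 * (w * x - target) * x
        w -= lr * grad / n
    return w


distilled_w = distill(0.0, [1.0, 3.0], [0.5, 1.0, 1.5])
```

In this toy setting the distilled parameter approaches the average of the client parameters; with real networks the same loop would run over mini-batches with an optimizer of choice.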
- the machine learning model is a neural network
- the multiple modules stored in the server are neural network modules
- the server stores a second adaptation relationship
- the second adaptation relationship includes multiple adaptation values, and each adaptation value indicates the degree of adaptation between the training data stored in a client and a second neural network.
- the method further includes: the server receives an adaptation value between the first data set sent by the first client and the at least one second neural network, and updates the second adaptation relationship.
- Acquiring at least one first neural network by the server includes: the server selects at least one first neural network from a plurality of second neural networks according to the second adaptation relationship, and the at least one first neural network includes a neural network with a high adaptation value with the first data set.
- The server may obtain a second adaptation matrix corresponding to the second adaptation relationship and perform matrix decomposition on it to obtain a similarity matrix of the neural networks and a similarity matrix of the users.
- The product of the similarity matrix of the neural networks and the similarity matrix of the users needs to be close to the values at the corresponding positions in the second adaptation relationship. The two similarity matrices are then multiplied to obtain a second complement matrix, and at least one first neural network with a high adaptation value with the first data set (that is, the first client) is selected according to the second complement matrix.
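The decomposition-and-completion step can be sketched as a small matrix factorization fitted only on the known adaptation values. This is one possible reading, not the method prescribed by the application: the rank, learning rate, regularization, the use of stochastic gradient descent, and the names `factorize`, `complete` and `observed_rmse` are assumptions, and the adaptation values in `R` are made up.

```python
import math
import random


def factorize(R, rank=2, lr=0.05, reg=0.001, epochs=5000, seed=0):
    """Factor a partially observed client-by-network matrix R (None = unknown)."""
    rng = random.Random(seed)
    n_clients, n_nets = len(R), len(R[0])
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_clients)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_nets)]
    observed = [(i, j) for i in range(n_clients) for j in range(n_nets)
                if R[i][j] is not None]
    for _ in range(epochs):
        for i, j in observed:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = R[i][j] - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def complete(U, V):
    """Multiply the two factor matrices to fill in every position."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


def observed_rmse(R, M):
    """Root-mean-square error of M on the known entries of R."""
    errs = [(R[i][j] - M[i][j]) ** 2
            for i in range(len(R)) for j in range(len(R[0]))
            if R[i][j] is not None]
    return math.sqrt(sum(errs) / len(errs))


# Rows are clients, columns are second neural networks (illustrative values).
R = [
    [0.9, 0.1, None],
    [0.8, None, 0.3],
    [None, 0.9, 0.7],
    [0.2, 0.8, None],
]
U, V = factorize(R)
completed = complete(U, V)
```

The row of `completed` belonging to a client can then be ranked to pick the networks with the highest estimated adaptation values, including networks that client has never reported a value for.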
- The at least one first neural network selected for the first client may include not only at least one first neural network with a high adaptation value, but also at least one randomly selected first neural network.
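Combining high-adaptation networks with randomly chosen ones might look like the sketch below. The function name `select_networks`, the parameters `k_best` and `k_random`, and the decision to draw the random networks only from those not already chosen are assumptions for illustration.

```python
import random


def select_networks(adaptation_row, k_best=1, k_random=1, rng=None):
    """Return indices of the k_best highest-valued networks plus k_random others."""
    rng = rng or random.Random(0)
    ranked = sorted(range(len(adaptation_row)),
                    key=lambda j: adaptation_row[j], reverse=True)
    best = ranked[:k_best]
    rest = ranked[k_best:]
    # Random picks give rarely chosen networks a chance to be trained and scored.
    extra = rng.sample(rest, min(k_random, len(rest)))
    return best + extra


chosen = select_networks([0.2, 0.9, 0.5, 0.1], k_best=1, k_random=2,
                         rng=random.Random(1))
```

The random component is a simple form of exploration: without it, a network whose adaptation value was never measured for a client could never be selected for that client.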
- A second adaptation relationship is configured on the server side, the client generates an adaptation value and sends it to the server, and the server selects the first neural network adapted to the first client according to the second adaptation relationship, which not only avoids occupying the client's computer resources, but also avoids leaking the client's data.
- the machine learning model is a neural network
- the multiple modules stored in the server are neural network modules
- the method further includes: the server receives the first identification information sent by the first client, where the first identification information is identification information of the first neural network, or the first identification information is identification information of a neural network module that constructs the first neural network.
- The server sending the at least one first machine learning model to the first client includes: the server sends the first neural network pointed to by the first identification information to the first client, or sends to the first client the neural network modules, pointed to by the first identification information, that construct the first neural network.
- the machine learning model is a neural network
- the multiple modules stored in the server are neural network modules
- the server is further configured with a selector.
- the method further includes: the server receives at least one class center sent by the first client, where at least one data subset is obtained by performing a clustering operation on the first data set, and each class center in the at least one class center is the class center of one data subset in the at least one data subset.
- Obtaining the at least one first machine learning model corresponding to the first client by the server includes: the server inputs each class center into the selector, obtains the indication information output by the selector, and determines, according to the indication information, the neural network modules for constructing the at least one first neural network, where the indication information includes the probability that each neural network module of the plurality of neural network modules is selected.
- the server sending the at least one first machine learning model to the first client includes: the server sending the neural network module for constructing at least one first neural network to the first client.
- The selection of the neural network modules is performed by the selector, which helps improve the accuracy of the selection process. The server performs the selection step, which frees storage space on the client and avoids occupying the client's computer resources; only the class centers are sent to the server, which helps avoid leaking client information as far as possible.
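A selector that maps class centers to per-module selection probabilities could be sketched as follows. The application does not fix the selector's form here; the linear scoring with a sigmoid, the 0.5 threshold, the weight values, and the names `selector` and `modules_for_client` are hypothetical.

```python
import math


def selector(class_center, selector_weights):
    """Return {module_id: probability that the module is selected}."""
    probs = {}
    for module_id, w in selector_weights.items():
        score = sum(wi * xi for wi, xi in zip(w, class_center))
        probs[module_id] = 1.0 / (1.0 + math.exp(-score))  # sigmoid
    return probs


def modules_for_client(class_centers, selector_weights, threshold=0.5):
    """Feed each class center to the selector and collect the chosen modules."""
    selected = set()
    for center in class_centers:
        probs = selector(center, selector_weights)
        selected.update(m for m, p in probs.items() if p > threshold)
    return sorted(selected)


# Hypothetical selector parameters for three stored modules.
weights = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [-1.0, -1.0]}
# Two class centers reported by one client (one per clustered data subset).
centers = [[2.0, -1.0], [-1.0, 3.0]]
to_send = modules_for_client(centers, weights)
```

Only the class centers leave the client in this flow; the raw samples and the selection computation stay on their respective sides.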
- the machine learning model is a neural network
- the multiple modules stored in the server are neural network modules
- a neural network is divided into at least two sub-modules
- the neural network modules stored in the server are divided into at least two groups corresponding to the at least two sub-modules, and different neural network modules in the same group have the same function.
- the method further includes: the server calculates the similarity between different neural network modules among the at least two neural network modules included in the same group, and merges two neural network modules whose similarity is greater than a preset threshold.
- To merge, the server may randomly select one neural network module from the two different neural network modules; or, if the second neural network module and the first neural network module are embodied as the same neural network and differ only in their weight parameters, the server may also average the weight parameters of the second neural network module and the first neural network module to generate the weight parameters of the merged neural network module.
- Two neural network modules whose similarity is greater than a preset threshold are merged, that is, two redundant neural network modules are merged. This not only reduces the difficulty for the server of managing multiple neural network modules, but also prevents clients from repeatedly training two neural network modules whose similarity is greater than the preset threshold, which reduces the waste of client computer resources.
- the different neural network modules include a second neural network module and a first neural network module
- the similarity between the second neural network module and the first neural network module is determined in any one of the following ways: the server inputs the same data to the second neural network module and the first neural network module respectively, and compares the similarity between the output data of the second neural network module and the output data of the first neural network module; or, the server calculates the similarity between the weight parameter matrix of the second neural network module and the weight parameter matrix of the first neural network module.
- the calculation method of the similarity between the two includes, but is not limited to: calculating the Euclidean distance, Mahalanobis distance, cosine distance or cross entropy between the two.
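The merge step can be illustrated with the weight-based variant: compare flattened weight vectors within one group and average any pair whose similarity exceeds the threshold. The choice of cosine similarity, the 0.99 threshold, the flat-list weight format and the greedy one-pass merge order are assumptions; the output-based comparison would replace the weight vectors with the modules' outputs on shared inputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def merge_group(modules, threshold=0.99):
    """Merge near-duplicate modules of one group.

    modules: {module_id: flat weight list}; all modules share one architecture.
    """
    merged = {}
    for module_id, weights in modules.items():
        for kept_id, kept in merged.items():
            if cosine_similarity(weights, kept) > threshold:
                # Same architecture, different weights: keep the average.
                merged[kept_id] = [(a + b) / 2.0 for a, b in zip(kept, weights)]
                break
        else:
            merged[module_id] = list(weights)
    return merged


group = {"a": [1.0, 0.0], "b": [1.0, 0.02], "c": [0.0, 1.0]}
merged_group = merge_group(group)
```

Here modules `a` and `b` are nearly identical and collapse into one entry, while `c` is kept as a separate module.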
- the embodiments of the present application provide a data processing method, which can be used in the field of artificial intelligence.
- The server acquires at least one third neural network corresponding to the data characteristics of the second data set stored by the second client, and sends the at least one third neural network to the second client, where the at least one third neural network is used by the second client to generate prediction results for the data to be processed.
- The server acquiring at least one third neural network corresponding to the data characteristics of the second data set stored by the second client may be any one or more of the following three items. The server receives at least one second class center and inputs each second class center into the selector to obtain the neural network modules for constructing at least one third neural network, where each second class center is the class center of one second data subset, and the at least one second data subset is obtained by performing a clustering operation on the second data set.
- The server selects at least one third neural network from at least two second neural networks according to the identification information of the second client and the second adaptation relationship, and the at least one third neural network includes a neural network with a high degree of adaptation to the second data set.
- the server randomly selects at least one third neural network from the plurality of second neural networks.
- the embodiments of the present application provide a data processing method, which can be used in the field of artificial intelligence.
- The second client obtains the second identification information corresponding to the data characteristics of the second data set stored by the second client, and sends an acquisition request to the server, where the acquisition request carries the second identification information, and the second identification information is identification information of the third neural network, or the second identification information is identification information of a neural network module that constructs the third neural network.
- The second client receives one or more third neural networks pointed to by the second identification information, or receives the neural network modules, pointed to by the second identification information, for constructing one or more third neural networks.
- the embodiments of the present application provide a training device for a machine learning model, which can be used in the field of artificial intelligence.
- the device is applied to a first client, a plurality of clients are connected in communication with a server, a plurality of modules are stored in the server, and the plurality of modules are used to build a machine learning model, and the first client is any one of the plurality of clients .
- The training device of the machine learning model is used to perform multiple rounds of iteration and includes an acquisition unit, a training unit and a sending unit. In one iteration of the multiple rounds: the acquisition unit is used to acquire at least one first machine learning model, where the at least one first machine learning model is selected according to the data characteristics of the first data set stored by the first client; the training unit is used to perform a training operation on the at least one first machine learning model using the first data set to obtain at least one trained first machine learning model; and the sending unit is used to send at least one updated module included in the at least one trained first machine learning model to the server, where the updated module is used by the server to update the weight parameters of the stored modules.
- the apparatus for training a machine learning model may also be used to implement the steps executed by the first client in various possible implementation manners of the first aspect.
- the embodiments of the present application provide a training device for a machine learning model, which can be used in the field of artificial intelligence.
- the device is applied to a server, the server is connected to a plurality of clients in communication, a plurality of modules are stored in the server, and the plurality of modules are used for building a machine learning model, and the first client is any one of the plurality of clients.
- The training device of the machine learning model is used to perform multiple rounds of iteration and includes an acquisition unit, a sending unit and an updating unit. In one iteration of the multiple rounds: the acquisition unit is used to acquire at least one first machine learning model corresponding to the first client, where the first client is one of the multiple clients and the at least one first machine learning model corresponds to the data characteristics of the first data set stored by the first client; the sending unit is used to send the at least one first machine learning model to the first client, where the at least one first machine learning model instructs the first client to perform a training operation on it using the first data set to obtain at least one trained first machine learning model; and the updating unit is used to receive from the first client at least one updated neural network module included in the at least one trained first machine learning model, and to update the stored weight parameters of the neural network modules according to the at least one updated neural network module.
- the apparatus for training the machine learning model may also be used to implement the steps executed by the server in various possible implementation manners of the second aspect.
- For the specific implementation of the steps executed by the server in the various possible implementation manners of the second aspect, and for the beneficial effects brought by each possible implementation manner, reference may be made to the descriptions of the various possible implementation manners in the second aspect, and details are not repeated here.
- an embodiment of the present application provides a server, which may include a processor, where the processor is coupled with a memory and the memory stores program instructions; when the program instructions stored in the memory are executed by the processor, the method for training a machine learning model of the first aspect is implemented, or, when the program instructions stored in the memory are executed by the processor, the method for training a machine learning model of the first aspect described above is implemented.
- an embodiment of the present application provides a terminal device, which may include a processor, where the processor is coupled to a memory and the memory stores program instructions; when the program instructions stored in the memory are executed by the processor, the method for training a machine learning model of the first aspect is implemented. For the steps performed by the first client that the processor executes in each possible implementation manner of the first aspect, reference may be made to the first aspect, and details are not repeated here.
- an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program runs on a computer, the computer is made to execute the training method of the machine learning model of the first aspect above, or the computer is made to execute the training method of the machine learning model of the second aspect.
- an embodiment of the present application provides a circuit system, the circuit system includes a processing circuit, and the processing circuit is configured to execute the method for training a machine learning model of the first aspect above, or the processing circuit is configured to execute the second aspect above The training method of the machine learning model.
- an embodiment of the present application provides a computer program that, when run on a computer, causes the computer to execute the method for training a machine learning model of the first aspect above, or causes the computer to execute the method for training a machine learning model of the second aspect above.
- an embodiment of the present application provides a chip system, where the chip system includes a processor for supporting a training device or an execution device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
- the chip system further includes a memory for storing necessary program instructions and data of the server or the communication device.
- the chip system may be composed of chips, or may include chips and other discrete devices.
- FIG. 1 is a schematic structural diagram of an artificial intelligence main frame provided by an embodiment of the present application.
- FIG. 2 is a system architecture diagram of a training system for a machine model provided by an embodiment of the present application
- FIG. 3 is a schematic diagram of a plurality of different data sets in the training method of the machine learning model provided by the embodiment of the present application;
- FIG. 4 is a schematic flowchart of a training method for a machine learning model provided by an embodiment of the present application
- FIG. 5 is a schematic diagram of multiple neural network modules in the training method of the machine learning model provided by the embodiment of the present application.
- FIG. 6 is another schematic diagram of multiple neural network modules in the training method of the machine learning model provided by the embodiment of the present application.
- FIG. 7 is a schematic diagram of three structures of the second neural network in the training method of the machine learning model provided by the embodiment of the present application.
- FIG. 8 is another schematic structural diagram of the second neural network in the training method of the machine learning model provided by the embodiment of the present application.
- FIG. 9 is another schematic flowchart of a training method for a machine learning model provided by an embodiment of the present application.
- FIG. 10 is another schematic flowchart of a training method for a machine learning model provided by an embodiment of the present application.
- FIG. 11 is a schematic flowchart of still another method for training a machine learning model provided by an embodiment of the present application.
- FIG. 12 is another schematic flowchart of a training method for a machine learning model provided by an embodiment of the present application.
- FIG. 13 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of a training device for a machine learning model provided by an embodiment of the application.
- FIG. 15 is another schematic structural diagram of the apparatus for training a machine learning model provided by an embodiment of the application.
- FIG. 16 is another schematic structural diagram of a training device for a machine learning model provided by an embodiment of the application.
- FIG. 17 is a schematic structural diagram of still another apparatus for training a machine learning model provided by an embodiment of the application.
- FIG. 18 is a schematic structural diagram of a training device provided by an embodiment of the application.
- FIG. 19 is a schematic structural diagram of an execution device provided by an embodiment of the application.
- FIG. 20 is a schematic structural diagram of a chip provided by an embodiment of the present application.
- the embodiments of the present application provide a method for training a machine learning model and related equipment, which allocate different neural networks to training data with different data characteristics, thereby realizing personalized matching between the neural network and the data characteristics; each client The neural network is allocated and trained according to the data characteristics of the training data set stored in the client, and the same neural network can be trained by using the training data with the same data characteristics, which is beneficial to improve the accuracy of the neural network after training.
- Figure 1 shows a schematic structural diagram of the main frame of artificial intelligence.
- The above-mentioned artificial intelligence main framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
- The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data goes through the refinement of "data-information-knowledge-wisdom".
- The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecology of the system.
- The infrastructure provides computing power support for artificial intelligence systems, realizes communication with the outside world, and provides support through the basic platform. It communicates with the outside through sensors. Computing power is provided by smart chips; as an example, the smart chips include hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), and a field programmable gate array (FPGA). The basic platform includes platform guarantees and support related to the distributed computing framework and the network, and can include cloud storage and computing, interconnection networks, etc. For example, sensors communicate with the outside to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
- the data on the upper layer of the infrastructure indicates the source of data in the field of artificial intelligence.
- the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
- machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
- Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
- Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
- Some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example image classification, personalized management of images, personalized management of battery charging, text analysis, computer vision processing, speech recognition, etc.
- Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, the productization of intelligent information decision-making, and the realization of practical applications. The application areas mainly include: intelligent terminals, intelligent manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, etc.
- the embodiments of the present application are mainly used to train machine learning models used in various application scenarios.
- the trained machine learning models can be applied to the above-mentioned various application fields to realize classification, regression, or other functions.
- the processing objects of the model can be image samples, discrete data samples, text samples, or voice samples, etc., which are not exhaustive here.
- the machine learning model can be expressed as a neural network, a linear model or other types of machine learning models, etc.
- the multiple modules that make up the machine learning model can be expressed as neural network modules, linear model modules, or modules of other types of machine learning models, etc., which are not exhaustive here.
- the machine learning model is only represented by a neural network as an example for description. When the machine learning model is represented by other types other than neural networks, it can be understood by analogy, which is not repeated in the embodiments of the present application.
- FIG. 2 is a system architecture diagram of a training system for a machine model provided by an embodiment of the present application.
- the training system of the machine model includes a server 100 and multiple clients 200, and the server 100 and multiple clients 200 are connected in communication.
- The server 100 may specifically be a single server or a server cluster composed of multiple servers. Although only one server 100 and three clients 200 are shown in FIG. 2, in actual situations the number of servers 100 and clients 200 can be determined according to actual requirements.
- the client 200 may be configured on a terminal device, or may be configured on a server, which is not limited here.
- A plurality of neural network modules are stored on the server 100, and the plurality of neural network modules are used to construct at least two second neural networks; a data set is stored on each client 200, and the data set stored on the client 200 can be used to perform training operations on the neural network.
- the first client among the multiple clients 200 stores the first data set, and the first client is any client among the multiple clients 200.
- The first client obtains at least one first neural network adapted to the data characteristics of the first data set, uses the first data set to perform a training operation on the at least one first neural network to obtain at least one trained first neural network, and then sends at least one updated neural network module included in the at least one trained first neural network to the server 100.
- Each client 200 of the multiple clients 200 can perform the foregoing operations, and the server 100 can receive multiple updated neural network modules sent by the multiple clients 200.
- The server 100 updates the stored weight parameters of the neural network modules according to the multiple updated neural network modules, which completes one iteration. The weight parameters of the multiple neural network modules stored in the server 100 are updated through multiple rounds of such iterations.
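One round of the iteration just described can be sketched end to end as below. It is a deliberately simplified illustration: each module is reduced to a single scalar weight, the server uses a plain average when several clients return the same module, and `select` and `local_train` are hypothetical callbacks standing in for the data-characteristic matching and for the client's real training.

```python
def run_round(server_modules, clients, select, local_train):
    """One iteration: send matching modules, train locally, update the server."""
    received = {}
    for client in clients:
        module_ids = select(client)                        # modules matching this client's data
        sent = {m: server_modules[m] for m in module_ids}  # only these are transmitted
        trained = local_train(client, sent)                # runs on the client's own data
        for module_id, weight in trained.items():
            received.setdefault(module_id, []).append(weight)

    updated = dict(server_modules)  # untouched modules keep their weights
    for module_id, weights in received.items():
        updated[module_id] = sum(weights) / len(weights)
    return updated


def select(client):
    # Hypothetical: the matching result is precomputed and stored per client.
    return client["modules"]


def local_train(client, modules):
    # Toy "training": move each weight halfway toward the client's target.
    return {m: w + 0.5 * (client["target"] - w) for m, w in modules.items()}


clients = [
    {"modules": ["m1", "m2"], "target": 1.0},
    {"modules": ["m2"], "target": 3.0},
]
server = {"m1": 0.0, "m2": 0.0, "m3": 0.0}
after_round = run_round(server, clients, select, local_train)
```

Calling `run_round` repeatedly on its own output corresponds to the multiple rounds of iteration; module `m3` is never sent to any client in this example and therefore never changes.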
- the difference between distributed training and federated learning is that the data for training the neural network is sent by the server 100 to each client 200 .
- The server 100 stores a plurality of neural network modules for constructing at least two second neural networks, and also stores a data set. The server 100 first performs a clustering operation on the stored data set to obtain multiple clustered data subsets, and then sends the data subsets corresponding to one class or several classes to each client 200, that is, different clients 200 can store data subsets with different data characteristics.
- The server 100 may perform clustering on the entire stored data set, or may first divide the entire data set into different data subsets according to the correct label of each data item in the data set, and then perform a clustering operation on each resulting data subset in turn to obtain multiple clustered data subsets.
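The "split by label, then cluster each split" variant can be sketched with a small k-means. The text does not name a clustering algorithm; k-means, the deterministic first-k initialization, the fixed iteration count and the names `kmeans` and `split_by_label_then_cluster` are assumptions, and the sample features are invented.

```python
def kmeans(points, k, iters=20):
    """Plain k-means; returns (centers, assignment) for a list of feature lists."""
    centers = [list(p) for p in points[:k]]  # deterministic init: first k points
    assign = [0] * len(points)
    for _ in range(iters):
        for idx, p in enumerate(points):
            assign[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def split_by_label_then_cluster(samples, k):
    """samples: list of (features, label). Returns {(label, cluster): [features]}."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)

    subsets = {}
    for label, points in by_label.items():
        _, assign = kmeans(points, min(k, len(points)))
        for p, a in zip(points, assign):
            subsets.setdefault((label, a), []).append(p)
    return subsets


samples = [
    ([0.0, 0.0], "dog"), ([10.0, 10.0], "dog"),
    ([0.1, 0.0], "dog"), ([10.0, 10.1], "dog"),
    ([5.0, 5.0], "sofa"),
]
subsets = split_by_label_then_cluster(samples, k=2)
```

Each resulting `(label, cluster)` subset can then be assigned, whole or sampled, to a client; the cluster centers returned by `kmeans` play the role of the class centers mentioned elsewhere in the text.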
- the server 100 may directly send the clustered data subset to the client 200, or may sample some data from at least one clustered data subset and send it to the client 200, etc.
- After the server 100 deploys the data for performing the training operation to each client 200, it also needs to update the weight parameters of the multiple neural network modules stored in the server 100 through multiple rounds of iteration. The implementation of each round of iteration is the same as in federated learning and is not repeated here.
- the client 200 acquires a third neural network corresponding to the data characteristics of the data set stored by the client 200, and then uses the acquired neural network to generate a prediction result of the input data.
- FIG. 3 is a schematic diagram of multiple different data sets in the training method of the machine learning model provided by the embodiment of the present application.
- FIG. 3 takes image classification as the task of the neural network as an example and shows the data sets stored in four clients. The data set stored by the first client is an image set of dogs, so the task that the first client needs the neural network to perform is to classify dogs.
- The data set stored by the second client is an image set of wolves, so the task that the second client needs the neural network to perform is to classify wolves.
- The data set stored by the third client includes an image set of dogs and an image set of sofas, so the third client needs two different neural networks: one performs the task of classifying dogs, and the other performs the task of classifying sofas.
- the data characteristics of the dog image set, the wolf image set, and the sofa image set are different, and the data characteristics of the dog image set stored in the first client and the dog image set stored in the third client are the same. .
- The training method of the machine learning model provided by the embodiment of the present application is introduced in detail below. Since the method affects both the training phase and the inference phase, and the implementation processes of the two phases differ, the specific implementation processes of the two phases are described separately below.
- A plurality of neural network modules are stored in the server. In one iteration, for a first client among the multiple clients, at least one first neural network adapted to the first data set needs to be acquired. The selection of the at least one first neural network can be performed by the server or by the client, and the implementation processes of the two approaches differ. Further, the selection can be performed according to the adaptation relationship between neural networks and data sets; that is, the server or client first uses the multiple neural network modules to construct multiple second neural networks, and then selects, from the multiple second neural networks, at least one first neural network adapted to the data characteristics of the data set stored in a client.
- The server or client can also use a selector (a kind of neural network) to perform the selection; that is, the server or client first uses the selector to select, from the multiple neural network modules, at least one neural network module adapted to the data characteristics of the data set stored in a client, and then uses the selected neural network module to construct at least one first neural network.
- the client selects a first neural network adapted to the data characteristics of the data set stored by the client according to the first adaptation relationship
- FIG. 4 is a schematic flowchart of a training method for a machine learning model provided by an embodiment of the present application.
- the method may include:
- the first client receives multiple neural network modules sent by the server, and constructs at least two second neural networks according to the multiple neural network modules.
- The server sends the stored multiple neural network modules to the first client. Correspondingly, the first client receives the multiple neural network modules and constructs at least two second neural networks from them.
- the aforementioned plurality of neural network modules may be pre-trained neural network modules, or may be completely untrained neural network modules.
- each of the at least two second neural networks may be divided into at least two sub-modules, and the plurality of neural network modules include at least two groups corresponding to the at least two sub-modules.
- Different groups can include the same number or different numbers of neural network modules. Different neural network modules in the same group have the same function; for example, the neural network modules in one group all perform feature extraction, all perform feature transformation, or all perform classification, and so on, which is not exhaustive here.
- the server may add a new neural network module to the plurality of neural network modules, or may perform a deletion operation on the plurality of neural network modules, or the like.
- Different neural network modules can be different neural networks. For example, the first group has 3 neural network modules: the first uses a 3-layer multilayer perceptron (MLP), the second uses a 2-layer MLP, and the third uses a 2-layer convolutional neural network (CNN).
- Different neural network modules can also be the same neural network with different weight parameters. For example, the first group has 3 neural network modules: the first adopts a 2-layer MLP, the second adopts a 2-layer MLP, and the third adopts a 2-layer CNN, but the weight parameters of the first and second neural network modules are different. It should be understood that these examples are only for ease of understanding this solution and are not intended to limit it.
- FIG. 5 and FIG. 6 are two schematic diagrams of multiple neural network modules in the training method of the machine learning model provided by the embodiment of the present application.
- In FIG. 5 and FIG. 6, different groups include different numbers of neural network modules, and the layered structure composed of the multiple neural network modules has 4 layers in total (that is, the multiple neural network modules are correspondingly divided into 4 groups).
- In FIG. 5, the first layer (that is, the first group of neural network modules) includes 3 neural network modules: the first, SGL1M1, uses a 3-layer MLP; the second, SGL1M2, uses a 2-layer MLP; and the third, SGL1M3, is a 2-layer CNN.
- The second layer (that is, the second group of neural network modules) includes 4 neural network modules: the first, SGL2M1, adopts a 3-layer MLP; the second, SGL2M2, adopts a 2-layer MLP; the third, SGL2M3, is a 3-layer CNN; and the fourth, SGL2M4, is a 2-layer CNN.
- The third layer (that is, the third group of neural network modules) includes 2 neural network modules: the first, SGL3M1, uses a 2-layer MLP, and the second, SGL3M2, uses a 2-layer CNN.
- The fourth layer (that is, the fourth group of neural network modules) includes 4 neural network modules: the first, SGL4M1, uses a 3-layer MLP; the second, SGL4M2, uses a 2-layer MLP; the third, SGL4M3, is a 1-layer CNN + 2-layer MLP; and the fourth, SGL4M4, is a 1-layer CNN + 1-layer MLP.
- In FIG. 6, the main model of the layered structure of the multiple groups of neural network modules is a binary tree.
- The first layer includes the neural network module SGL1M1.
- The second layer includes the neural network modules SGL2M1 and SGL2M2.
- The third layer (that is, the third group of neural network modules) includes, from left to right, SGL3M1, SGL3M2, SGL3M3, and SGL3M4.
- The fourth layer (that is, the fourth group of neural network modules) includes, from left to right, SGL4M1, SGL4M2, SGL4M3, SGL4M4, SGL4M5, SGL4M6, SGL4M7, and SGL4M8.
- For example, the second neural network can be SGL1M1+SGL2M1+SGL3M1+SGL4M1; as another example, the second neural network can be SGL1M1+SGL2M2+SGL3M3+SGL4M5, and so on, which is not exhaustive here. It should be understood that the examples in FIG. 5 and FIG. 6 are only for ease of understanding the present solution and are not intended to limit it.
- Regarding the process of constructing at least two second neural networks from the plurality of neural network modules: after receiving the multiple neural network modules sent by the server, the first client can select only one neural network module from each group to construct a second neural network, in which case the second neural network has a single branch. The first client may also select at least two neural network modules from each group to construct a second neural network, in which case the second neural network includes multiple branches. The first client may also select no neural network module from a certain group of neural network modules.
- FIG. 7 and FIG. 8 are four schematic structural diagrams of the second neural network in the training method of the machine learning model provided by the embodiment of the application.
- The second neural network is constructed on the basis of the multiple groups of neural network modules shown in FIG. 5.
- The neural network module selected for the first layer is SGL1M1, the neural network module selected for the second layer is SGL2M1, the neural network modules selected for the third layer are SGL3M1 and SGL3M2, and the neural network module selected for the fourth layer is SGL4M1.
- The output of SGL2M1 is fed to both SGL3M1 and SGL3M2, and the weighted average of the outputs of SGL3M1 and SGL3M2 is used as the input of SGL4M1.
- A transformation layer TL is added between the second layer and the third layer of the second neural network. The output of SGL2M1 is fed to both SGL3M1 and SGL3M2, the outputs of SGL3M1 and SGL3M2 serve as the input of the transformation layer TL, and the output of the transformation layer TL serves as the input of SGL4M1.
- $h_3^1 = \mathrm{SGL3M1}(h_2)$;
- $h_3^2 = \mathrm{SGL3M2}(h_2)$;
- $h_{TL} = \mathrm{TL}(h_3^1, h_3^2)$;
- Here, $x$ represents the input data and $h_1$ represents the output of SGL1M1. Inputting $h_1$ to SGL2M1 gives the output $h_2$ of SGL2M1; inputting $h_2$ to SGL3M1 and SGL3M2 gives $h_3^1$ and $h_3^2$ respectively; inputting $h_3^1$ and $h_3^2$ to the transformation layer TL gives $h_{TL}$; and inputting $h_{TL}$ into SGL4M1 gives the prediction result $y$ of $x$ output by the entire second neural network.
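The data flow through this multi-branch second neural network can be sketched in a few lines. This is a hedged illustration only: the modules are replaced by hypothetical elementwise affine maps (`make_affine`), and the transformation layer TL is assumed to be an equal-weight average of its two inputs, which the source does not specify.

```python
def make_affine(scale, shift):
    """Hypothetical stand-in for a neural network module: elementwise affine map."""
    def module(h):
        return [scale * v + shift for v in h]
    return module


# Stand-ins for the selected modules; the coefficients are arbitrary.
SGL1M1 = make_affine(2.0, 0.0)
SGL2M1 = make_affine(1.0, 1.0)
SGL3M1 = make_affine(3.0, 0.0)
SGL3M2 = make_affine(-1.0, 0.0)
SGL4M1 = make_affine(1.0, 0.5)


def TL(a, b, weight=0.5):
    """Assumed transformation layer: weighted average of the two branch outputs."""
    return [weight * u + (1.0 - weight) * v for u, v in zip(a, b)]


def forward(x):
    h1 = SGL1M1(x)            # output of the first layer
    h2 = SGL2M1(h1)           # output of the second layer
    h3_1 = SGL3M1(h2)         # first branch of the third layer
    h3_2 = SGL3M2(h2)         # second branch of the third layer
    h_tl = TL(h3_1, h3_2)     # merge the two branches
    return SGL4M1(h_tl)       # prediction result y


y = forward([1.0, 0.0])
```

With these stand-in coefficients, `forward([1.0, 0.0])` evaluates to `[3.5, 1.5]`; real modules would be MLPs or CNNs as described for FIG. 5.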
- The output of SGL2M1 is fed to both SGL3M1 and SGL3M2; the outputs of SGL3M1 and SGL3M2 are combined in the transformation layer TL, with the output of SGL2M1 serving as the selection signal; and the output of TL is used as the input of SGL4M1.
- The calculation process of the second neural network is as follows:
- $h_3^1 = \mathrm{SGL3M1}(h_2)$;
- $h_3^2 = \mathrm{SGL3M2}(h_2)$;
- $h_{TL} = h_3^1 \cdot \mathrm{TL}(h_2) + h_3^2 \cdot (1 - \mathrm{TL}(h_2))$;
- The meanings of $x$, $h_1$, $h_2$, $h_3^1$ and $h_3^2$ are similar to those in the previous implementation and can be understood by reference to it; the difference lies in how $h_{TL}$ is generated.
- y represents the prediction result of x output by the second neural network in this implementation manner, it should be understood that the examples in FIG.
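The gated combination in this implementation, in which the output of SGL2M1 acts as a selection signal, can be sketched as below. The form of TL is an assumption made for illustration: here it is a sigmoid of the mean of its input, so that the signal lies between 0 and 1. The function names `gate` and `gated_merge` are hypothetical.

```python
import math


def gate(h2):
    """Assumed TL: maps h2 to a scalar selection signal in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-sum(h2) / len(h2)))


def gated_merge(h2, h3_1, h3_2):
    """Combine the two branch outputs as h3_1 * TL(h2) + h3_2 * (1 - TL(h2))."""
    g = gate(h2)
    return [g * a + (1.0 - g) * b for a, b in zip(h3_1, h3_2)]


# A neutral signal weights both branches equally.
balanced = gated_merge([0.0, 0.0], [2.0, 4.0], [0.0, 0.0])
# A strongly positive signal selects the first branch almost exclusively.
first_branch = gated_merge([50.0], [2.0], [7.0])
```

The result of `gated_merge` would then be passed to SGL4M1 to obtain the prediction result.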
- the neural network module selected in the first layer is SGL1M1
- the neural network module selected in the second layer is SGL2M1
- the third layer is vacant
- the neural network module selected in the fourth layer is SGL4M1.
- the first client selects at least one first neural network from at least two second neural networks.
- The first client may randomly select at least two first neural networks from the at least two second neural networks.
- the number of randomly selected first neural networks can be preset, for example, 4, 5, or 6, etc., which are not limited here.
- the first client randomly selects at least two first neural networks from at least two second neural networks.
- the value of the first threshold may be 10 percent, 12 percent, 15 percent, etc., which is not limited here.
- The first client may select, from the at least two second neural networks and according to the first adaptation relationship, at least one first neural network with a high adaptation value with the first data set.
- A first adaptation relationship is preconfigured on the first client, and at least one first neural network with a high adaptation value with the first data set is then selected from the at least two second neural networks according to the first adaptation relationship. This ensures that the selected neural network is adapted to the data characteristics of the first data set, which enables personalized customization of the neural networks of different clients; moreover, selecting a neural network adapted to the data characteristics of the first data set helps improve the accuracy of the trained neural network.
- The at least one first neural network with a high adaptation value may be the N first neural networks with the highest adaptation values, where N is an integer greater than or equal to 1; for example, N may be 1, 2, 3, 4, 5, 6 or another value, which is not limited here.
- The at least one first neural network with a high adaptation value may also be the first neural networks whose adaptation values are greater than a fourth threshold. The value of the fourth threshold may be determined flexibly according to factors such as how the adaptation value is generated and its value range, and is not limited here.
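The two selection rules just described (the N highest adaptation values, or all values above the fourth threshold) can be sketched as follows. The dictionary layout, the network identifiers, and the numeric values are hypothetical and serve only to illustrate the rules.

```python
def select_top_n(adaptation, n):
    """Return the identifiers of the n networks with the highest adaptation values."""
    ranked = sorted(adaptation, key=adaptation.get, reverse=True)
    return ranked[:n]


def select_above(adaptation, fourth_threshold):
    """Return the identifiers of the networks whose adaptation value exceeds the threshold."""
    return [net_id for net_id, value in adaptation.items() if value > fourth_threshold]


# Hypothetical adaptation values between the first data set and three second neural networks.
adaptation = {"net_a": 0.9, "net_b": 0.2, "net_c": 0.7}
top_two = select_top_n(adaptation, 2)
above_half = select_above(adaptation, 0.5)
```

In this toy case both rules pick `net_a` and `net_c`; in general they differ, because the threshold rule can return any number of networks.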
- The first client can obtain a first adaptation matrix according to the first adaptation relationship, in which each element represents an adaptation value. When the first adaptation relationship contains null values, it can be completed by means of matrix decomposition; the completed first adaptation relationship no longer includes null values, so that at least one first neural network with a high adaptation value with the first data set can be selected according to the completed first adaptation relationship.
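Completing the null values by matrix decomposition can be illustrated with a small low-rank factorization fitted by stochastic gradient descent. This is a sketch under assumptions, not the procedure of the application: the source does not state how the first adaptation matrix is laid out or which decomposition is used, so the matrix below, the rank, the learning rate, the regularization weight, and the function name `complete_matrix` are all hypothetical.

```python
import random


def complete_matrix(matrix, rank=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Fill None entries of `matrix` with a low-rank estimate P * Q^T fitted by SGD."""
    rng = random.Random(seed)
    rows, cols = len(matrix), len(matrix[0])
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(cols)]
    known = [(i, j, matrix[i][j])
             for i in range(rows) for j in range(cols)
             if matrix[i][j] is not None]
    for _ in range(epochs):
        for i, j, value in known:
            # Error of the current low-rank estimate on a known entry.
            err = value - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    # Keep known values; replace only the null entries with the estimate.
    return [[matrix[i][j] if matrix[i][j] is not None
             else sum(P[i][k] * Q[j][k] for k in range(rank))
             for j in range(cols)]
            for i in range(rows)]


# Hypothetical adaptation matrix with null (None) entries.
adaptation = [
    [0.9, 0.8, None],
    [0.7, None, 0.3],
    [None, 0.4, 0.2],
]
completed = complete_matrix(adaptation)
```

After completion every entry holds a number, so a selection rule such as "highest adaptation value" can be applied to the whole matrix.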
- The first client selects at least one first neural network from the at least two second neural networks according to the first adaptation relationship, and the at least one first neural network includes a neural network with a high adaptation value with the first data set.
- the at least one first neural network selected by the first client may include not only at least one first neural network with a high adaptation value, but also at least one randomly selected first neural network.
- the first client calculates an adaptation value between the first data set and the first neural network.
- The first client may store the first adaptation relationship in the form of a table, matrix, index, array, etc. The first adaptation relationship includes a plurality of adaptation values, each of which represents the degree of adaptation between the first data set and a second neural network; the first adaptation relationship may also include identification information of each second neural network, which uniquely identifies that second neural network.
- the following table is used as an example to show the first adaptation relationship.
- ID is the abbreviation of Identity.
- For example, the multiple neural network modules are divided into 4 groups: the first group includes 4 neural network modules, the second group includes 3, the third group includes 2, and the fourth group includes 4. Taking the case in which the first client selects only one neural network module from each group to construct a second neural network as an example, a total of 4 × 3 × 2 × 4 = 96 second neural networks can be constructed.
- The first adaptation relationship does not necessarily include an adaptation value between the first data set and every second neural network. The first client can obtain the adaptation value between the first data set and each second neural network from the existing adaptation values through matrix decomposition and other methods; the specific calculation process is described in the subsequent steps.
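The count of 96 single-branch second neural networks follows from choosing one module per group, which can be checked by enumeration. In this sketch the module identifiers follow the SGLxMy naming used for the figures but are otherwise hypothetical, and the adaptation relationship is represented as a plain dictionary with `None` for values not yet known.

```python
from itertools import product

# Four groups with 4, 3, 2 and 4 modules (identifiers are illustrative).
groups = [
    ["SGL1M1", "SGL1M2", "SGL1M3", "SGL1M4"],
    ["SGL2M1", "SGL2M2", "SGL2M3"],
    ["SGL3M1", "SGL3M2"],
    ["SGL4M1", "SGL4M2", "SGL4M3", "SGL4M4"],
]

# One module per group gives one single-branch second neural network.
candidates = ["+".join(path) for path in product(*groups)]

# Adaptation relationship keyed by network ID; None marks a null value.
first_adaptation = {net_id: None for net_id in candidates}
first_adaptation["SGL1M1+SGL2M1+SGL3M1+SGL4M1"] = 0.8  # illustrative value
```

Only the networks that have actually been evaluated carry an adaptation value; the remaining entries stay null until they are computed or estimated.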
- a first data set is stored on the first client, and the first data set includes a plurality of first training data and an accurate result of each first training data.
- After acquiring at least one first neural network, the first client needs to calculate the adaptation value between the first data set and the first neural network, and write the adaptation value calculated in step 403 into the first adaptation relationship; that is, the first adaptation relationship is updated according to the adaptation value calculated in step 403.
- For how the adaptation value is generated, refer to the description in the subsequent steps, which is not introduced here.
- the adaptation value between the first data set and a first neural network can be calculated in the following two ways.
- the adaptation value is obtained by calculating the function value of the loss function
- the adaptation value between the first data set and a first neural network corresponds to the function value of the first loss function.
- the first loss function indicates the similarity between the predicted result of the first data and the correct result of the first data.
- the prediction result of the first data is obtained through a first neural network, and the first data and the correct result of the first data are obtained based on the first data set.
- The larger the function value of the first loss function, the smaller the adaptation value between the first data set and a first neural network; the smaller the function value of the first loss function, the larger the adaptation value.
- Using the loss function to calculate the adaptation value between the first data set and a first neural network is simple, easy to implement, and highly accurate.
- The first client performs clustering on the first data set to obtain at least one data subset. The first data subset is a subset of the first data set and is any one of the at least one data subset. Further, the first client generates an adaptation value between the first data subset and a first neural network according to the first data subset and the first loss function. Here, the first loss function indicates the similarity between the prediction result of the first data and the correct result of the first data; the prediction result of the first data is obtained through the first neural network, and the first data and its correct result are obtained from the first data subset.
- The first client performs the foregoing operations on each of the at least two data subsets to obtain an adaptation value between each data subset and the first neural network.
- The first client may average the adaptation values between the plurality of data subsets and the first neural network to obtain the adaptation value between the entire first data set and that first neural network, and update the first adaptation relationship.
- The first data refers to any data in the first data subset. The data in the first data subset may be used to perform training operations on the first neural network (that is, the first data subset may include first training data), to test the accuracy of the trained first neural network (that is, the first data subset may include test data), or to verify the correctness of hyperparameters in the first neural network (that is, the first data subset may include verification data). The first data may therefore be data for training, data for testing, or data for verification.
- The first client can input each first data in the first data subset into the first neural network, obtain the prediction result of the first data output by the first neural network, and calculate the function value of the first loss function according to the prediction result and the correct result of the first data.
- The first client performs the foregoing operation on each first data in the first data subset to obtain the function values of multiple loss functions, and averages these function values to obtain the adaptation value between the entire first data subset and the first neural network. Further, the first client may determine the reciprocal of the average of the function values of the multiple loss functions as the adaptation value between the entire first data subset and the first neural network.
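Taking the reciprocal of the average loss as the adaptation value can be sketched as follows. This is a minimal illustration with assumptions: cross-entropy is used as the first loss function, a network is modelled as a hypothetical callable returning class probabilities, and a small `eps` is added to avoid division by zero, which the source does not mention.

```python
import math


def cross_entropy(probs, label):
    """Loss of one sample: negative log-probability of the correct class."""
    return -math.log(probs[label])


def adaptation_value(network, samples, eps=1e-12):
    """Reciprocal of the mean loss of `network` over (input, correct label) pairs."""
    losses = [cross_entropy(network(x), y) for x, y in samples]
    mean_loss = sum(losses) / len(losses)
    return 1.0 / (mean_loss + eps)  # eps is an assumption, guards against zero loss


def dataset_adaptation(network, subsets):
    """Average the adaptation values of several data subsets."""
    values = [adaptation_value(network, subset) for subset in subsets]
    return sum(values) / len(values)


# Hypothetical networks: one fits the toy data well, the other guesses uniformly.
def good_network(x):
    return [0.9, 0.1] if x < 0 else [0.1, 0.9]


def uninformed_network(x):
    return [0.5, 0.5]


subset = [(-1.0, 0), (2.0, 1)]
good = adaptation_value(good_network, subset)
poor = adaptation_value(uninformed_network, subset)
```

The better-fitting network has the smaller loss and therefore the larger adaptation value, matching the inverse relation between loss and adaptation value described above.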
- The first data may also refer to the class center of the first data subset.
- The first client can calculate the class center of all the data in the first data subset, input the class center into the first neural network, and obtain the prediction result of the first data output by the first neural network. The first client averages the correct results of all the data in the first data subset to obtain the correct result of the first data, and then calculates the function value of the first loss function to obtain the adaptation value between the entire first data subset and the first neural network.
- The first client may take the reciprocal of the function value of this loss function and determine the reciprocal as the adaptation value between the entire first data subset and the first neural network.
- The adaptation value between the first data subset and a first neural network may be determined as the adaptation value between each training data in the first data subset and that first neural network; the adaptation values between the training data in different data subsets and the first neural network differ.
- The first data set is clustered to obtain at least two data subsets, and different training data in the same data subset have the same adaptation value with a first neural network; that is, training data of the same class have the same ability to modify a first neural network. This accommodates the case in which one client holds at least two data subsets with different data characteristics, further improves the personalized customization capability of the neural network, and helps improve the accuracy of the trained neural network.
- The first data refers to any data in the first data set. The data in the first data set may be used to perform a training operation on the first neural network (that is, the first data set may include first training data), to test the accuracy of the trained first neural network (that is, the first data set may include test data), or to verify the correctness of hyperparameters in the first neural network (that is, the first data set may include verification data). The first data may therefore be data used for training, data used for testing, or data used for verification.
- The first client can input each first data in the first data set into the first neural network in turn to obtain the function value of the loss function corresponding to each first data, average these function values to obtain the function value of the loss function corresponding to the entire first data set, and then generate, from that function value, the adaptation value between the entire first data set and the first neural network.
- The adaptation value between the first data set and a first neural network is determined as the adaptation value between each first data in the first data set and that first neural network; that is, all the first training data in the first data set have the same adaptation value with the first neural network.
- The first client may also input each first data in the first data set into the first neural network in turn, obtain the function value of the loss function corresponding to each first data, generate the adaptation value between each first data and the first neural network, and then average the adaptation values between all the first data and the first neural network to obtain the adaptation value between the entire first data set and the first neural network. In combination with the description in step 404, in the process of training the first neural network with the first data set, each first data has its own adaptation value with the first neural network.
- the adaptation value between the first data set and a first neural network corresponds to the first similarity.
- The greater the first similarity, the greater the adaptation value between the first data set and a first neural network; the smaller the first similarity, the smaller the adaptation value.
- the first similarity refers to the similarity between a first neural network and a third neural network.
- The third neural network is the neural network with the highest accuracy of outputting prediction results in the previous iteration. Alternatively, if this iteration is not the first iteration, the third neural network can be the same neural network as the first neural network, that is, the first neural network and the third neural network correspond to the same identification information. In that case, the difference between the two is that the third neural network is the trained neural network obtained when the first client last used the first data set to perform a training operation on it.
- Since the third neural network is the neural network with the highest accuracy of outputting prediction results in the previous iteration and has been trained with the first data set, the degree of adaptation between the third neural network and the first data set is high. If the similarity between a first neural network and the third neural network is high, the degree of adaptation between that first neural network and the first data set is also high, and the adaptation value is large. This provides another implementation for calculating the adaptation value, which improves the implementation flexibility of this solution.
- the similarity between a first neural network and a third neural network is determined by any one of the following methods:
- The first client inputs the same data into a first neural network and the third neural network respectively, and compares the similarity between the output data of the first neural network and the output data of the third neural network.
- the similarity can be obtained by calculating the Euclidean distance, Mahalanobis distance, cosine distance, cross entropy or other methods between the two.
- the similarity between the output data of a first neural network and the output data of the third neural network may refer to the first similarity between the output data of the entire first neural network and the output data of the entire third neural network.
- the first similarity can be directly determined as the similarity between the first neural network and the third neural network; or, after converting the first similarity, the first neural network and the third neural network can be obtained. Similarity between neural networks.
- The similarity between the output data of a first neural network and the output data of the third neural network may also refer to the similarity between the output data of each module in the first neural network and the output data of each module in the third neural network. The product of the similarities between the output data of the modules is calculated to obtain the similarity between the output data of the entire first neural network and the output data of the entire third neural network, from which the similarity between the first neural network and the third neural network is obtained.
- The first client can also calculate a second similarity between the weight parameter matrix of the first neural network and the weight parameter matrix of the third neural network. The second similarity may be determined directly as the similarity between the first neural network and the third neural network, or may be converted to obtain that similarity.
- the second similarity may be obtained by calculating the Euclidean distance, Mahalanobis distance, cosine distance, cross entropy or other methods between the two.
- the embodiment of the present application provides two calculation methods for the similarity between the first neural network and the third neural network, which improves the implementation flexibility of the solution.
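The similarity calculations described above can be sketched with plain lists standing in for output data and weight parameters. The choice of cosine similarity, the conversion `1 / (1 + distance)` from a Euclidean distance to a similarity, and all function names are assumptions for illustration; the source lists several admissible measures without fixing one, and does not specify the conversion.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def module_wise_similarity(outputs_a, outputs_b):
    """Product of the similarities between the outputs of each pair of modules."""
    result = 1.0
    for u, v in zip(outputs_a, outputs_b):
        result *= cosine_similarity(u, v)
    return result


def weight_similarity(weights_a, weights_b):
    """Assumed conversion of the Euclidean distance between weights into a similarity."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(weights_a, weights_b)))
    return 1.0 / (1.0 + distance)


# Outputs of two modules in each network for the same input (toy values).
first_outputs = [[1.0, 0.0], [1.0, 1.0]]
third_outputs = [[1.0, 0.0], [1.0, 0.0]]
output_similarity = module_wise_similarity(first_outputs, third_outputs)
identical_weights = weight_similarity([0.2, 0.4], [0.2, 0.4])
```

Because the module-wise result is a product, a single dissimilar module lowers the overall similarity even when the other modules agree.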
- A degree of unconfidence can also be attached to the third neural network.
- The final adaptation value may be determined from the degree of unconfidence and the calculated adaptation value, for example by adding or multiplying the two.
- the first client uses the first data set to perform a training operation on the first neural network to obtain a trained first neural network.
- After obtaining at least one first neural network, the first client performs a training operation on the first neural network by using the first data set to obtain a trained first neural network.
- The first data set includes a plurality of first training data and the correct result of each first training data. The first client inputs a first training data into a first neural network to obtain the prediction result of the first training data output by the first neural network. The function value of the fourth loss function is then generated, and gradient derivation is performed according to that function value to update the first neural network by backpropagation.
- the first client performs iterative training on the first neural network until the preset conditions are met, and a trained first neural network is obtained.
- The fourth loss function indicates the similarity between the prediction result of the first training data and the correct result of the first training data, and the type of the fourth loss function depends on the task type of the first neural network.
- For example, if the task of the first neural network is classification, the fourth loss function may be a cross-entropy loss function, a 0-1 loss function, or another loss function, which is not limited here.
- The goal of iteratively training the first neural network is to increase the similarity between the prediction result of the first training data and the correct result of the first training data. The preset condition may be that the convergence condition of the fourth loss function is satisfied, or that the number of iterations reaches a preset number.
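The iterative training with its two possible stopping conditions can be sketched on a deliberately tiny model. Everything concrete here is an assumption for illustration: the "network" is a one-feature logistic model, the loss is binary cross-entropy, and the learning rate, tolerance, and iteration cap are arbitrary.

```python
import math


def train(samples, lr=0.5, max_iters=1000, tol=1e-6):
    """Gradient descent that stops on loss convergence or after max_iters rounds."""
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_iters):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))   # prediction result
            p = min(max(p, 1e-12), 1.0 - 1e-12)        # keep log() well defined
            loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            grad_w += (p - y) * x
            grad_b += (p - y)
        n = len(samples)
        loss /= n
        # Preset condition 1: the loss has converged.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        # Reverse update of the parameters along the negative gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    # Preset condition 2 is the loop bound max_iters itself.
    return w, b, loss


samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, final_loss = train(samples)
```

The returned loss is the value measured at the start of the last round; it is lower than the loss of the untrained model, which is ln 2 for this data.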
- LossM 1 represents the fourth loss function
- the value of j is 1 to J i
- M k represents the first neural network being trained
- the first client needs to initialize the parameters of the first neural network before using the first data set to perform a training operation on the first neural network.
- in one implementation, the first client directly uses the parameters of the first neural network sent by the server; in another implementation, the weight parameters of the first neural network are initialized with the weight parameters obtained when the first client last trained the first neural network
- in another implementation, the weight parameters obtained when training the first neural network are weighted and averaged to initialize the weight parameters of the first neural network this time; in another implementation, the parameters of the second neural network can also be randomly initialized, which is not limited here.
- step 404 may include: the first client performs a training operation on the first neural network by using the first data set according to the second loss function.
- the second loss function includes a first term and a second term
- the first term indicates the similarity between the first prediction result and the correct result of the first training data
- the second term indicates the similarity between the first prediction result and the second prediction result; the second term can be called a penalty term or a constraint term.
- the first prediction result is the prediction result of the first training data output by the first neural network after the first training data is input into the first neural network; the second prediction result is the prediction result of the first training data output by the fourth neural network after the first training data is input into the fourth neural network.
- the fourth neural network is the first neural network that has not performed the training operation; that is, the initial states of the fourth neural network and the first neural network are consistent, but in the process of training the first neural network, the weight parameters of the fourth neural network are not updated.
- since the first data set on the first client does not necessarily match the first neural network, the second loss function also indicates the similarity between the first prediction result and the second prediction result while the first data set is used to train the first neural network, which prevents the first neural network from being changed too much during the training process.
- the second loss function adds a penalty term on the basis of the fourth loss function.
- the purpose of adding the penalty term is to increase the similarity between the prediction result of the first training data output by the first neural network and the prediction result of the first training data output by the fourth neural network.
- an example of the second loss function is disclosed as follows:
- LossM 2 represents the second loss function
- ⁇ 1 is a hyperparameter
- y′ ij represents the prediction result of the first training data output by the fourth neural network after the first training data is input into the fourth neural network
- step 404 may further include: the first client performs a training operation on the first neural network by using the first data set according to the fifth loss function.
- the fifth loss function indicates the similarity between the first prediction result and the correct result of the first training data, and also indicates the similarity between the first neural network and the fourth neural network; that is, a penalty term is added on the basis of the fourth loss function, and the purpose of the penalty term is to increase the similarity between the first neural network and the fourth neural network.
- LossM 3 represents the fifth loss function
- ⁇ 2 is a hyperparameter
- M0 represents the fourth neural network
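The two penalised losses described above can be illustrated with a minimal sketch. It assumes a cross-entropy task term and a squared-distance penalty; the names `second_loss`, `fifth_loss`, `gamma1` and `gamma2` are hypothetical stand-ins for the second loss function, the fifth loss function and their hyperparameters, and are not the formulas of this application.

```python
import math

def cross_entropy(pred, target_index):
    """Task term (stand-in for the fourth loss): negative log-probability of the correct class."""
    return -math.log(max(pred[target_index], 1e-12))

def squared_distance(a, b):
    """Sum of squared differences between two equal-length sequences (assumed penalty form)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def second_loss(pred, frozen_pred, target_index, gamma1):
    """Task term plus a penalty on drift from the frozen (fourth) network's prediction."""
    return cross_entropy(pred, target_index) + gamma1 * squared_distance(pred, frozen_pred)

def fifth_loss(pred, target_index, weights, frozen_weights, gamma2):
    """Task term plus a penalty on drift from the frozen (fourth) network's weights."""
    return cross_entropy(pred, target_index) + gamma2 * squared_distance(weights, frozen_weights)

pred = [0.7, 0.2, 0.1]         # first prediction result (network being trained)
frozen_pred = [0.6, 0.3, 0.1]  # second prediction result (frozen fourth network)
base = cross_entropy(pred, 0)
loss2 = second_loss(pred, frozen_pred, 0, gamma1=0.5)
loss5 = fifth_loss(pred, 0, [1.0, -2.0], [1.0, -1.5], gamma2=0.5)
```

In both variants the penalty grows as the trained network moves away from its untrained copy, so a larger coefficient keeps the first neural network closer to its initial state.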
- the method of adjusting the modification of the weight parameter of the first neural network includes: adjusting the learning rate, adjusting the coefficient of the penalty term, or other methods.
- it is unreasonable for all training data to modify the weight parameters of a first neural network to the same fixed degree.
- the higher the adaptation value between a piece of first training data and the first neural network, the more suitable the first neural network is for processing that first training data.
- accordingly, the greater the degree to which that first training data modifies the weight parameters of the first neural network, which helps improve the training efficiency of the first neural network.
- M k+1 represents the first neural network after performing a training operation on M k
- ⁇ i represents the learning rate
- ⁇ is a hyperparameter
- E represents the adaptation value between the first training data and the first neural network being trained; LossM represents any one of LossM 1 , LossM 2 , and LossM 3 . It should be understood that the above examples are only for the convenience of understanding this scheme, and are not used to limit this scheme.
- the values of ⁇ 1 and ⁇ 2 can both be 1/E, that is, the values of ⁇ 1 and ⁇ 2 can be the reciprocal of the adaptation value between the first training data and the first neural network currently being trained.
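As a minimal sketch of how the adaptation value can steer the update, the code below assumes the step size is the product of the learning rate, a hyperparameter and E, and that the penalty coefficient is 1/E. The function names and the exact form of the step are assumptions for illustration only.

```python
def adaptive_step(weights, grads, learning_rate, alpha, adaptation):
    """One gradient step whose size grows with the adaptation value E (assumed form)."""
    scale = learning_rate * alpha * adaptation
    return [w - scale * g for w, g in zip(weights, grads)]

def penalty_coefficient(adaptation, eps=1e-8):
    """Penalty weight 1/E: poorly matched training data is constrained more strongly."""
    return 1.0 / max(adaptation, eps)

weights = [1.0, -1.0]
grads = [0.5, -0.5]
# Well-matched data (E = 0.9) moves the weights further than poorly matched data (E = 0.1).
high = adaptive_step(weights, grads, learning_rate=0.1, alpha=1.0, adaptation=0.9)
low = adaptive_step(weights, grads, learning_rate=0.1, alpha=1.0, adaptation=0.1)
```

With these numbers, the well-matched sample changes each weight nine times as much as the poorly matched one, while `penalty_coefficient` moves in the opposite direction.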
- the adaptation values between different first training data in the first data set and the one first neural network may be different. At least two data subsets can be obtained after clustering the first data set; the training data in the same data subset have the same adaptation value with the one first neural network, and the training data in different data subsets have different adaptation values with the one first neural network. It may also be that the adaptation values between each first training data in the first data set and the one first neural network are all different. In another implementation manner, the entire first data set may also be regarded as a whole, and the adaptation values between all the first training data in the first data set and the one first neural network are the same.
- since the first client will select one or more first neural networks, the first client needs to calculate the adaptation value between the first training data and each of the one or more first neural networks, and perform a training operation on each of the one or more first neural networks. In that case, each time steps 403 and 404 are performed, the execution object is only one first neural network among the one or more first neural networks, and the first client needs to repeat steps 403 and 404 multiple times. Alternatively, the first client can first calculate, through step 403, the adaptation values between the first training data and all of the one or more first neural networks, and then perform the iterative training operation on each first neural network through step 404.
- the first client can directly generate a new neural network module, and combine it with the received multiple neural network modules to construct a new first neural network.
- the accuracy of the first neural network including the newly added neural network module can be compared with that of the first neural network not including the newly added neural network module; if the accuracy gain does not exceed the third threshold, the newly added neural network module is not retained.
- the first client sends at least one updated neural network module included in the at least one trained first neural network to the server.
- after obtaining at least one trained first neural network, the first client sends at least one updated neural network module included in the at least one trained first neural network to the server; correspondingly, the server receives the at least one updated neural network module sent by the first client. Since the first client is any one of multiple clients, the server receives at least one updated neural network module sent by each of the multiple clients.
- the server may also receive the newly added neural network module.
- the server updates the stored weight parameters of the neural network module.
- after the server receives at least one updated neural network module sent by each of the multiple clients, the server updates the weight parameters of the stored neural network modules according to the received updated neural network modules, to complete one of the multiple rounds of iteration.
- for a neural network module that is the same across different clients, the weighted average of the weight parameters sent by the different clients is used as the weight parameters of that neural network module in the server. For the neural network modules that do not overlap in different clients, the parameters of the neural network module sent by the client are directly used as the weight parameters of the neural network module in the server.
- the same neural network module means that the specific neural networks are the same and located in the same group.
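The aggregation rule above can be sketched as follows. The sketch assumes each module is a flat list of weight parameters keyed by a module identifier, and that each client carries an aggregation weight (for instance its data set size); the weighting scheme and the name `aggregate_modules` are assumptions, since the weights of the average are not specified here.

```python
def aggregate_modules(server_modules, client_updates):
    """Merge client updates into the server's modules.

    client_updates is a list of (client_weight, {module_id: [params]}) pairs.
    Modules sent by several clients are weight-averaged; a module sent by a
    single client is taken over directly; untouched modules keep their values.
    """
    sums, totals = {}, {}
    for client_weight, modules in client_updates:
        for module_id, params in modules.items():
            if module_id not in sums:
                sums[module_id] = [0.0] * len(params)
                totals[module_id] = 0.0
            sums[module_id] = [s + client_weight * p
                               for s, p in zip(sums[module_id], params)]
            totals[module_id] += client_weight
    merged = dict(server_modules)
    for module_id, acc in sums.items():
        merged[module_id] = [s / totals[module_id] for s in acc]
    return merged

server = {"L1M1": [0.0, 0.0], "L1M2": [1.0, 1.0], "L2M1": [5.0, 5.0]}
updates = [
    (1.0, {"L1M1": [2.0, 4.0]}),
    (3.0, {"L1M1": [6.0, 8.0], "L1M2": [3.0, 3.0]}),
]
merged = aggregate_modules(server, updates)
```

Here `L1M1` is sent by both clients and is averaged, `L1M2` is sent by one client and is copied, and `L2M1` is not sent by any client and is left unchanged.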
- the server can put the newly added neural network module into the corresponding group.
- if the server receives a newly added neural network module, it can put the newly added neural network module into the corresponding group.
- alternatively, all the newly added neural network modules in the same group can be weighted and averaged into one neural network module, which is then put into this group.
- the method of model distillation can also be used to update the weights of the neural network modules stored in the server by using multiple updated neural network modules sent by multiple clients.
- the training data stored in the server is used to retrain multiple neural network modules stored in the server.
- the purpose of the training is to increase the similarity between the output data of the neural network module stored in the server and the output data of the updated neural network module sent by the client.
- a neural network is divided into at least two sub-modules, a plurality of neural network modules stored in the server are divided into at least two groups corresponding to the at least two sub-modules, and different neural network modules in the same group have the same function. Optionally, after updating the stored weight parameters of the neural network modules, the server also calculates the similarity between different neural network modules among the at least two neural network modules included in the same group, and merges two neural network modules whose similarity is greater than a preset threshold.
- merging two neural network modules whose similarity is greater than the preset threshold, that is, merging two redundant neural network modules, not only reduces the difficulty for the server of managing multiple neural network modules,
- but also prevents the client from repeatedly training two neural network modules whose similarity is greater than the preset threshold, so as to reduce the waste of computer resources of the client.
- Different neural network modules in the same group include a second neural network module and a first neural network module, and the similarity between the second neural network module and the first neural network module is determined by any one of the following methods:
- the server inputs the same data to the second neural network module and the first neural network module respectively, and compares the similarity between the output data of the second neural network module and the output data of the first neural network module.
- the calculation method of the similarity includes, but is not limited to, calculating the Euclidean distance, Mahalanobis distance, cosine distance, or cross entropy between the two, which are not exhaustively listed here.
- alternatively, the similarity between the weight parameter matrix of the second neural network module and the weight parameter matrix of the first neural network module can be calculated. The method of calculating the similarity is similar to that of the previous implementation and can be understood by reference to it.
- the neural network modules in the second layer (ie, the second group), the third layer (ie, the third group), and the fourth layer (ie, the fourth group) are processed by analogy.
- when merging, if the second neural network module and the first neural network module are different neural networks, the server can randomly select one of the two neural network modules; if the second neural network module and the first neural network module are the same neural network and differ only in their weight parameters, the weight parameters of the two may also be averaged to generate the weight parameters of the merged neural network module.
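The output-based similarity check and the weight-averaging merge can be sketched as below. The sketch assumes toy modules that scale their input element-wise, cosine similarity as the measure, and a hypothetical threshold value; `linear_module`, `output_similarity` and `merge_if_similar` are illustrative names only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def linear_module(weights):
    """Toy module (assumption): element-wise scaling of the input."""
    return lambda x: [w * v for w, v in zip(weights, x)]

def output_similarity(weights_a, weights_b, probe_inputs):
    """Feed the same data to both modules and average the similarity of their outputs."""
    mod_a, mod_b = linear_module(weights_a), linear_module(weights_b)
    sims = [cosine_similarity(mod_a(x), mod_b(x)) for x in probe_inputs]
    return sum(sims) / len(sims)

def merge_if_similar(weights_a, weights_b, probe_inputs, threshold):
    """Return averaged weights when the two modules are redundant, otherwise None."""
    if output_similarity(weights_a, weights_b, probe_inputs) > threshold:
        return [(a + b) / 2 for a, b in zip(weights_a, weights_b)]
    return None

probes = [[1.0, 2.0], [3.0, 1.0]]
merged = merge_if_similar([1.0, 1.0], [1.0, 1.2], probes, threshold=0.95)
kept_apart = merge_if_similar([1.0, 1.0], [1.0, -1.0], probes, threshold=0.95)
```

The first pair produces nearly identical outputs and is merged by averaging; the second pair behaves differently on the probe data and is kept as two separate modules.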
- the server will re-enter step 401 to re-execute steps 401 to 407, that is, re-execute the next round of iterations.
- FIG. 9 is a schematic flowchart of a method for training a machine learning model provided by an embodiment of the present application.
- the server will send the stored multiple neural network modules to each client respectively.
- Any client (such as the first client) first constructs multiple second neural networks according to the multiple neural network modules, selects from the multiple second neural networks at least one first neural network that is adapted to the data characteristics of the first data set stored on the first client, calculates the adaptation value between the first data set and each first neural network, and uses the first data set to perform a training operation on each first neural network to obtain a plurality of updated neural network modules for constructing the first neural network; the plurality of updated neural network modules are then sent to the server. After the server receives the updated neural network modules sent by each of the multiple clients, it updates the stored weight parameters of the multiple neural network modules according to the updated neural network modules sent by all the clients, thereby completing one iteration of the multiple rounds of iteration. Although only two clients are shown in FIG. 9 , in actual situations, the server can establish communication connections with more clients. The example in FIG. 9 is only to facilitate understanding of the solution, and is not intended to limit the solution.
- the client uses the selector to select the first neural network adapted to the data characteristics of the data set stored by the client
- FIG. 10 is a schematic flowchart of a training method for a machine learning model provided by an embodiment of the present application.
- the method may include:
- the first client receives the selector sent by the server.
- the server sends a selector to the first client, and correspondingly, the first client receives the selector sent by the server, where the selector is a neural network used to select, from multiple neural network modules, at least one neural network module that matches the data features of the first data set.
- the server may also send identification information of each neural network module of the plurality of neural network modules stored by the server to the first client.
- the first client inputs training data into the selector according to the first data set, and obtains indication information output by the selector, where the indication information includes the probability that each neural network module in the plurality of neural network modules is selected, and is used to indicate at least one neural network module for constructing the first neural network.
- if the multiple neural network modules include Z neural network modules in total, the indication information can be expressed as a vector including Z elements, each element representing the probability that one neural network module is selected.
- FIG. 5 includes a total of 18 neural network modules, and the value of Z is 18. It should be understood that the examples here are only for the convenience of understanding this solution, and are not used to limit this solution.
- the first client may input each first training data (an example of training data) in the first data set into the selector once, so as to obtain the indication information corresponding to each first training data.
- the first client can also perform a clustering operation on the first data set, and respectively input the several cluster centers obtained by clustering (an example of training data) into the selector, so as to obtain the indication information corresponding to each cluster center.
- the first client may also perform a clustering operation on the first data set, sample several first training data from each of the several clustered data subsets, and respectively input the sampled first training data (an example of training data) into the selector to obtain the indication information corresponding to each sampled first training data.
- the first client can also generate the indication information in other ways, which are not exhaustively listed here.
- in one process for determining the neural network modules for constructing at least one first neural network according to the indication information, the first client can initialize an array indicating the number of times each neural network module is selected, with an initial value of 0.
- the array can also take other forms such as a table or a matrix, which are not exhaustively listed here.
- after the first client obtains at least one piece of indication information, for each piece of indication information and for each neural network module whose selection probability is greater than the fifth threshold, the count corresponding to that neural network module in the array is increased by one. After traversing all the indication information, the first client determines, according to the array, at least one neural network module whose selected count is greater than the sixth threshold, and determines the aforementioned at least one neural network module as a neural network module for constructing at least one first neural network. The values of the fifth threshold and the sixth threshold can be set according to the actual situation, and are not limited here.
- the first client may also average the plurality of indication information to obtain a vector including Z elements, each element of which indicates the probability that a neural network module is selected; the first client then obtains the H elements with the largest average values from the Z elements, and determines the H neural network modules pointed to by the aforementioned H elements as the neural network modules for constructing at least one first neural network, where Z is an integer greater than 1 and H is an integer greater than or equal to 1.
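The two ways of turning indication information into a set of selected modules can be sketched as follows. Each indication vector is assumed to be a list of Z probabilities, and `prob_threshold` and `count_threshold` play the roles of the fifth and sixth thresholds; the function names and threshold values are hypothetical.

```python
def select_by_count(indications, prob_threshold, count_threshold):
    """Count how often each module's probability exceeds prob_threshold,
    then keep the modules whose count exceeds count_threshold."""
    counts = [0] * len(indications[0])  # one counter per module, initial value 0
    for vector in indications:
        for idx, prob in enumerate(vector):
            if prob > prob_threshold:
                counts[idx] += 1
    return [idx for idx, c in enumerate(counts) if c > count_threshold]

def select_top_h(indications, h):
    """Average the indication vectors and keep the H modules with the largest mean."""
    z = len(indications[0])
    means = [sum(v[i] for v in indications) / len(indications) for i in range(z)]
    ranked = sorted(range(z), key=lambda i: means[i], reverse=True)
    return sorted(ranked[:h])

# Three indication vectors over Z = 4 modules.
indications = [
    [0.9, 0.1, 0.6, 0.2],
    [0.8, 0.3, 0.7, 0.1],
    [0.7, 0.2, 0.4, 0.6],
]
by_count = select_by_count(indications, prob_threshold=0.5, count_threshold=1)
by_mean = select_top_h(indications, h=2)
```

On this data both strategies select modules 0 and 2, although in general they can disagree, because the counting strategy ignores how far a probability lies above the threshold.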
- the first client sends first identification information to the server, where the first identification information is identification information of the neural network module that constructs the first neural network.
- the first client may also store identification information of each neural network module, and after the first client determines a plurality of neural network modules for constructing the first neural network, it will also obtain The identification information of the aforementioned multiple neural network modules is used to form first identification information, and the first identification information includes identification information of all neural network modules that construct the first neural network.
- the server sends to the first client the neural network module for constructing the first neural network pointed to by the first identification information.
- after receiving the first identification information, the server obtains all neural network modules pointed to by the first identification information from all stored neural network modules (that is, L neural network modules), and sends to the first client the neural network modules for constructing the first neural network pointed to by the first identification information.
- the first client inputs the first training data into the selector to obtain the indication information output by the selector.
- the first client inputs a piece of first training data into the selector and obtains a piece of indication information output by the selector; the indication information may be expressed as a vector including Z elements, which indicates the probability that each of the Z neural network modules is selected and is used to indicate the neural network modules that construct the first neural network.
- the indication information can be [ MSGL1M1 , MSGL1M2 , MSGL1M3 , MSGL2M1 , ..., MSGL4M3 , MSGL4M4 ], where MSGL1M1 represents the probability that the first neural network module in the first group is selected
- the training data is input to the selector, the indication information output by the selector is obtained, and the neural network module for constructing the first neural network is selected according to the indication information, where the selector is a neural network used for selecting, from the multiple neural network modules, at least one neural network module matched to the data features of the first data set.
- this provides another implementation of selecting the neural network module for constructing the first neural network, which improves the implementation flexibility of this solution; and selecting through a neural network helps improve the accuracy of the selection process of the neural network module.
- the first client obtains a prediction result of the first training data output by the first neural network according to the received multiple neural network modules, the instruction information, and the first training data.
- the first client can obtain the prediction result of the first training data output by the first neural network according to the received multiple neural network modules, the indication information and the first training data.
- an example is given in conjunction with FIG. 5 , an example of a formula for calculating the prediction result of the first training data is disclosed as follows:
- M sGL1Mq , M sGL2Mq , M sGL3Mq and M sGL4Mq all come from the indication information output by the selector
- SGL1Mq represents a neural network module in the first group of neural network modules
- SGL1Mq(x) represents the output obtained by inputting the first training data into that neural network module of the first group.
- h1 represents the output of the entire first group
- y represents the output of the entire first neural network, that is, the prediction result of the first training data
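One way to read the symbols above is that each group outputs the probability-weighted sum of its modules' outputs and passes that result to the next group. The sketch below follows that reading as an assumption; it stores the indication information as one probability list per group rather than a single flat vector, and uses toy scaling modules. All names are hypothetical.

```python
def group_output(modules, probabilities, x):
    """Probability-weighted sum of the outputs of every module in one group."""
    outputs = [module(x) for module in modules]
    width = len(outputs[0])
    return [sum(p * out[i] for p, out in zip(probabilities, outputs))
            for i in range(width)]

def forward(groups, indication, x):
    """Chain the groups; `indication` holds one probability list per group."""
    hidden = x
    for modules, probabilities in zip(groups, indication):
        hidden = group_output(modules, probabilities, hidden)
    return hidden

def scale(factor):
    """Toy module (assumption): multiplies every input element by `factor`."""
    return lambda x: [factor * v for v in x]

# Two groups of two modules each, with the selector's probabilities per group.
groups = [[scale(1.0), scale(2.0)], [scale(10.0), scale(0.0)]]
indication = [[0.5, 0.5], [1.0, 0.0]]
y = forward(groups, indication, [1.0, 2.0])
```

Because the selector's probabilities enter the output as multiplicative weights, gradients of a loss on `y` can flow back into the selector as well as into the modules.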
- the first client performs a training operation on the first neural network and the selector according to a third loss function, wherein the third loss function indicates the similarity between the prediction result of the first training data and the correct result, and also indicates the degree of dispersion of the indication information.
- after generating the prediction result of the first training data, the first client generates the function value of the third loss function according to the prediction result of the first training data, the correct result of the first training data and the indication information generated by the selector, and performs gradient derivation according to the function value of the third loss function to update, by backpropagation, the weight parameters of the first neural network (that is, the received multiple neural network modules) and of the selector, so as to complete one training of the received multiple neural network modules and the selector.
- the purpose of the training is to increase the similarity between the prediction result of the first training data and the correct result, and to increase the degree of dispersion of the indication information output by the selector.
- the third loss function includes a third item and a fourth item
- the third item indicates the similarity between the prediction result of the first training data and the correct result
- the fourth item indicates the degree of dispersion of the indication information.
- the third item can be obtained based on the cross-entropy distance, first-order distance, second-order distance, etc. between the prediction result of the first training data and the correct result
- the fourth item can be a regularization of the indication information, for example, L1 regularization or LP regularization of the indication information, which is not limited here.
- LossM 4 represents the third loss function; for the meaning of the remaining symbols, refer to the description of formula (1) in the corresponding embodiment of FIG. 4 , which will not be repeated here.
- MS(x) represents the indication information output by the selector, and ⁇ 3 is a hyperparameter. It should be understood that the example in formula (4) is only for the convenience of understanding this solution, and is not used to limit this solution.
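A minimal sketch of a loss with this two-item structure is given below. It assumes a cross-entropy third item and an L1 regularization fourth item, combined by addition with a coefficient called `gamma3` here; the name, the additive form and the sign of the regularization item are assumptions, not formula (4) itself.

```python
import math

def third_loss(pred, target_index, indication, gamma3):
    """Cross-entropy task item plus an L1 regularization item on the selector output."""
    task_item = -math.log(max(pred[target_index], 1e-12))
    l1_item = sum(abs(p) for p in indication)
    return task_item + gamma3 * l1_item

pred = [0.7, 0.2, 0.1]
dense = [0.5, 0.5, 0.5, 0.5]    # every module is half-selected
sparse = [0.9, 0.0, 0.1, 0.0]   # selection concentrated on a few modules
loss_dense = third_loss(pred, 0, dense, gamma3=0.1)
loss_sparse = third_loss(pred, 0, sparse, gamma3=0.1)
```

Under these assumptions, an indication vector concentrated on a few modules yields a lower loss than one that spreads probability over all modules, so training pushes the selector toward more decisive choices.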
- the first client repeatedly performs steps 1005 to 1007 until a preset condition is reached, and obtains the plurality of updated neural network modules pointed to by the first identification information and the trained selector.
- the preset condition may be that the number of iterations of the iterative training reaches the preset number of times, or it may be that the third loss function satisfies the convergence condition.
- the first client sends the at least one updated neural network module and the trained selector to the server.
- the server updates the stored weight parameters of the neural network module.
- the server can receive multiple updated neural network modules sent by multiple clients (including the first client), and needs to update the stored weight parameters of the Z neural network modules accordingly.
- the server updates the weight parameter of the selector.
- the server may receive the trained selectors sent by multiple clients, and average the weight parameters at corresponding positions in the multiple trained selectors to update the weight parameters of the selector stored by the server, thus completing one iteration of the multiple rounds of iteration. It should be noted that, after step 1010 is executed, step 1001 may be re-entered to enter the next round of iteration.
- the selector is trained while the neural network modules for constructing the first neural network are trained, which saves computer resources; training the selector by having it process the data to be processed helps improve the accuracy of the indication information output by the selector.
- different neural networks can be allocated to training data with different data characteristics, that is, personalized matching between neural networks and data characteristics is realized;
- the neural network is allocated and trained according to the data characteristics of the training data set stored by the client, and the same neural network can be trained using the training data with the same data characteristics.
- the training data of different data characteristics trains different neural networks, which not only realizes the personalized matching between the neural network and the data characteristics, but also helps to improve the accuracy of the neural network after training.
- the server selects the first neural network adapted to the data characteristics of the data set stored by the first client according to the second adaptation relationship
- FIG. 11 is a schematic flowchart of a training method for a machine learning model provided by an embodiment of the present application.
- the method may include:
- the server acquires at least one first neural network corresponding to the first client.
- the server may be configured with multiple neural network modules, and the server will construct multiple second neural networks according to the stored multiple neural network modules.
- for the description of the multiple neural network modules and the multiple second neural networks, reference may be made to the description in step 401 in the corresponding embodiment of FIG. 4 , which will not be repeated here.
- when the server chooses to allocate at least one first neural network to the first client, it needs to acquire the at least one first neural network corresponding to the first client. Specifically, similar to the description in step 402 in the embodiment corresponding to FIG. 4 , the server may randomly select at least one first neural network from the multiple second neural networks, or may select, according to the second adaptation relationship, at least one first neural network having a high adaptation value with the first data set from the at least two second neural networks. For the application of the foregoing two manners, reference may be made to the description in step 402 in the corresponding embodiment of FIG. 4 , which will not be repeated here.
- the difference from step 402 in the embodiment corresponding to FIG. 4 is that the second adaptation relationship is stored in the server, and the second adaptation relationship includes multiple adaptation values.
- each adaptation value indicates the degree of fit between the training data stored on a client and a second neural network. An example is given with reference to FIG. 5 .
-              | Neural Network ID1 | Neural Network ID2 | ... | Neural Network ID96
  Client ID1   | E1_1               | E1_2               | ... | E1_96
  Client ID2   | E2_1               | Null               | ... | Null
  ...          |                    |                    |     |
  Client ID100 | Null               | E100_2             | ... | E100_96
- E1_1, E1_2, etc. represent the adaptation values. As shown in Table 2, the second adaptation relationship can include null values. It should be understood that the examples in Table 2 are only to facilitate understanding of the solution, and are not intended to limit the solution.
- when the server is not performing the allocation operation of the first neural network for the first time, or when the proportion of the number of adaptation values included in the second adaptation relationship is greater than the first threshold, the server selects, according to the second adaptation relationship, at least one first neural network having a high adaptation value with the first data set (ie, the first client) from the at least two second neural networks.
- the server may obtain a second adaptation matrix corresponding to the second adaptation relationship, and perform matrix decomposition on the second adaptation matrix to obtain a similarity matrix of the neural networks and a similarity matrix of the users.
- the product of the similarity matrix of the neural networks and the similarity matrix of the users needs to be close to the values at the corresponding positions in the second adaptation relationship. The two similarity matrices are then multiplied to obtain a second complement matrix, and at least one first neural network having a high adaptation value with the first data set (that is, the first client) is selected according to the second complement matrix.
- the at least one first neural network selected for the first client may include not only at least one first neural network with a high adaptation value, but also at least one randomly selected first neural network.
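The completion of the adaptation matrix can be sketched with a small low-rank factorisation. The decomposition algorithm is not named here, so the sketch swaps in plain stochastic gradient descent on the known entries; the rank, step count, learning rate and the names `complete_matrix` and `top_networks` are all assumptions, and `None` stands for a null entry.

```python
import random

def complete_matrix(observed, rank=2, steps=2000, lr=0.05, seed=0):
    """Factorise a client-by-network matrix with missing (None) entries and
    return the dense product of the two learned factor matrices."""
    rng = random.Random(seed)
    n_clients, n_networks = len(observed), len(observed[0])
    users = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_clients)]
    nets = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_networks)]
    known = [(i, j, v) for i, row in enumerate(observed)
             for j, v in enumerate(row) if v is not None]
    for _ in range(steps):
        for i, j, value in known:
            pred = sum(u * n for u, n in zip(users[i], nets[j]))
            err = pred - value
            for r in range(rank):
                u, n = users[i][r], nets[j][r]
                users[i][r] -= lr * err * n   # gradient step on the user factor
                nets[j][r] -= lr * err * u    # gradient step on the network factor
    return [[sum(u * n for u, n in zip(users[i], nets[j]))
             for j in range(n_networks)] for i in range(n_clients)]

def top_networks(completed_row, k):
    """Indices of the k networks with the highest completed adaptation value."""
    order = sorted(range(len(completed_row)),
                   key=lambda j: completed_row[j], reverse=True)
    return order[:k]

observed = [
    [0.9, 0.8, None],
    [0.2, None, 0.7],
    [None, 0.3, 0.6],
]
completed = complete_matrix(observed)
chosen = top_networks(completed[0], k=2)
```

The completed matrix has a value in every position, so the server can rank all networks for a client even where no adaptation value has been reported yet.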
- the server sends the selected at least one first neural network to the first client.
- the first client calculates an adaptation value between the first data set and the first neural network.
- the first client uses the first data set to perform a training operation on the first neural network to obtain a trained first neural network.
- the first client sends at least one updated neural network module included in the at least one trained first neural network to the server.
- For specific implementations of steps 1103 to 1105, reference may be made to the descriptions of steps 403 to 405 in the embodiment corresponding to FIG. 4, which are not repeated here.
- the first client sends the adaptation value between the first data set and each first neural network to the server.
- the first client further sends the adaptation value between each first neural network and the first data set (that is, the first client) calculated in step 1103 to the server.
- The identification information of the neural network and the identification information of the first client are also included, and are used to indicate the adaptation values between the first client and the neural networks of the server. It should be understood that step 1106 may be executed together with step 1105, or may be executed before or after either of steps 1104 and 1105; the execution order of step 1106 is not limited here.
- the server updates the second adaptation relationship.
- The server can obtain the adaptation values sent by each client, that is, the server obtains multiple sets of adaptation relationships, where each set of adaptation relationships is an adaptation value between a client identifier and a neural network identifier. The server may then update the second adaptation relationship according to the received adaptation values. The server may also delete from the second adaptation relationship any adaptation value that has not been updated for a long time, for example, a value that has not been updated for more than 20 rounds.
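- The update of the second adaptation relationship, including the removal of stale values, can be sketched as follows. The dictionary layout and the names `update_adaptation_relationship`, `reports` and `max_age` are assumptions made for illustration; the 20-round limit is the example given above.

```python
def update_adaptation_relationship(relationship, reports, current_round, max_age=20):
    """Merge client reports into the relationship and drop stale entries.

    relationship: {(client_id, network_id): (adaptation_value, last_updated_round)}
    reports:      iterable of (client_id, network_id, adaptation_value)
    """
    for client_id, network_id, value in reports:
        relationship[(client_id, network_id)] = (value, current_round)
    # Delete values that have not been updated for more than max_age rounds.
    stale = [key for key, (_, last_round) in relationship.items()
             if current_round - last_round > max_age]
    for key in stale:
        del relationship[key]
    return relationship


relationship = {
    ("client1", "net1"): (0.5, 1),    # last updated in round 1
    ("client2", "net2"): (0.7, 30),   # last updated in round 30
}
update_adaptation_relationship(
    relationship, [("client1", "net3", 0.9)], current_round=40)
```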
- the server updates the stored weight parameters of the neural network module.
- the server may also delete neural network modules that are poorly adapted to all clients, or may also delete neural networks that have not been selected for a long time.
- In step 1108, when neural network modules are deleted, the second adaptation relationship may be updated to delete the information corresponding to the deleted neural network modules. It should be noted that step 1107 may be performed before step 1108, or step 1108 may be performed before step 1107, which is not limited here.
- In this embodiment, a second adaptation relationship is configured on the server side; the client generates an adaptation value and sends it to the server, and the server selects the first neural network adapted to the first client according to the second adaptation relationship. This not only avoids occupying the client's computing resources, but also avoids leakage of client data.
- the server uses the selector to select the first neural network adapted to the data characteristics of the data set stored by the first client
- Fig. 12 is a schematic flowchart of a training method of a machine learning model provided by an embodiment of the application, and the method may include:
- After performing a clustering operation on the first data set, the first client obtains at least one data subset and generates at least one first-type center in one-to-one correspondence with the at least one data subset.
- Specifically, after performing a clustering operation on the first data set to obtain at least one data subset, the first client generates a first-type center for each data subset, thereby generating at least one first-type center in one-to-one correspondence with the at least one data subset.
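- The clustering step can be illustrated with a small k-means sketch. The application does not name a clustering algorithm, so k-means is an assumption here, as are the names `kmeans_centers`, `k` and `iterations`; the data points stand in for feature vectors of the first data set.

```python
import random


def kmeans_centers(points, k, iterations=20, seed=0):
    """Cluster points into k subsets and return (centers, subsets)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    subsets = [[] for _ in range(k)]
    for _ in range(iterations):
        subsets = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest current center.
            nearest = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
            subsets[nearest].append(p)
        for c, members in enumerate(subsets):
            if members:
                # The class center is the mean of the data subset.
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, subsets


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, subsets = kmeans_centers(data, k=2)
```

- Only `centers` would leave the client in this scheme; the data subsets themselves stay local.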
- the server receives at least one first-type center sent by the first client.
- After generating the at least one first-type center, the first client sends the at least one first-type center to the server.
- Correspondingly, the server receives the at least one first-type center sent by the first client.
- the server inputs the at least one center of the first type into the selector respectively, obtains indication information output by the selector, and determines a neural network module for constructing at least one first neural network according to the indication information.
- the server inputs at least one center of the first type into the selector respectively, and obtains at least one indication information corresponding to the at least one center of the first type output by the selector. Further, a neural network module for constructing at least one first neural network is selected according to the indication information. For the foregoing selection process, reference may be made to the description in step 1002 in the corresponding embodiment of FIG. 10 , which is not repeated here.
- the server sends the selector and the neural network module for constructing at least one first neural network to the first client.
- the first client inputs the first training data into the selector and obtains indication information output by the selector, where the indication information includes the probability that each neural network module in the plurality of neural network modules is selected, and is used to indicate the neural network modules for constructing the first neural network.
- the first client obtains a prediction result of the first training data output by the first neural network according to the received multiple neural network modules, the instruction information, and the first training data.
- the first client performs a training operation on the first neural network and the selector according to a third loss function, where the third loss function indicates the similarity between the prediction result of the first training data and the correct result, and also indicates the degree of dispersion of the indication information.
- the first client sends the at least one updated neural network module and the trained selector to the server.
- the server updates the stored weight parameters of the neural network module.
- the server updates the weight parameter of the selector.
- For specific implementations of steps 1205 to 1210, reference may be made to the descriptions of steps 1005 to 1010 in the embodiment corresponding to FIG. 10, which are not repeated here.
- the server may repeatedly perform steps 1201 to 1208 to implement interaction with each of the multiple clients, and then perform steps 1209 and 1210 to complete one iteration of multiple iterations. After performing step 1210, the server re-enters step 1201 to enter the next round of iteration.
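- One possible shape of the third loss function used in the steps above is sketched below. The text only states that the loss reflects both prediction quality and the degree of dispersion of the indication information; the cross-entropy term, the `p * (1 - p)` penalty, and the names `third_loss` and `weight` are assumptions chosen for illustration.

```python
import math


def third_loss(pred_probs, label, select_probs, weight=0.1):
    """Prediction loss plus a penalty on non-discrete selection probabilities.

    pred_probs:   class probabilities output by the first neural network
    label:        index of the correct class
    select_probs: per-module selection probabilities output by the selector
    """
    # Cross-entropy between the prediction and the correct result.
    cross_entropy = -math.log(max(pred_probs[label], 1e-12))
    # p * (1 - p) is zero when p is 0 or 1 and largest at p = 0.5, so this
    # term pushes the selector toward clear select / do-not-select decisions.
    dispersion = sum(p * (1.0 - p) for p in select_probs) / len(select_probs)
    return cross_entropy + weight * dispersion


decisive = third_loss([0.7, 0.3], 0, [1.0, 0.0, 1.0])
undecided = third_loss([0.7, 0.3], 0, [0.5, 0.5, 0.5])
```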
- In this embodiment, the selection of neural network modules is performed by the selector, which helps improve the accuracy of the selection process. Having the server perform the selection helps free up storage space on the client and avoids occupying the client's computing resources. In addition, only the class centers are sent to the server, which avoids leakage of client information as far as possible.
- In this embodiment, different neural networks can be allocated to training data with different data characteristics, that is, personalized matching between neural networks and data characteristics is realized. Since the first client is any one of the plurality of clients, for each of the plurality of clients the neural network is allocated and trained according to the data characteristics of the training data set stored by that client, so that the same neural network is trained with training data having the same data characteristics,
- and different neural networks are trained with training data having different data characteristics. This not only realizes personalized matching between neural networks and data characteristics, but also helps improve the accuracy of the trained neural networks.
- In addition, the server selects the neural network adapted to each client, which not only avoids sending all the neural network modules to the client and thus reduces the waste of the client's storage resources, but also avoids occupying the client's computing resources, which helps improve the user experience.
- FIG. 13 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method may include:
- the server acquires at least one third neural network corresponding to the data characteristic of the second data set stored by the second client.
- the second client may be any client among multiple clients connected to the server, or may be a client that has newly established a connection relationship with the server.
- In one implementation, the server can receive second identification information sent by the second client, where the second identification information is the identification information of the third neural network, or the identification information of the neural network modules that construct the third neural network. Correspondingly, the server obtains one or more third neural networks pointed to by the second identification information, or obtains the neural network modules, pointed to by the second identification information, for constructing one or more third neural networks.
- In another implementation, the server selects the at least one third neural network corresponding to the data characteristics of the second data set. In one case, after obtaining the identification information of the second client, the server acquires, according to the matching relationship, at least one third neural network adapted to the identification information of the second client.
- In another case, the second client performs a clustering operation on the second data set to obtain at least one second data subset, and generates at least one second-type center corresponding to the at least one second data subset.
- After receiving the at least one second-type center, the server inputs each second-type center into the selector to obtain the neural network modules for constructing at least one third neural network.
- In another case, the server selects at least one third neural network from the at least two second neural networks according to the identification information of the second client and the second adaptation relationship, where the at least one third neural network includes a neural network with a high adaptation value with the second data set.
- the server randomly selects at least one third neural network from the plurality of second neural networks.
- the server sends at least one third neural network to the second client.
- the server sends at least one third neural network, or a neural network module for constructing at least one third neural network, to the second client.
- the second client generates a prediction result of the data to be processed through at least one third neural network.
- The second client may randomly select one third neural network from the at least one third neural network, or may select, according to the second data set, the third neural network with the highest degree of adaptation to the second data set, and then generate the prediction result of the data to be processed through the selected third neural network.
- In this embodiment, the training operation on the neural network can be performed in combination with the data characteristics of the data set stored in each client, that is, personalized customization of the neural network is realized in the training phase; in addition, personalized allocation of the neural network can also be realized in the inference phase. This maintains consistency between the training phase and the inference phase and helps improve accuracy in the inference phase.
- the embodiment of the present application also provides a method for encrypting data on the client before performing the training operation, please refer to the following description.
- Embodiment 1: Gradient-based module pre-training scheme with packing
- the features of the data $d_{pA}$ are $F_A$
- the features of the data $d_{pB}$ are $F_B$
- the user data labels of client B are $L = \{l_1, l_2, l_3, \ldots, l_P\}$
- the model parameters owned by client A are $W_A$
- the model parameters owned by client B are $W_B$
- the model gradient corresponding to client A is $G_A$
- the model gradient corresponding to client B is $G_B$.
- Step 1 Client A generates a semi-fully homomorphic encryption public key $pk_A$ and private key $sk_A$.
- Step 2 Client B generates a fully homomorphic encryption public key $pk_B$ and private key $sk_B$.
- Step 3 Client A sends its public key $pk_A$ to client B, and client B sends its public key $pk_B$ to client A.
- Step 4 Client A calculates $U_A$ using its model parameters $W_A$ and its data $D_A$. Client A performs a packing operation on $U_A$ to obtain $DU_A$. Client A uses its own public key $pk_A$ to homomorphically encrypt $DU_A$, obtains the encrypted $[[DU_A]]_{pk_A}$, and sends it to client B.
- $DU_{A1} = [u_{A1}, u_{A2}, \ldots, u_{AL}]$
- $DU_{A2} = [u_{A(L+1)}, u_{A(L+2)}, \ldots, u_{A(L+L)}]$.
- Homomorphic encryption of $DU_A$ with the public key $pk_A$ means encrypting $DU_{A1}, DU_{A2}, \ldots, DU_{A(P/L)}$ with the public key $pk_A$ respectively.
- Step 5 Client B packs $U_{BL}$ to obtain $DU_{BL}$. Client B uses its own public key $pk_B$ to homomorphically encrypt the packed $DU_{BL}$, obtains $[[DU_{BL}]]_{pk_B}$, and sends it to client A.
- Step 6 Client A encrypts its own $DU_A$ with client B's public key $pk_B$ to obtain $[[DU_A]]_{pk_B}$, adds it to $[[DU_{BL}]]_{pk_B}$ (the $DU_{BL}$ received from client B, encrypted with client B's public key), and multiplies the sum by the encoded data set $D_A$ to obtain the homomorphically encrypted model gradient $[[G_A]]$. Client A generates and saves a noise term $W_{A\_Noise}$ whose size equals the dimension of $W_A$ multiplied by the packing length.
- Step 7 Client B encrypts its own $DU_{BL}$ with client A's public key $pk_A$ to obtain $[[DU_{BL}]]_{pk_A}$, adds it to $[[DU_A]]_{pk_A}$ (the $DU_A$ received from client A, encrypted with client A's public key), and multiplies the sum by the encoded $D_B$ to obtain the homomorphically encrypted gradient corresponding to the model $W_B$.
- Client B generates and saves a noise term $W_{B\_Noise}$ whose size equals the dimension of $W_B$ multiplied by the packing length.
- Step 8 Determine whether the convergence condition is reached, if so, end the training process, otherwise go back to Step 4 to continue execution.
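- The packing operation of Step 4 can be sketched as follows. The sketch only shows how $U_A$ is split into packs of length $L$ and how each pack is passed to an encryption routine; the zero padding of a short final pack, the names `pack` and `encrypt_packs`, and the stand-in `fake_encrypt` are assumptions. `fake_encrypt` is a placeholder that performs no encryption; a real homomorphic encryption library call would take its place.

```python
def pack(values, length, pad=0.0):
    """Split values into consecutive packs of `length` elements.

    The last pack is padded with `pad` when len(values) is not a
    multiple of `length` (an assumption; the text implies P / L packs).
    """
    padded = list(values) + [pad] * (-len(values) % length)
    return [padded[i:i + length] for i in range(0, len(padded), length)]


def encrypt_packs(packs, encrypt_pack):
    """Encrypt every pack separately with the supplied routine."""
    return [encrypt_pack(p) for p in packs]


def fake_encrypt(p):
    """Placeholder only: tags the pack instead of encrypting it."""
    return ("enc", tuple(p))


u_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
du_a = pack(u_a, length=3)          # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
encrypted = encrypt_packs(du_a, fake_encrypt)
```

- Packing several values into one ciphertext reduces the number of encryption operations and messages from $P$ to $P/L$.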
- Clients A and B compute $U_A$ and $U_B$ in a distributed manner, and one party computes the value of $U_A + U_B$.
- Clients A and B compute $U_A$ and $-U_B$ respectively.
- Clients A and B pad the data with zeros to a preset fixed number of digits before the decimal point to obtain $IU_A$ and $-IU_B$.
- Client A extracts data $I_a$
- client B extracts data $I_b$.
- Client B generates a public-private key pair and sends the public key to client A.
- Client A generates a random integer $RIntX$ and encrypts it with the public key sent by client B to obtain $[[RIntX]]$.
- Client A sends $[[RIntX]] - I_a$ to B.
- the data at the position I b is -1, the position of DRIntX is greater than -2 of I b , and each number in DRIntX is modulo pre-set and the result is sent to A.
- Embodiment 2: Module pre-training scheme based on a classification tree
- The module can be obtained by pre-training, and joint learning over the different features held by multiple users can be performed at the same time.
- the features of the data $d_{pA}$ are $F_A$
- the features of the data $d_{pB}$ are $F_B$
- the inference tree output by B is initially empty, and the initial segmentation trees of A and B are empty.
- Step 4 Client A will and sent to client B.
- Step 5 Client B decrypts using sk B and get and And send it to client A, client A receives and Subtract the corresponding random number to get and
- Step 6 Client A calculates the Gini coefficients under the various divisions, selects the division scheme with the smallest Gini coefficient, and records it as $s_{minA}$ with Gini coefficient value $gini_{minA}$. Client B likewise calculates the Gini coefficients under the various divisions, selects the division scheme with the smallest Gini coefficient, and records it as $s_{minB}$ with Gini coefficient value $gini_{minB}$.
- Step 7 Client A sends $gini_{minA}$ to client B; client B compares it with $gini_{minB}$ and returns the comparison result to A.
- The h-th node of B's inference tree is marked with the identifier of the party with the smaller Gini value.
- Step 8 The party with the smaller Gini value divides the data according to its data division scheme, sends the division result to the other party, and writes the division strategy to the h-th node of the segmentation tree.
- Step 10 B counts which category is most frequent in each leaf node and marks the leaf node with that category.
- Step 1 According to the inference tree, select which of A and B processes the current node.
- Step 2 According to the segmentation strategy of the segmentation tree, select the position of the next node.
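- The split selection of Steps 6 and 7 can be sketched in plaintext as follows. This sketch deliberately omits the encrypted exchange between the clients and gives both parties direct access to the labels, which the scheme itself avoids; it only illustrates how each party finds its minimum-Gini division and how the two minima are compared. The function names and the threshold-style division are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after dividing at `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(values, labels):
    """Return (smallest Gini value, threshold) over all candidate divisions."""
    candidates = sorted(set(values))[:-1]
    if not candidates:
        return gini(labels), None
    return min((split_gini(values, labels, t), t) for t in candidates)


labels = [0, 0, 1, 1]
feature_a = [1, 2, 3, 4]   # feature held by client A
feature_b = [5, 1, 4, 2]   # feature held by client B
gini_min_a, s_min_a = best_split(feature_a, labels)
gini_min_b, s_min_b = best_split(feature_b, labels)
# The node is marked with the party whose best division has the smaller Gini value.
node_owner = "A" if gini_min_a <= gini_min_b else "B"
```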
- Embodiment 3: Module pre-training scheme based on a regression tree
- The module can be obtained by pre-training, and joint learning over the different features held by multiple users can be performed at the same time.
- the features of the data $d_{pA}$ are $F_A$
- the features of the data $d_{pB}$ are $F_B$
- $d_p = [d_{pA}, d_{pB}]$ represents all the feature values of the p-th data
- the user data labels of client B are $L = \{l_1, l_2, l_3, \ldots, l_P\}$.
- the inference tree output by B is initially empty, and the initial segmentation trees of A and B are empty.
- Step 4 Client A will and sent to client B.
- Step 5 Client B decrypts using sk B and get and and send it to client A.
- Client A receives and Subtract the corresponding random number to get and
- Step 6 Client A uses and and and Calculate the mean under various splits.
- Client B uses and and and Compute the mean over various splits:
- Step 7 Client A will and sent to client B.
- Step 8 Client B decrypts using sk B and get and and send it to client A.
- Client A receives and Subtract the corresponding random number to get and
- Step 10 Clients A and B respectively select the division with the smallest variance and record it as $s_{minA}$ with variance value $var_{minA}$, and $s_{minB}$ with variance value $var_{minB}$.
- Step 11 Client A sends $var_{minA}$ to client B; client B compares it with $var_{minB}$ and returns the comparison result to A.
- The h-th node of B's inference tree is marked with the identifier of the party with the smaller variance value.
- Step 12 The party with the smaller variance divides the data according to its data segmentation scheme, sends the segmentation result to the other party, and writes the segmentation strategy into the h-th node of the segmentation tree.
- Step 14 B counts which category is most frequent in each leaf node and marks the leaf node with that category.
- Step 1 According to the inference tree, select which of A and B processes the current node.
- Step 2 According to the segmentation strategy of the segmentation tree, select the position of the next node.
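- For the regression tree, the division criterion is the variance rather than the Gini coefficient. The plaintext sketch below shows one way to pick the minimum-variance division for a single feature; the encrypted exchange of means between the clients is omitted, and the names `variance` and `best_variance_split` are hypothetical.

```python
def variance(targets):
    """Population variance of a list of target values."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)


def best_variance_split(values, targets):
    """Return (smallest weighted variance, threshold) for `value <= threshold`."""
    n = len(targets)
    best = None
    for threshold in sorted(set(values))[:-1]:
        left = [t for v, t in zip(values, targets) if v <= threshold]
        right = [t for v, t in zip(values, targets) if v > threshold]
        score = len(left) / n * variance(left) + len(right) / n * variance(right)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best


feature = [1, 2, 3, 4]
targets = [1.0, 1.2, 3.0, 3.2]
var_min, s_min = best_variance_split(feature, targets)
```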
- FIG. 14 is a schematic structural diagram of an apparatus for training a machine learning model provided by an embodiment of the present application.
- The machine learning model training device 1400 is applied to the first client. Multiple clients are communicatively connected to the server, the server stores multiple modules, the multiple modules are used to build machine learning models, and the first client is any one of the multiple clients.
- The training device 1400 is used to perform multiple rounds of iteration and includes an acquiring unit 1401, a training unit 1402 and a sending unit 1403. In one round of the multiple rounds of iteration:
- the acquiring unit 1401 is used to obtain at least one first machine learning model, where the at least one first machine learning model is selected according to the data characteristics of the first data set stored by the training device;
- the training unit 1402 is used to perform a training operation on the at least one first machine learning model by using the first data set, to obtain at least one trained first machine learning model;
- the sending unit 1403 is used to send at least one updated module included in the at least one trained first machine learning model to the server, where the updated module is used by the server to update the weight parameters of the stored modules.
- In this embodiment, different neural networks can be allocated to training data with different data characteristics, that is, personalized matching between neural networks and data characteristics is realized.
- For each client, the neural network is allocated and trained according to the data characteristics of the training data set stored by the client, so that the same neural network is trained with training data having the same data characteristics,
- and different neural networks are trained with training data having different data characteristics, which not only realizes personalized matching between neural networks and data characteristics, but also helps improve the accuracy of the trained neural networks.
- The multiple modules are used to construct at least two second machine learning models, and the at least one first machine learning model is selected from the at least two second machine learning models; or, the modules used to construct the at least one first machine learning model are selected from the multiple modules.
- FIG. 15 is a schematic structural diagram of the apparatus for training a machine learning model provided by an embodiment of the present application.
- The machine learning model is a neural network, the multiple modules stored in the server are neural network modules, and the training device 1400 stores a first adaptation relationship. The first adaptation relationship includes a plurality of adaptation values, and each adaptation value is used to represent the degree of adaptation between the first data set and a second neural network.
- The apparatus 1400 further includes a receiving unit 1404 for receiving multiple neural network modules sent by the server. According to the first adaptation relationship, at least one first neural network is selected from the at least two second neural networks, where the at least one first neural network includes a neural network with a high adaptation value with the first data set.
- The adaptation value between the first data set and a second neural network corresponds to the function value of a first loss function: the smaller the function value of the first loss function, the greater the adaptation value between the first data set and the second neural network. The first loss function indicates the similarity between the prediction result of first data and the correct result of the first data, where the prediction result of the first data is obtained by the second neural network from the first data, and the first data and the correct result of the first data are obtained based on the first data set.
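- A loss-based adaptation value of this kind can be sketched as follows. The text only requires that a smaller loss yield a larger adaptation value; the mapping `1 / (1 + mean loss)` and the names `adaptation_value` and `squared_error` are assumptions, and the "networks" are plain Python callables standing in for real models.

```python
def adaptation_value(network, dataset, loss_fn):
    """Map the mean loss of `network` on `dataset` to an adaptation value.

    Smaller loss gives a larger value; the result lies in (0, 1].
    """
    mean_loss = sum(loss_fn(network(x), y) for x, y in dataset) / len(dataset)
    return 1.0 / (1.0 + mean_loss)


def squared_error(prediction, target):
    return (prediction - target) ** 2


first_data_set = [(1, 2), (2, 4)]        # (input, correct result) pairs


def well_adapted(x):
    return 2 * x                          # predicts every sample exactly


def poorly_adapted(x):
    return 0                              # ignores its input


good = adaptation_value(well_adapted, first_data_set, squared_error)
bad = adaptation_value(poorly_adapted, first_data_set, squared_error)
```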
- the adaptation value between the first data set and a second neural network corresponds to the first similarity.
- The similarity between a second neural network and a third neural network is determined in either of the following ways: inputting the same data into the second neural network and the third neural network respectively, and comparing the similarity between the output data of the second neural network and the output data of the third neural network; or calculating the similarity between the weight parameter matrix of the second neural network and the weight parameter matrix of the third neural network.
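- Both ways of measuring the similarity between two neural networks can be sketched with cosine similarity. The choice of cosine similarity is an assumption (the text does not name a metric), as are the function names; networks are modelled as callables returning a list of outputs, and weight parameter matrices as nested lists.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def output_similarity(net_a, net_b, inputs):
    """Feed the same inputs to both networks and compare their outputs."""
    outs_a = [v for x in inputs for v in net_a(x)]
    outs_b = [v for x in inputs for v in net_b(x)]
    return cosine(outs_a, outs_b)


def weight_similarity(weights_a, weights_b):
    """Compare two weight parameter matrices after flattening them."""
    flat_a = [w for row in weights_a for w in row]
    flat_b = [w for row in weights_b for w in row]
    return cosine(flat_a, flat_b)


same_direction = output_similarity(
    lambda x: [x, 2 * x], lambda x: [2 * x, 4 * x], inputs=[1, 2])
identical = weight_similarity([[1, 0], [0, 1]], [[1, 0], [0, 1]])
orthogonal = weight_similarity([[1, 0]], [[0, 1]])
```

- Comparing outputs works even when the two networks have different architectures, whereas comparing weight matrices requires matching shapes.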
- The machine learning model is a neural network, and the apparatus 1400 further includes a receiving unit 1404 and an input unit 1405. The receiving unit 1404 is used to receive the selector sent by the server, where the selector is a neural network for selecting, from a plurality of neural network modules, at least one neural network module matching the data characteristics of the first data set. The input unit 1405 is used to input training data into the selector according to the first data set and obtain the indication information output by the selector,
- where the indication information includes the probability that each neural network module in the plurality of neural network modules is selected, and is used to indicate the neural network modules for constructing the at least one first neural network. The receiving unit 1404 is also used to receive, from the server, the neural network modules for constructing the at least one first neural network.
- The machine learning model is a neural network, and the multiple modules stored in the server are neural network modules. The apparatus 1400 further includes a computing unit 1406 for computing an adaptation value between the first data set and each first neural network of the at least one first neural network. The first data set includes a plurality of first training data; the higher the adaptation value between a piece of first training data and the first neural network, the greater the degree to which the weight parameters of the first neural network are modified when the first neural network is trained once with that first training data.
- The computing unit 1406 is specifically configured to: perform clustering on the first data set to obtain at least two data subsets, where a first data subset is a subset of the first data set and is any one of the at least two data subsets; and generate, according to the first data subset and the first loss function, the adaptation value between the first data subset and a first neural network, where the smaller the function value of the first loss function, the greater the adaptation value between the first data subset and the first neural network. The first loss function indicates the similarity between the prediction result of first data and the correct result of the first data, where the prediction result of the first data is obtained by the first neural network from the first data, and the first data and the correct result of the first data are obtained based on the first data subset. The adaptation value between the first data subset and the first neural network is determined as the adaptation value between each piece of data in the first data subset and the first neural network.
- The machine learning model is a neural network, and the multiple modules stored in the server are neural network modules. The training unit 1402 is specifically configured to perform a training operation on the first neural network by using the first data set according to a second loss function. The first data set includes a plurality of first training data. The second loss function indicates the similarity between a first prediction result and the correct result of the first training data, and also indicates the similarity between the first prediction result and a second prediction result.
- The first prediction result is the prediction result of the first training data output by the first neural network after the first training data is input into the first neural network.
- The second prediction result is the prediction result of the first training data output by a fourth neural network after the first training data is input into the fourth neural network, where the fourth neural network is the first neural network on which the training operation has not been performed.
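- A second loss function with these two terms can be sketched as follows. The cross-entropy term, the squared-difference term and the names `second_loss` and `weight` are assumptions; the text only specifies that the loss reflects both the fit to the correct result and the closeness of the first prediction result to the second prediction result (the output of the network before local training).

```python
import math


def second_loss(first_pred, second_pred, label, weight=0.5):
    """Fit to the correct result plus a penalty for drifting from the
    prediction of the untrained (fourth) neural network.

    first_pred:  class probabilities from the network being trained
    second_pred: class probabilities from the network before training
    label:       index of the correct class
    """
    cross_entropy = -math.log(max(first_pred[label], 1e-12))
    drift = sum((p - q) ** 2 for p, q in zip(first_pred, second_pred))
    return cross_entropy + weight * drift


unchanged = second_loss([0.8, 0.2], [0.8, 0.2], label=0)
drifted = second_loss([0.8, 0.2], [0.5, 0.5], label=0)
```

- The second term keeps the locally trained network close to the version received from the server, which limits how far one client's data can pull the shared modules.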
- The first data set includes a plurality of first training data and the correct result of each first training data. The receiving unit 1404 is further configured to receive the selector sent by the server, where the selector is a neural network for selecting, from a plurality of neural network modules, at least one neural network module matching the data characteristics of the first data set.
- The training unit 1402 is specifically used to: input the first training data into the selector to obtain the indication information output by the selector, where the indication information includes the probability that each neural network module in the plurality of neural network modules is selected, and is used to indicate the neural network modules for constructing the first neural network; obtain, according to the plurality of neural network modules, the indication information and the first training data, the prediction result of the first training data output by the first neural network; and perform a training operation on the first neural network and the selector according to a third loss function, where the third loss function indicates the similarity between the prediction result of the first training data and the correct result, and also indicates the degree of dispersion of the indication information.
- FIG. 16 is a schematic structural diagram of the training apparatus for a machine learning model provided by an embodiment of the present application.
- The machine learning model training device 1600 is applied to the server. The server is communicatively connected to multiple clients and stores multiple modules, and the multiple modules are used to build machine learning models. The first client is any one of the multiple clients.
- The training device 1600 is used to perform multiple rounds of iteration and includes an acquiring unit 1601, a sending unit 1602 and an updating unit 1603.
- The acquiring unit 1601 is configured to obtain at least one first machine learning model corresponding to the first client, where the at least one first machine learning model corresponds to the data characteristics of the first data set stored by the first client. The sending unit 1602 is configured to send the at least one first machine learning model to the first client, so that the first client performs a training operation on the at least one first machine learning model by using the first data set to obtain at least one trained first machine learning model. The updating unit 1603 is configured to receive, from the first client, at least one updated neural network module included in the at least one trained first machine learning model, and to update the stored weight parameters of the neural network modules according to the at least one updated neural network module.
- In this embodiment, different neural networks can be allocated to training data with different data characteristics, that is, personalized matching between neural networks and data characteristics is realized. Since the first client is any one of the plurality of clients, for each of the plurality of clients the neural network is allocated and trained according to the data characteristics of the training data set stored by that client, so that the same neural network is trained with training data having the same data characteristics,
- and different neural networks are trained with training data having different data characteristics. This not only realizes personalized matching between neural networks and data characteristics, but also helps improve the accuracy of the trained neural networks.
- In addition, the server selects the neural network adapted to each client, which not only avoids sending all the neural network modules to the client and thus reduces the waste of the client's storage resources, but also avoids occupying the client's computing resources, which helps improve the user experience.
- The multiple modules are used to construct at least two second machine learning models, and the at least one first machine learning model is selected from the at least two second machine learning models; or, the modules used to construct the at least one first machine learning model are selected from the multiple modules.
- FIG. 17 is a schematic structural diagram of a training apparatus for a machine learning model provided by an embodiment of the present application.
- the machine learning model is a neural network
- the multiple modules stored in the training device 1600 of the machine learning model are neural network modules
- the training device 1600 of the machine learning model stores a second adaptation relationship
- the second adaptation relationship includes a plurality of adaptation values, where an adaptation value indicates the degree of adaptation between the training data stored in a client and a second neural network.
- the apparatus 1600 further includes: a receiving unit 1604, configured to receive the adaptation values, sent by the first client, between the first data set and at least one second neural network, and to update the second adaptation relationship; the obtaining unit 1601 is specifically configured to select at least one first neural network from multiple second neural networks according to the second adaptation relationship, where the at least one first neural network includes a neural network with a high adaptation value with respect to the first data set.
- the machine learning model is a neural network
- the multiple modules stored in the machine learning model training apparatus 1600 are neural network modules
- the apparatus 1600 further includes: a receiving unit 1604, configured to receive first identification information sent by the first client, where the first identification information is the identification information of the first neural network, or the identification information of the neural network modules that construct the first neural network; the sending unit 1602 is specifically configured to send, to the first client, the first neural network pointed to by the first identification information, or the neural network modules that construct the first neural network and are pointed to by the first identification information.
- the machine learning model is a neural network
- the multiple modules stored in the machine learning model training device 1600 are neural network modules
- the machine learning model training device 1600 is further configured with a selector
- the apparatus further includes: a receiving unit 1604, configured to receive at least one class center sent by the first client, where at least one data subset is obtained after a clustering operation is performed on the first data set, and each class center is the class center of one of the data subsets;
- the obtaining unit 1601 is specifically configured to input each class center into the selector, obtain the indication information output by the selector, and determine, according to the indication information, the neural network modules used to construct at least one first neural network, where the indication information includes the probability that each of the plurality of neural network modules is selected;
- the sending unit 1602 is specifically configured to send the neural network module for constructing at least one first neural network to the first client.
- the machine learning model is a neural network
- the multiple modules stored in the training device 1600 of the machine learning model are neural network modules
- a neural network is divided into at least two sub-modules.
- the neural network modules stored by the training device 1600 of the machine learning model are divided into at least two groups corresponding to the at least two sub-modules.
- the apparatus 1600 further includes: a calculation unit 1605, configured to calculate the similarity between different neural network modules among the at least two neural network modules included in the same group, and to merge two neural network modules whose similarity is greater than a preset threshold.
- different neural network modules include a second neural network module and a first neural network module
- the similarity between the second neural network module and the first neural network module is determined in either of the following ways: input the same data to the second neural network module and the first neural network module respectively, and compare the similarity between the output data of the two modules; or, calculate the similarity between the weight parameter matrix of the second neural network module and the weight parameter matrix of the first neural network module.
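The weight-matrix comparison and the threshold-based merge described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the cosine measure, the greedy merge order, and merging by element-wise averaging are assumptions, and the names `cosine_similarity`, `flatten`, and `merge_similar_modules` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flatten(matrix):
    """Flatten a weight matrix (list of rows) into one vector."""
    return [x for row in matrix for x in row]


def merge_similar_modules(group, threshold):
    """Merge modules of one group whose weight similarity exceeds threshold.

    group maps a module name to its weight matrix. Assumption: two similar
    modules are merged by averaging their weights element-wise and keeping
    the name of the module seen first.
    """
    merged = {}
    for name, weights in group.items():
        for kept_name, kept_weights in merged.items():
            sim = cosine_similarity(flatten(weights), flatten(kept_weights))
            if sim > threshold:
                merged[kept_name] = [
                    [(a + b) / 2 for a, b in zip(row_kept, row_new)]
                    for row_kept, row_new in zip(kept_weights, weights)
                ]
                break
        else:
            # No sufficiently similar module was found, so keep this one.
            merged[name] = weights
    return merged
```

The output-based comparison would use the same similarity function on the two modules' outputs for identical inputs instead of on their flattened weights.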
- FIG. 18 is a schematic structural diagram of a server provided by an embodiment of the present application.
- the training apparatus 1400 of the machine learning model described in the corresponding embodiments of FIG. 14 and FIG. 15 may be deployed on the server 1800 to implement the functions of the first client in the corresponding embodiments of FIG. 4 to FIG. 13 .
- alternatively, the training apparatus 1600 of the machine learning model described in the embodiments corresponding to FIG. 16 and FIG. 17 may be deployed on the server 1800 to implement the functions of the server in the embodiments corresponding to FIG. 4 to FIG. 13.
- the server 1800 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 1822 (e.g., one or more processors), memory 1832, and one or more storage media 1830 (e.g., one or more mass storage devices) storing applications 1842 or data 1844.
- the memory 1832 and the storage medium 1830 may be short-term storage or persistent storage.
- the program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
- the central processing unit 1822 may be configured to communicate with the storage medium 1830 to execute a series of instruction operations in the storage medium 1830 on the server 1800 .
- Server 1800 may also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input/output interfaces 1858, and/or one or more operating systems 1841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
- the central processing unit 1822 is used to execute the training method of the machine learning model executed by the first client in the embodiments corresponding to FIG. 4 to FIG. 13.
- specifically, the training of the machine learning model includes multiple rounds of iteration. In one round of the multiple rounds of iteration, the central processing unit 1822 is specifically used for:
- obtaining at least one first machine learning model, where the at least one first machine learning model is selected according to the data characteristics of the first data set stored by the first client; performing a training operation on the at least one first machine learning model by using the first data set, to obtain at least one trained first machine learning model; and sending at least one updated module included in the at least one trained first machine learning model to the server, where the updated module is used by the server to update the weight parameters of the stored module.
- the central processing unit 1822 is further configured to perform the other steps performed by the first client in FIG. 4 to FIG. 13. For the specific implementation of these steps by the central processing unit 1822 and the beneficial effects brought about, reference may be made to the descriptions in the method embodiments corresponding to FIG. 4 to FIG. 13, and details are not repeated here.
- the central processing unit 1822 is used to execute the training method of the machine learning model executed by the server in the corresponding embodiments of FIG. 4 to FIG. 13 .
- the training of the machine learning model includes multiple rounds of iteration. In one round of the multiple rounds of iteration, the central processing unit 1822 is specifically used for:
- obtaining at least one first machine learning model corresponding to the first client, where the first client is one of the multiple clients, and the at least one first machine learning model corresponds to the data characteristics of the first data set stored by the first client; sending the at least one first machine learning model to the first client, where the at least one first machine learning model instructs the first client to perform a training operation on it by using the first data set, to obtain at least one trained first machine learning model; and receiving, from the first client, at least one updated neural network module included in the at least one trained first machine learning model, and updating the weight parameters of the stored neural network module according to the at least one updated neural network module.
- the central processing unit 1822 is also configured to perform the other steps performed by the server in FIG. 4 to FIG. 13. For the specific implementation of these steps by the central processing unit 1822 and the beneficial effects brought about, reference may be made to the descriptions in the method embodiments corresponding to FIG. 4 to FIG. 13, and details are not repeated here.
- FIG. 19 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
- the training apparatus 1400 of the machine learning model described in the embodiments corresponding to FIG. 14 and FIG. 15 may be deployed on the terminal device 1900 to implement the functions of the first client in the embodiments corresponding to FIG. 4 to FIG. 13.
- the terminal device 1900 includes: a receiver 1901, a transmitter 1902, a processor 1903 and a memory 1904 (the number of processors 1903 in the terminal device 1900 may be one or more, and one processor is taken as an example in the figure).
- the processor 1903 may include an application processor 19031 and a communication processor 19032.
- the receiver 1901, the transmitter 1902, the processor 1903, and the memory 1904 may be connected by a bus or otherwise.
- Memory 1904 may include read-only memory and random access memory, and provides instructions and data to processor 1903 .
- a portion of memory 1904 may also include non-volatile random access memory (NVRAM).
- the memory 1904 stores processors and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for implementing various operations.
- the processor 1903 controls the operation of the terminal device.
- various components of the terminal device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
- the various buses are referred to as bus systems in the figures.
- the methods disclosed in the above embodiments of the present application may be applied to the processor 1903 or implemented by the processor 1903 .
- the processor 1903 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 1903 or an instruction in the form of software.
- the above-mentioned processor 1903 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the processor 1903 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
- a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
- the software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
- the storage medium is located in the memory 1904, and the processor 1903 reads the information in the memory 1904, and completes the steps of the above method in combination with its hardware.
- the receiver 1901 can be used to receive input digital or character information, and generate signal input related to the related settings and function control of the terminal device.
- the transmitter 1902 can be used to output digital or character information through the first interface; the transmitter 1902 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1902 can also include display devices such as a display screen .
- the application processor 19031 is configured to execute the functions of the first client in the embodiments corresponding to FIG. 4 to FIG. 13. For the specific implementation of these functions by the application processor 19031 and the beneficial effects brought about, reference may be made to the descriptions in the method embodiments corresponding to FIG. 4 to FIG. 13, and details are not repeated here.
- Embodiments of the present application further provide a computer-readable storage medium that stores a program; when the program runs on a computer, it causes the computer to execute the steps of the methods described in the foregoing embodiments shown in FIG. 4 to FIG. 13.
- the embodiments of the present application also provide a computer program product which, when running on a computer, causes the computer to perform the steps performed by the first client in the methods described in the foregoing embodiments shown in FIG. 4 to FIG. 13, or causes the computer to perform the steps performed by the server in those methods.
- An embodiment of the present application further provides a circuit system that includes a processing circuit, where the processing circuit is configured to perform the steps performed by the first client in the methods described in the foregoing embodiments shown in FIG. 4 to FIG. 13, or the processing circuit is configured to perform the steps performed by the server in those methods.
- the training device, client, and server of the machine learning model provided in the embodiments of the present application may specifically be chips.
- the chip includes: a processing unit and a communication unit.
- the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
- the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip executes the neural network training method described in the embodiments shown in FIG. 4 to FIG. 13 .
- the storage unit is a storage unit in the chip, such as a register, a cache, etc.
- the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
- FIG. 20 is a schematic structural diagram of a chip provided by an embodiment of the application.
- the chip may be represented as a neural network processor NPU 200. The NPU 200 is mounted as a co-processor on the main CPU (Host CPU), and the Host CPU allocates tasks to it.
- the core part of the NPU is the arithmetic circuit 2003, which is controlled by the controller 2004 to extract the matrix data in the memory and perform multiplication operations.
- the arithmetic circuit 2003 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 2003 is a two-dimensional systolic array. The arithmetic circuit 2003 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general-purpose matrix processor.
- the arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 2002 and buffers it on each PE in the arithmetic circuit.
- the arithmetic circuit fetches the data of matrix A from the input memory 2001, performs the matrix operation with matrix B, and stores the partial results or the final result of the matrix in an accumulator 2008.
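The role of the accumulator can be illustrated with a small software sketch. It only shows how partial products of matrix A and matrix B are summed step by step into an accumulator; it does not model the timing or data flow of a systolic array, and the function name `matmul_with_accumulator` is hypothetical.

```python
def matmul_with_accumulator(a, b):
    """Multiply a (m x k) by b (k x n), adding one partial product per step.

    acc plays the role of the accumulator: after step s it holds the partial
    result built from the first s + 1 columns of a and rows of b.
    """
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0.0] * n for _ in range(m)]
    for step in range(k):
        for i in range(m):
            for j in range(n):
                acc[i][j] += a[i][step] * b[step][j]
    return acc
```

After the last step the accumulator holds the final result; any earlier state is one of the partial results mentioned above.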
- Unified memory 2006 is used to store input data and output data.
- the weight data is transferred directly to the weight memory 2002 through the direct memory access controller (DMAC) 2005.
- Input data is also transferred to unified memory 2006 via the DMAC.
- the bus interface unit (BIU) 2010 is used for the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 2009.
- the bus interface unit 2010 is used by the instruction fetch buffer 2009 to obtain instructions from the external memory, and is also used by the direct memory access controller 2005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
- the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2006 , the weight data to the weight memory 2002 , or the input data to the input memory 2001 .
- the vector calculation unit 2007 includes a plurality of operation processing units and, if necessary, further processes the output of the arithmetic circuit, for example by vector multiplication, vector addition, exponential operation, logarithmic operation, or size comparison. It is mainly used for the computation of non-convolutional/fully connected layers in neural networks, such as Batch Normalization, pixel-level summation, and upsampling of feature planes.
- the vector calculation unit 2007 can store the processed output vectors to the unified memory 2006.
- the vector calculation unit 2007 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 2003, for example linear interpolation of the feature planes extracted by a convolutional layer, or a nonlinear function applied to a vector of accumulated values to generate activation values.
- the vector calculation unit 2007 generates normalized values, pixel-level summed values, or both.
- the vector of processed outputs can be used as activation input to the arithmetic circuit 2003, e.g., for use in subsequent layers in a neural network.
- the instruction fetch buffer 2009 connected to the controller 2004 is used to store the instructions used by the controller 2004; the unified memory 2006, the input memory 2001, the weight memory 2002 and the instruction fetch buffer 2009 are all on-chip memories. External memory is private to the NPU hardware architecture.
- the operations of each layer in the recurrent neural network can be performed by the arithmetic circuit 2003 or the vector calculation unit 2007.
- the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in the first aspect.
- the device embodiments described above are only schematic: the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they can be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- storage media such as a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk
- a computer device which may be a personal computer, server, or network device, etc.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, wireless, or microwave).
- the computer-readable storage medium may be any usable medium on which a computer can store data, or a data storage device, such as a server or a data center, that integrates one or more usable media.
- the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.
Abstract
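A method for training a machine learning model and a related device, relating to the field of artificial intelligence. The method is applied to a first client, where multiple clients are communicatively connected to a server, the server stores multiple modules, and the multiple modules are used to construct at least two machine learning models. The method includes: obtaining at least one first machine learning model, where the at least one first machine learning model is selected according to the data characteristics of a first data set stored by the first client; performing a training operation on the at least one first machine learning model by using the first data set, to obtain at least one trained first machine learning model; and sending at least one updated module to the server, where the updated module is used by the server to update the weight parameters of the stored modules. Different neural networks are allocated to training data with different data characteristics, which realizes personalized matching between neural networks and data characteristics.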
Description
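This application claims priority to Chinese Patent Application No. 202010989062.5, filed with the China Patent Office on September 18, 2020 and entitled "Method for training a machine learning model and related device", which is incorporated herein by reference in its entirety.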
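This application relates to the field of artificial intelligence, and in particular, to a method for training a machine learning model and a related device.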
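Artificial intelligence (AI) uses computers or computer-controlled machines to simulate, extend, and expand human intelligence. AI includes the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. At present, as users grow more willing to protect their private data, user data cannot be shared among data owners, which forms "data islands" of various sizes. These "data islands" pose new challenges to artificial intelligence that relies on massive data.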
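To address "data islands", federated learning has been proposed: different clients train the same neural network by using their locally stored training data and send the trained neural network to a server, and the server aggregates the parameter updates. However, the training data stored on different clients has different data characteristics, that is, the optimization objectives of different clients are inconsistent. In addition, the clients selected in each round of training are not exactly the same, so the optimization objective of each round may also be inconsistent. As a result, the training process of the neural network tends to oscillate, and the training effect is poor.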
Summary of the Invention
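Embodiments of this application provide a method for training a machine learning model and a related device. Different neural networks are allocated to training data with different data characteristics, which realizes personalized matching between neural networks and data characteristics. A neural network is allocated and trained for each client according to the data characteristics of the training data set stored by that client, so that training data with the same data characteristics trains the same neural network, which helps improve the accuracy of the trained neural network.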
To solve the foregoing technical problem, the embodiments of this application provide the following technical solutions:
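In a first aspect, an embodiment of this application provides a method for training a machine learning model, which can be used in the field of artificial intelligence. The method is applied to a first client. Multiple clients are communicatively connected to a server, the server stores multiple modules, the multiple modules are used to construct machine learning models, and the first client is any one of the multiple clients. A machine learning model may specifically be a neural network, a linear model, or another type of machine learning model; correspondingly, the modules that make up the machine learning model may specifically be neural network modules, linear model modules, or modules of another type of machine learning model. Training of the machine learning model includes multiple rounds of iteration, and one round includes the following. The first client obtains at least one first machine learning model, where the at least one first machine learning model is selected according to the data characteristics of the first data set stored by the first client. Specifically, the first client may receive the multiple modules sent by the server and select the at least one first machine learning model from at least two second machine learning models, or the first client may receive the at least one first machine learning model sent by the server. The first client performs a training operation on the at least one first machine learning model by using the first data set, to obtain at least one trained first machine learning model. The first client sends at least one updated module included in the at least one trained first machine learning model to the server, where the updated module is used by the server to update the weight parameters of the stored modules.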
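In this implementation, the server stores multiple neural network modules, and the multiple neural network modules can form at least two different second neural networks. For a first client among the multiple clients, at least one first neural network that matches the data characteristics of the first data set stored by the first client is selected. After the at least one first neural network is trained by using the training data of the first client, the server aggregates the parameter updates. In this way, different neural networks can be allocated to training data with different data characteristics, that is, personalized matching between neural networks and data characteristics is realized. In addition, because the first client is any one of the multiple clients, a neural network is allocated and trained for each of the multiple clients according to the data characteristics of the training data set stored by that client. Training data with the same data characteristics therefore trains the same neural network, and training data with different data characteristics trains different neural networks, which not only realizes personalized matching between neural networks and data characteristics but also helps improve the accuracy of the trained neural networks.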
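In a possible implementation of the first aspect, the multiple modules are used to construct at least two second machine learning models, and the at least one first machine learning model is selected from the at least two second machine learning models; or, the modules used to construct the at least one first machine learning model are selected from the multiple modules. This implementation provides two ways of selecting the first machine learning model, which improves the implementation flexibility of the solution.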
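In a possible implementation of the first aspect, the machine learning model is a neural network, the multiple modules stored in the server are neural network modules, and the first client stores a first adaptation relationship. The first adaptation relationship includes multiple adaptation values, and an adaptation value indicates the degree of adaptation between the first data set and a second neural network. Before the first client obtains the at least one first machine learning model, the method further includes: the first client receives the multiple neural network modules sent by the server. That the first client obtains the at least one first machine learning model includes: the first client selects at least one first neural network from the at least two second neural networks according to the first adaptation relationship, where the at least one first neural network includes a first neural network with a high adaptation value with respect to the first data set. The at least one first neural network with a high adaptation value may be the N first neural networks with the highest adaptation values, where N is an integer greater than or equal to 1; for example, N may be 1, 2, 3, 4, 5, 6, or another value, which is not limited here. Alternatively, the at least one first neural network with a high adaptation value may be the at least one first neural network whose adaptation value is greater than a fourth threshold, and the value of the fourth threshold may be determined based on factors such as how the adaptation values are generated and their value range. Optionally, the at least one first neural network further includes a neural network randomly selected by the first client from the at least two second neural networks.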
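The two selection rules above (the N highest adaptation values, or all values above a threshold), plus the optional random pick, might look like the following sketch. The dictionary representation of the adaptation relationship and the name `select_networks` are assumptions made for illustration.

```python
import random


def select_networks(adaptation, top_n=None, threshold=None, num_random=0, rng=None):
    """Select network ids from a mapping of network id to adaptation value.

    Either top_n (keep the N best) or threshold (keep values above it) must
    be given. num_random extra networks are drawn from the remaining ones.
    """
    ranked = sorted(adaptation, key=lambda net: adaptation[net], reverse=True)
    if top_n is not None:
        chosen = ranked[:top_n]
    elif threshold is not None:
        chosen = [net for net in ranked if adaptation[net] > threshold]
    else:
        raise ValueError("either top_n or threshold is required")
    remaining = [net for net in ranked if net not in chosen]
    rng = rng or random.Random()
    extra = rng.sample(remaining, min(num_random, len(remaining)))
    return chosen + extra
```

The random extras give networks with low or unknown adaptation values a chance to be trained and re-evaluated, which is one plausible reason for the optional random selection.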
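In this implementation, the first adaptation relationship is preconfigured on the first client, and at least one first neural network with a high adaptation value with respect to the first data set is selected from the at least two second neural networks according to the first adaptation relationship. This ensures that the selected neural network is adapted to the data characteristics of the first data set, and that neural networks can be customized for different clients. In addition, selecting a neural network adapted to the data characteristics of the first data set helps improve the accuracy of the trained neural network.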
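In a possible implementation of the first aspect, because the first adaptation relationship may contain null values, the first client may obtain a first adaptation matrix from the first adaptation relationship, where each element of the first adaptation matrix represents an adaptation value. When the first adaptation relationship contains null values, it may be completed by means of matrix factorization, so that the completed first adaptation relationship no longer contains null values. At least one first neural network with a high adaptation value with respect to the first data set can then be selected according to the completed first adaptation relationship.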
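In a possible implementation of the first aspect, the adaptation value between the first data set and a second neural network corresponds to the function value of a first loss function: the smaller the function value of the first loss function, the larger the adaptation value between the first data set and the second neural network. The first loss function indicates the similarity between the prediction result of first data and the correct result of the first data, where the prediction result of the first data is obtained through the second neural network, and the first data and its correct result are obtained based on the first data set. The first data may be any data in the first data set; alternatively, at least two data subsets may be obtained by performing a clustering operation on the first data set, and the first data is the class center of any one of the at least two data subsets. Further, when the first data is any data in the first data set, the data in the first data set is used to perform the training operation on the first neural network (that is, the first data set includes first training data), may also be used to test the accuracy of the trained first neural network (that is, the first data set may include test data), and may also be used to verify the correctness of the hyperparameters of the first neural network (that is, the first data set may include validation data). The first data may therefore be data used for training, data used for testing, or data used for validation.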
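The text only requires that a smaller loss gives a larger adaptation value. One hypothetical monotone mapping is sketched below; the formula 1 / (1 + mean loss) and the name `adaptation_value` are assumptions, not something the source specifies.

```python
def adaptation_value(losses):
    """Map per-sample loss values of one network to a single adaptation value.

    Assumption: losses are non-negative, and the adaptation value is
    1 / (1 + mean loss), so it lies in (0, 1] and decreases as loss grows.
    """
    mean_loss = sum(losses) / len(losses)
    return 1.0 / (1.0 + mean_loss)
```

Any other strictly decreasing function of the loss, such as its negative, would satisfy the same requirement.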
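In this implementation, the adaptation value between the first data set and a first neural network is calculated through a loss function. The solution is simple, easy to implement, and highly accurate.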
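In a possible implementation of the first aspect, the adaptation value between the first data set and a second neural network corresponds to a first similarity: the larger the first similarity, the larger the adaptation value between the first data set and the second neural network. The first similarity refers to the similarity between the second neural network and a third neural network, where the third neural network is the neural network whose output prediction results had the highest accuracy in the previous round of iteration.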
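In this implementation, the third neural network is the neural network whose output prediction results had the highest accuracy in the previous round of iteration, and it has already been trained by using the first data set, that is, the third neural network has a high degree of adaptation to the first data set. If a first neural network is highly similar to the third neural network, this indicates that the first neural network has a high degree of adaptation to the first data set, and the adaptation value is large. This provides another way of calculating the adaptation value, which improves the implementation flexibility of the solution.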
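In a possible implementation of the first aspect, the similarity between a second neural network and the third neural network is determined in either of the following ways. The first client inputs the same data to the second neural network and the third neural network respectively, and compares the similarity between the output data of the second neural network and the output data of the third neural network. Alternatively, the first client calculates the similarity between the weight parameter matrix of the second neural network and the weight parameter matrix of the third neural network. The similarity between the two may be obtained by calculating the Euclidean distance, the Mahalanobis distance, the cosine distance, or the cross entropy between them, or in another way.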
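This implementation provides two ways of calculating the similarity between a first neural network and the third neural network, which improves the implementation flexibility of the solution.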
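In a possible implementation of the first aspect, the similarity between the output data of a first neural network and the output data of the third neural network may refer to the first similarity between the output data of the entire first neural network and the output data of the entire third neural network. Alternatively, it may refer to the similarity between the output data of each module in the first neural network and the output data of each module in the third neural network; the product of the similarities between the output data of the individual modules is calculated to obtain the similarity between the output data of the entire first neural network and the output data of the entire third neural network.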
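The module-wise variant, where per-module output similarities are multiplied together, can be sketched as follows. Converting a Euclidean distance d into a similarity via 1 / (1 + d) is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def output_similarity(out_a, out_b):
    """Similarity of two module outputs, assumed here to be 1 / (1 + d),
    where d is the Euclidean distance between the output vectors."""
    return 1.0 / (1.0 + math.dist(out_a, out_b))


def network_similarity(module_outputs_a, module_outputs_b):
    """Multiply the per-module output similarities of two networks.

    Each argument is a list with one output vector per module, in the same
    module order for both networks.
    """
    similarity = 1.0
    for out_a, out_b in zip(module_outputs_a, module_outputs_b):
        similarity *= output_similarity(out_a, out_b)
    return similarity
```

Because the per-module similarities are multiplied, a single strongly dissimilar module lowers the similarity of the whole network.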
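In a possible implementation of the first aspect, the machine learning model is a neural network, and the method further includes: the first client receives a selector sent by the server, where the selector is a neural network used to select, from the multiple neural network modules, at least one neural network module that matches the data characteristics of the first data set. The first client inputs training data into the selector according to the first data set, to obtain indication information output by the selector. The indication information includes the probability that each of the multiple neural network modules is selected, and indicates the neural network modules used to construct at least one first neural network. Further, if the multiple neural network modules include Z neural network modules, the indication information may specifically be a vector of Z elements, where each of the Z elements indicates the probability that one neural network module is selected. The first client receives, from the server, the neural network modules used to construct the at least one first neural network.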
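In this implementation, training data is input into the selector according to the first data set to obtain the indication information output by the selector, and the neural network modules used to construct the first neural network are selected according to the indication information. The selector is a neural network used to select, from the multiple neural network modules, the neural network modules that match the data characteristics of the first data set. This provides yet another way of selecting the neural network modules that construct the first neural network, which improves the implementation flexibility of the solution. In addition, performing the selection through a neural network helps improve the accuracy of the module selection process.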
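In a possible implementation of the first aspect, the process of inputting training data into the selector is as follows. The first client may input each piece of first training data in the first data set into the selector once, to obtain the indication information corresponding to each piece of first training data. Alternatively, the first client may perform a clustering operation on the first data set and input each of the resulting class centers (an example of training data) into the selector, to obtain the indication information corresponding to each class center. Alternatively, the first client may perform a clustering operation on the first data set, sample several pieces of first training data from each of the resulting data subsets, and input the sampled first training data (an example of training data) into the selector, to obtain the indication information corresponding to each sampled piece of first training data.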
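In a possible implementation of the first aspect, the process of determining, according to the indication information, the neural network modules used to construct the at least one first neural network is as follows. The first client may initialize an array that records the number of times each neural network module is selected, with an initial value of 0; the array may also take the form of a table, a matrix, or another form. After obtaining at least one piece of indication information, for each piece of indication information, the first client adds one to the count of every neural network module whose selection probability is greater than a fifth threshold. After traversing all the indication information, the first client uses the array to find the at least one neural network module whose selection count is greater than a sixth threshold, and determines that at least one neural network module as the neural network modules used to construct the at least one first neural network. Alternatively, after obtaining multiple pieces of indication information, the first client may average them to obtain a vector of Z elements, where each element indicates the probability that one neural network module is selected. The first client then obtains the H elements with the largest average values among the Z elements, and determines the H neural network modules pointed to by those H elements as the neural network modules used to construct the at least one first neural network, where Z is an integer greater than 1 and H is an integer greater than or equal to 1.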
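Both selection procedures above can be sketched directly. Each piece of indication information is represented as a list of Z selection probabilities; the function names and the choice to return module indices are assumptions for illustration.

```python
def select_by_count(indications, prob_threshold, count_threshold):
    """Count how often each module's probability exceeds prob_threshold and
    return the indices of modules counted more than count_threshold times."""
    counts = [0] * len(indications[0])
    for indication in indications:
        for index, probability in enumerate(indication):
            if probability > prob_threshold:
                counts[index] += 1
    return [index for index, count in enumerate(counts) if count > count_threshold]


def select_by_average(indications, h):
    """Average the indication vectors and return the indices of the H modules
    with the largest average probability, in ascending index order."""
    z = len(indications[0])
    means = [sum(ind[i] for ind in indications) / len(indications) for i in range(z)]
    ranked = sorted(range(z), key=lambda i: means[i], reverse=True)
    return sorted(ranked[:h])
```

Here `prob_threshold` and `count_threshold` correspond to the fifth and sixth thresholds in the text, and `h` corresponds to H.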
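In a possible implementation of the first aspect, the machine learning model is a neural network, and the multiple modules stored in the server are neural network modules. After the first client obtains the at least one first machine learning model, the method further includes: the first client calculates the adaptation value between the first data set and each of the at least one first neural network. The first data set includes multiple pieces of first training data. The higher the adaptation value between a piece of first training data and a first neural network, the more the weight parameters of the first neural network are modified in one training pass that uses that first training data. Further, the ways of adjusting how much the weight parameters of the first neural network are modified include adjusting the learning rate, adjusting the coefficient of a penalty term, or another way. The larger the learning rate, the more the weight parameters of the first neural network are modified in one training pass; the smaller the learning rate, the less they are modified. That is, the higher the adaptation value between a piece of first training data and the first neural network, the larger the learning rate in the training pass that uses that first training data. The smaller the coefficient of the penalty term, the more the first neural network is modified in one training pass; the larger the coefficient, the less it is modified. That is, the higher the adaptation value between a piece of first training data and the first neural network, the smaller the penalty-term coefficient in the training pass that uses that first training data.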
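The learning-rate variant can be illustrated with a single gradient step. Scaling the learning rate linearly with the adaptation value is an assumption; the text only requires that a higher adaptation value leads to a larger modification. The function names are hypothetical.

```python
def scaled_learning_rate(base_lr, adaptation, max_adaptation=1.0):
    """Assumed linear rule: the learning rate grows with the adaptation value."""
    return base_lr * adaptation / max_adaptation


def sgd_step(weights, gradients, base_lr, adaptation):
    """One gradient-descent step whose step size depends on how well the
    training sample fits the network (its adaptation value)."""
    lr = scaled_learning_rate(base_lr, adaptation)
    return [w - lr * g for w, g in zip(weights, gradients)]
```

With the same gradient, a sample with adaptation value 1.0 moves the weights five times as far as a sample with adaptation value 0.2 under this rule.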
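In this implementation, different training data on the same client adapts to a first neural network to different degrees, so it is unreasonable for all training data to modify the weight parameters of the first neural network with the same fixed strength. The higher the adaptation value between a piece of first training data and the first neural network, the more that first neural network should handle that first training data, and the more the weight parameters of the first neural network are modified in the training pass that uses it. This helps improve the training efficiency of the first neural network.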
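In a possible implementation of the first aspect, that the first client calculates the adaptation value between the first data set and each of the at least one first neural network includes the following. The first client clusters the first data set to obtain at least two data subsets, where a first data subset is a subset of the first data set and is any one of the at least two data subsets. The first client generates the adaptation value between the first data subset and a first neural network according to the first data subset and a first loss function: the smaller the function value of the first loss function, the larger the adaptation value between the first data subset and the first neural network. The first loss function indicates the similarity between the prediction result of first data and the correct result of the first data. The prediction result of the first data is obtained through the first neural network, and the first data refers to any data in the first data subset or to the class center of the first data subset. The first data and its correct result are obtained based on the first data subset, and the adaptation value between the first data subset and the first neural network is used as the adaptation value between each piece of data in the first data subset and the first neural network.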
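In this implementation, the first data set is clustered to obtain at least two data subsets, and different training data in the same data subset has the same adaptation value with respect to a first neural network, that is, training data of the same class modifies the first neural network with the same strength. This covers the case in which one client holds at least two data subsets with different data characteristics, further improves the ability to customize neural networks, and helps improve the accuracy of the trained neural network.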
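In a possible implementation of the first aspect, the machine learning model is a neural network, and the multiple modules stored in the server are neural network modules. That the first client performs a training operation on the at least one first machine learning model by using the first data set includes: the first client performs a training operation on the first neural network by using the first data set according to a second loss function. The second loss function includes a first term and a second term. The first term indicates the similarity between a first prediction result and the correct result of the first training data, and the second term indicates the similarity between the first prediction result and a second prediction result; the second term may be called a penalty term or a constraint term. Further, the first prediction result is the prediction result of the first training data output by the first neural network after the first training data is input into the first neural network, and the second prediction result is the prediction result of the first training data output by a fourth neural network after the first training data is input into the fourth neural network. The fourth neural network is the first neural network on which no training operation has been performed, that is, the fourth neural network and the first neural network have the same initial state, but the weight parameters of the fourth neural network are never updated while the first neural network is being trained.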
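A two-term loss of this shape can be sketched as follows. The concrete choices are assumptions: cross entropy for the first term, squared Euclidean distance between the trained and frozen predictions for the second term, and a scalar `penalty_coefficient` that weights the penalty.

```python
import math


def cross_entropy(prediction, true_index):
    """First term: cross entropy between predicted class probabilities and
    the correct class."""
    return -math.log(max(prediction[true_index], 1e-12))


def squared_distance(p, q):
    """Second term: squared Euclidean distance between two predictions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def second_loss(first_prediction, second_prediction, true_index, penalty_coefficient):
    """Fit term plus a penalty that keeps the trained network's prediction
    close to the prediction of the frozen, untrained copy."""
    fit_term = cross_entropy(first_prediction, true_index)
    penalty_term = squared_distance(first_prediction, second_prediction)
    return fit_term + penalty_coefficient * penalty_term
```

A larger `penalty_coefficient` ties the trained network more tightly to the frozen copy, which matches the statement that a larger penalty coefficient means a smaller modification.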
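In this implementation, the first data set on the first client does not necessarily match the first neural network. When the first neural network is trained by using the first data set, the second loss function also indicates the similarity between the first prediction result and the second prediction result, which prevents the first neural network from being changed too much during training.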
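In a possible implementation of the first aspect, the first data set includes multiple pieces of first training data and the correct result of each piece of first training data, and the method further includes: the first client receives a selector sent by the server, where the selector is a neural network used to select, from the multiple neural network modules, at least one first neural network module that matches the data characteristics of the first data set. That the first client performs a training operation on the at least one first machine learning model by using the first data set includes: the first client inputs the first training data into the selector to obtain the indication information output by the selector, where the indication information includes the probability that each of the multiple neural network modules is selected and indicates the neural network modules used to construct the first neural network; the first client obtains, according to the multiple neural network modules, the indication information, and the first training data, the prediction result of the first training data output by the first neural network; and the first client performs a training operation on the first neural network and the selector according to a third loss function, where the third loss function indicates the similarity between the prediction result and the correct result of the first training data, and also indicates the degree of dispersion of the indication information. The method further includes: the first client sends the trained selector to the server.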
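The source does not say how the degree of dispersion of the indication information is measured or in which direction it is optimized. The sketch below assumes one plausible reading: dispersion is the summed binary entropy of the per-module selection probabilities, and it is added to the task loss so that minimizing the loss favors decisive selections. All names and the coefficient are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy of a single select / do-not-select probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def dispersion(indication):
    """Assumed dispersion measure: summed binary entropy over all modules."""
    return sum(binary_entropy(p) for p in indication)


def third_loss(task_loss, indication, dispersion_coefficient):
    """Task loss plus a weighted dispersion term on the selector output."""
    return task_loss + dispersion_coefficient * dispersion(indication)
```

Because the same loss depends on both the network's prediction and the selector's output, one backward pass can update the selected modules and the selector together, which is the joint training the paragraph describes.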
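In this implementation, the selector is trained at the same time as the neural network modules that construct the first neural network, which saves computing resources. Training the selector on the data that it actually needs to process helps improve the accuracy of the indication information output by the selector.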
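In a second aspect, an embodiment of this application provides a method for training a machine learning model, which can be used in the field of artificial intelligence. The method is applied to a server. The server is communicatively connected to multiple clients and stores multiple modules, the multiple modules are used to construct machine learning models, and the first client is any one of the multiple clients. Training of the machine learning model includes multiple rounds of iteration, and one round includes the following. The server obtains at least one first machine learning model corresponding to the first client, where the first client is one of the multiple clients, and the at least one first machine learning model corresponds to the data characteristics of the first data set stored by the first client. The server sends the at least one first machine learning model to the first client, where the at least one first machine learning model instructs the first client to perform a training operation on it by using the first data set, to obtain at least one trained first machine learning model. The server receives, from the first client, at least one updated neural network module included in the at least one trained first machine learning model, and updates the weight parameters of the stored neural network modules according to the at least one updated neural network module.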
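In this implementation, different neural networks can be allocated to training data with different data characteristics, which achieves personalized matching between neural networks and data characteristics. Because the first client is any one of the multiple clients, a neural network is allocated to and trained on each client according to the data characteristics of the training data set stored on that client, so that training data with the same data characteristics train the same neural network and training data with different data characteristics train different neural networks. This not only achieves personalized matching between neural networks and data characteristics, but also helps improve the accuracy of the trained neural networks. Because the server selects the neural networks adapted to each client, not all neural network modules need to be sent to the client, which reduces the waste of client storage resources; the selection also does not occupy the client's computing resources, which helps improve user experience.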
在第二方面的一种可能实现方式中,多个模块用于构建至少两个第二机器学习模型,至少一个第一机器学习模型为从至少两个第二机器学习模型中选取出来的;或者,用于构建至少一个第一机器学习模型的模块为从多个模块中选取出来的。
在第二方面的一种可能实现方式中,服务器根据至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数,可以包括:由于不同客户端发送的可以存在相同的神经网络模块,则服务器将不同客户端发送的相同的神经网络模块的权重参数进行加权平均,作为服务器中该神经网络模块的权重参数。对于不同客户端中没有重合的神经网络模块,则直接将客户端发送的神经网络模块的参数作为服务器中该神经网络模块的权重参数。其中,相同的神经网络模块指的是具体的神经网络相同,且位于相同的分组中。
在第二方面的一种可能实现方式中,服务器根据至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数,可以包括:若服务器中存在训练数据,则还可以使用模型蒸馏的方法,利用多个客户端发送的多个更新后的神经网络模块,来更新服务器存储的神经网络模块的权重参数。也即使用服务器中存储的训练数据重新训练服务器中存储的多个神经网络模块,训练的目的为拉近服务器中存储的神经网络模块的输出数据与客户端发送的更新后的神经网络模块的输出数据之间的相似度。
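In a possible implementation of the second aspect, the machine learning model is a neural network, the multiple modules stored in the server are neural network modules, and a second adaptation relationship is stored on the server. The second adaptation relationship includes multiple adaptation values, and an adaptation value indicates the degree of adaptation between the training data stored in a client and a second neural network. The method further includes: the server receives, from the first client, the adaptation values between the first data set and at least one second neural network, and updates the second adaptation relationship. The server obtaining at least one first neural network includes: the server selects at least one first neural network from multiple second neural networks according to the second adaptation relationship, where the at least one first neural network includes a neural network with a high adaptation value for the first data set. Specifically, the server may obtain a second adaptation matrix corresponding to the second adaptation relationship and perform matrix factorization on it to obtain a similarity matrix of the neural networks and a similarity matrix of the users, where the product of the two similarity matrices needs to be close to the values at the corresponding positions in the second adaptation relationship. The server then multiplies the similarity matrix of the neural networks by the similarity matrix of the users to obtain a second completed matrix, and selects, according to the second completed matrix, at least one first neural network with a high adaptation value for the first data set (that is, for the first client). Optionally, the at least one first neural network selected by the server for the first client may include not only at least one first neural network with a high adaptation value, but also at least one randomly selected first neural network.
In this implementation, the second adaptation relationship is configured on the server side, the client generates the adaptation values and sends them to the server, and the server selects the first neural network adapted to the first client according to the second adaptation relationship. This avoids occupying the client's computing resources and also avoids leaking the client's data.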
在第二方面的一种可能实现方式中,机器学习模型为神经网络,服务器中存储的多个模块为神经网络模块,方法还包括:服务器接收第一客户端发送的第一标识信息,第一标识信息为第一神经网络的标识信息,或者,第一标识信息为构建第一神经网络的神经网络模块的标识信息。服务器将至少一个第一机器学习模型发送给第一客户端,包括:服务器向第一客户端发送第一标识信息指向的第一神经网络,或者,向第一客户端发送第一标识信息指向的构建第一神经网络的神经网络模块。
在第二方面的一种可能实现方式中,机器学习模型为神经网络,服务器中存储的多个模块为神经网络模块,服务器还配置有选择器。方法还包括:服务器接收第一客户端发送的至少一个类中心,对第一数据集合执行聚类操作后,得到至少一个数据子集合,至少一个类中心中的一个类中心为至少一个数据子集合中一个数据子集合的类中心。服务器获取与第一客户端对应的至少一个第一机器学习模型,包括:服务器将类中心分别输入选择器,得到选择器输出的指示信息,并根据指示信息,确定构建至少一个第一神经网络的神经网络模块,指示信息包括多个神经网络模块中每个神经网络模块被选中的概率。服务器将至少一个第一机器学习模型发送给第一客户端,包括:服务器将构建至少一个第一神经网络的神经网络模块发送给第一客户端。
本实现方式中,通过选择器来执行神经网络模块的选择步骤,有利于提高选择过程的准确率,由服务器来执行选择步骤,有利于释放客户端的存储空间,和避免对客户端计算机资源的占用,且仅将类中心发送给服务器,也尽量避免客户端信息的泄露。
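In a possible implementation of the second aspect, the machine learning model is a neural network, the multiple modules stored in the server are neural network modules, one neural network is divided into at least two sub-modules, the neural network modules stored in the server are divided into at least two groups corresponding to the at least two sub-modules, and different neural network modules in the same group have the same function. After the server updates the weight parameters of the stored neural network modules according to the at least one updated neural network module, the method further includes: the server calculates the similarity between different neural network modules among the at least two neural network modules in the same group, and merges two neural network modules whose similarity is greater than a preset threshold. Specifically, the server may randomly keep one of the two neural network modules; or, if the second neural network module and the first neural network module are the same neural network and differ only in their weight parameters, the server may also average the weight parameters of the second neural network module and the first neural network module to generate the weight parameters of the merged neural network module.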
本实现方式中,将相似度大于预设阈值的两个神经网络模块进行合并,也即将冗余的两个神经网络模块进行合并,不仅降低服务器对多个神经网络模块的管理难度;且避免客户端对相似度大于预设阈值的两个神经网络模块进行重复训练,以减少对客户端计算机资源的浪费。
在第二方面的一种可能实现方式中,不同的神经网络模块包括第二神经网络模块和第一神经网络模块,第二神经网络模块和第一神经网络模块之间的相似度通过以下任一种方式确定:服务器将相同数据分别输入至第二神经网络模块和第一神经网络模块,并对比第二神经网络模块的输出数据与第一神经网络模块的输出数据之间的相似度;或者,计算第二神经网络模块的权重参数矩阵和第一神经网络模块的权重参数矩阵之间的相似度。两者之间相似度的计算方式包括但不限于:计算两者之间的欧氏距离、马氏距离、余弦距离或者交叉熵。
本实现方式中,提供了计算两个不同的神经网络模块之间相似度的两种具体实现方式,则用户可以结合实际情况灵活选择,提高了本方案的实现灵活性。
对于本申请实施例第二方面以及第二方面的各种可能实现方式中名词的具体含义,以及每种可能实现方式所带来的有益效果,均可以参考第一方面中各种可能的实现方式中的 描述,此处不再一一赘述。
第三方面,本申请实施例提供一种数据处理方法,可用于人工智能领域中。服务器获取与第二客户端存储的第二数据集合的数据特性对应的至少一个第三神经网络,向第二客户端发送至少一个第三神经网络,该至少一个第三神经网络用于供客户端生成待处理数据的预测结果。
在第三方面的一种实现方式中,服务器获取与第二客户端存储的第二数据集合的数据特性对应的至少一个第三神经网络,可以以下三项中的任一项或多项:服务器接收到至少一个第二类中心,将至少一个第二类中心分别输入选择器中,以得到用于构建至少一个第三神经网络的神经网络模块,每个第二类中心为一个第二数据子集合的类中心,至少一个第二数据子集合为对第二数据集合执行聚类操作得到的。或者,服务器根据第二客户端的标识信息和第二适配关系,从至少两个第二神经网络中选取至少一个第三神经网络,至少一个第三神经网络中包括与第二数据集合适配高的神经网络。或者,服务器从多个第二神经网络中随机选取至少一个第三神经网络。
对于本申请实施例第三方面以及第三方面的各种可能实现方式中步骤的具体实现方式、每种可能实现方式中名词的具体含义,以及每种可能实现方式所带来的有益效果,均可以参考第一方面中各种可能的实现方式中的描述,此处不再一一赘述。
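In a fourth aspect, an embodiment of this application provides a data processing method, which may be used in the field of artificial intelligence. A second client obtains second identification information corresponding to the data characteristics of a second data set stored on the second client, and sends an acquisition request carrying the second identification information to the server. The second identification information is the identification information of a third neural network, or the identification information of the neural network modules used to construct a third neural network. The second client receives the one or more third neural networks to which the second identification information points, or receives the neural network modules, to which the second identification information points, that are used to construct the one or more third neural networks.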
对于本申请实施例第四方面以及第四方面的各种可能实现方式中步骤的具体实现方式、每种可能实现方式中名词的具体含义,以及每种可能实现方式所带来的有益效果,均可以参考第一方面中各种可能的实现方式中的描述,此处不再一一赘述。
第五方面,本申请实施例提供一种机器学习模型的训练装置,可用于人工智能领域中。装置应用于第一客户端,多个客户端与服务器通信连接,服务器中存储有多个模块,多个模块用于构建机器学习模型,第一客户端为多个客户端中的任一客户端。机器学习模型的训练装置用于执行多轮迭代,机器学习模型的训练装置包括:获取单元、训练单元和发送单元,在多轮迭代中的一轮迭代中,获取单元,用于获取至少一个第一机器学习模型,至少一个第一机器学习模型为根据第一客户端存储的第一训练数据集合的数据特性选取出来的;训练单元,用于利用第一数据集合对至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;发送单元,用于将至少一个训练后的第一机器学习模型包括的至少一个更新后的模块发送给服务器,更新后的模块用于供服务器更新存储的模块的权重参数。
本申请实施例第五方面中,机器学习模型的训练装置还可以用于实现第一方面各种可能实现方式中第一客户端执行的步骤,对于本申请实施例第五方面以及第五方面的各种可 能实现方式中某些步骤的具体实现方式,以及每种可能实现方式所带来的有益效果,均可以参考第一方面中各种可能的实现方式中的描述,此处不再一一赘述。
第六方面,本申请实施例提供一种机器学习模型的训练装置,可用于人工智能领域中。装置应用于服务器,服务器与多个客户端通信连接,服务器中存储有多个模块,多个模块用于构建机器学习模型,第一客户端为多个客户端中的任一客户端。机器学习模型的训练装置用于执行多轮迭代,机器学习模型的训练装置包括:获取单元、发送单元和更新单元,在多轮迭代中的一轮迭代中,获取单元,用于获取与第一客户端对应的至少一个第一机器学习模型,第一客户端为多个客户端中的一个客户端,至少一个第一机器学习模型与第一客户端存储的第一数据集合的数据特性对应;发送单元,用于将至少一个第一机器学习模型发送给第一客户端,至少一个第一机器学习模型指示第一客户端利用第一数据集合对至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;更新单元,用于从第一客户端接收至少一个训练后的第一机器学习模型包括的至少一个更新后的神经网络模块,并根据至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数。
本申请实施例第六方面中,机器学习模型的训练装置还可以用于实现第二方面各种可能实现方式中服务器执行的步骤,对于本申请实施例第六方面以及第六方面的各种可能实现方式中某些步骤的具体实现方式,以及每种可能实现方式所带来的有益效果,均可以参考第二方面中各种可能的实现方式中的描述,此处不再一一赘述。
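In a seventh aspect, an embodiment of this application provides a server, which may include a processor coupled to a memory. The memory stores program instructions; when the program instructions stored in the memory are executed by the processor, the training method of the machine learning model according to the first aspect is implemented, or the training method of the machine learning model according to the second aspect is implemented. For the steps performed by the first client in the possible implementations of the first aspect, or the steps performed by the server in the possible implementations of the second aspect, as executed by the processor, refer to the first aspect or the second aspect; details are not repeated here.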
第八方面,本申请实施例提供了一种终端设备,可以包括处理器,处理器和存储器耦合,存储器存储有程序指令,当存储器存储的程序指令被处理器执行时实现上述第一方面的机器学习模型的训练的方法。对于处理器执行第一方面的各个可能实现方式中第一客户端执行的步骤,具体均可以参阅第一方面,此处不再赘述。
第九方面,本申请实施例提供了一种计算机可读存储介质,计算机可读存储介质中存储有计算机程序,当该计算机程序在计算机上运行时,使得计算机执行上述第一方面的机器学习模型的训练的方法,或者,使得计算机执行上述第二方面的机器学习模型的训练方法。
第十方面,本申请实施例提供了一种电路系统,电路系统包括处理电路,处理电路配置为执行上述第一方面的机器学习模型的训练的方法,或者,处理电路配置为执行上述第二方面的机器学习模型的训练方法。
第十一方面,本申请实施例提供了一种计算机程序,当该计算机程序在计算机上运行时,使得计算机执行上述第一方面的机器学习模型的训练的方法,或者,使得计算机执行 上述第二方面的机器学习模型的训练方法。
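In a twelfth aspect, an embodiment of this application provides a chip system. The chip system includes a processor configured to support a training device or an execution device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the server or the communication device. The chip system may consist of a chip, or may include a chip and other discrete components.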
图1为本申请实施例提供的人工智能主体框架的一种结构示意图;
图2为本申请实施例提供的机器模型的训练系统的一种系统架构图;
图3为本申请实施例提供的机器学习模型的训练方法中多个不同的数据集合的一种示意图;
图4为本申请实施例提供的机器学习模型的训练方法的一种流程示意图;
图5为本申请实施例提供的机器学习模型的训练方法中多个神经网络模块的一种示意图;
图6为本申请实施例提供的机器学习模型的训练方法中多个神经网络模块的另一种示意图;
图7为本申请实施例提供的机器学习模型的训练方法中第二神经网络的三种结构示意图;
图8为本申请实施例提供的机器学习模型的训练方法中第二神经网络的另一种结构示意图;
图9为本申请实施例提供的机器学习模型的训练方法的另一种流程示意图;
图10为本申请实施例提供的机器学习模型的训练方法的又一种流程示意图;
图11为本申请实施例提供的机器学习模型的训练方法的再一种流程示意图;
图12为本申请实施例提供的机器学习模型的训练方法的又一种流程示意图;
图13为本申请实施例提供的数据处理方法的一种流程示意图;
图14为本申请实施例提供的机器学习模型的训练装置的一种结构示意图;
图15为本申请实施例提供的机器学习模型的训练装置的另一种结构示意图;
图16为本申请实施例提供的机器学习模型的训练装置的又一种结构示意图;
图17为本申请实施例提供的机器学习模型的训练装置的再一种结构示意图;
图18为本申请实施例提供的训练设备的一种结构示意图;
图19为本申请实施例提供的执行设备的一种结构示意图;
图20为本申请实施例提供的芯片的一种结构示意图。
本申请实施例提供了一种机器学习模型的训练的方法以及相关设备,对不同数据特性的训练数据分配不同的神经网络,实现了神经网络与数据特性之间的个性化匹配;每个客户端均根据客户端存储的训练数据集合的数据特性分配并训练神经网络,能够利用相同数 据特性的训练数据训练相同的神经网络,有利于提高训练后神经网络的准确率。
本申请的说明书和权利要求书及上述附图中的术语“第一”、第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
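First, the overall workflow of an artificial intelligence system is described. Refer to FIG. 1, which is a schematic structural diagram of the main framework of artificial intelligence. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data, information, knowledge, wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecology of the system.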
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片提供,作为示例,该智能芯片包括中央处理器(central processing unit,CPU)、神经网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据指示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,图像的分类、图像的个性化管理、电池充电个性化管理、文本分析、计算机视觉的处理、语音识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶、智慧城市等。
本申请实施例主要用于对各种应用场景中采用到的机器学习模型进行训练,训练后的机器学习模型可以应用上述各种应用领域中以实现分类、回归或其他功能,训练后的机器学习模型的处理对象可以为图像样本、离散数据样本、文本样本或语音样本等,此处不做穷举。其中,机器学习模型具体可以表现为神经网络、线性模型或其他类型的机器学习模型等,对应的,组成机器学习模型的多个模块具体可以表现为神经网络模块、线性模型模块或组成其他类型的机器学习模型的模块等,此处不做穷举。在后续实施例中,仅以机器学习模型表现为神经网络为例进行说明,对于机器学习模型表现为除神经网络之外的其他类型时可类推理解,本申请实施例中不再赘述。
进一步地,本申请实施例可以应用于联邦学习和分布式训练两种训练方式中。为了便于理解,请先参阅图2,图2为本申请实施例提供的机器模型的训练系统的一种系统架构图,先结合图2介绍联邦学习和分布式训练这两种训练方式。机器模型的训练系统中包括服务器100和多个客户端200,服务器100和多个客户端200通信连接。其中,服务器100具体可以表现为一个服务器,也可以表现为由多个服务器组成的服务器集群,虽然图2中仅示出了一个服务器100和三个客户端200,但实际情况中服务器100和客户端200的数量可以结合实际需求确定。客户端200可以配置于终端设备上,也可以配置于服务器上,此处不做限定。
具体的,在训练阶段,在采用联邦学习这一训练方式的情况下,服务器100上存储有多个神经网络模块,多个神经网络模块用于构建至少两个第二神经网络;每个客户端200上存储有数据集合,客户端200上存储的数据集合可以用来对神经网络执行训练操作。多个客户端200中的第一客户端中存储有第一数据集合,第一客户端为多个客户端200中的任一个客户端,在一轮迭代过程中,第一客户端获取与第一数据集合的数据特性适配的至少一个第一神经网络,并利用第一数据集合对该至少一个第一神经网络执行训练操作,得到至少一个训练后的第一神经网络,再将至少一个训练后的第一机器学习模型包括的至少一个更新后的神经网络模块发送给服务器100。多个客户端200中每个客户端200均可以执行前述操作,则服务器100可以接收到多个客户端200发送的多个更新后的神经网络模块,服务器100根据接收到的多个更新后的神经网络模块更新存储的神经网络模块的权重参数,以完成一轮迭代的过程。通过多轮迭代过程以实现对服务器100存储的多个神经网络模块的权重参数的更新。
分布式训练与联邦学习的区别在于,用于对神经网络进行训练的数据是由服务器100发送给各个客户端200的。在分布式训练这种训练方式中,服务器100存储有用于构建至 少两个第二神经网络的多个神经网络模块之外,还存储有数据集合;服务器100先对存储的数据集合进行聚类操作,得到聚类后的多个数据子集合,进而服务器100分别给每个客户端200发送与一个类或几个类对应的数据子集合,也即不同客户端200中可以存储有不同数据特性的数据子集合。进一步地,针对聚类的过程,服务器100可以对存储的整个数据集合进行聚类,也可以先根据数据集合中每个数据的正确标签将整个数据集合分成不同的数据子集合,进而依次对分类后的各个数据子集合执行聚类操作,以得到多个聚类后的数据子集合。针对数据分发的步骤,服务器100可以将聚类后的数据子集合直接发送给客户端200,也可以从至少一个聚类后的数据子集合中抽样出一些数据发送给客户端200等,此处不做限定。在服务器100将用于执行训练操作的数据部署于各个客户端200之后,也需要通过多轮迭代以实现对服务器100存储的多个神经网络模块的权重参数的更新,每轮迭代过程的实现方法与联邦学习中每轮迭代过程的实现方法相同,此处不做赘述。
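In the inference phase, the client 200 obtains a third neural network corresponding to the data characteristics of the data set stored on the client 200, and then uses the obtained neural network to generate a prediction result for the input data. To make the concept of "data characteristics of a data set" more intuitive, refer to FIG. 3, which is a schematic diagram of multiple different data sets in the training method of the machine learning model provided in this embodiment of this application. FIG. 3 takes image classification as the task of the neural network and shows the data sets stored in four clients. The data set stored on the first client is a set of dog images, so the task that the first client needs the neural network to perform is classifying dogs. The data set stored on the second client is a set of wolf images, so the task that the second client needs the neural network to perform is classifying wolves. The data set stored on the third client includes a set of dog images and a set of sofa images, so the third client needs two different neural networks: one classifies dogs and the other classifies sofas. The data characteristics of the dog image set, the wolf image set and the sofa image set differ from one another, while the dog image set stored on the first client and the dog image set stored on the third client have the same data characteristics.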
以下对本申请实施例提供的机器学习模型的训练方法进行详细介绍,由于该方法对训练阶段和推理阶段均有影响,而训练阶段和推理阶段的实现流程有所不同,下面分别对前述两个阶段的具体实现流程进行描述。
一、训练阶段
本申请实施例中,服务器中存储有多个神经网络模块,而在一次迭代中,针对多个客户端中的一个第一客户端,需要获取与第一数据集合适配的至少一个第一神经网络,前述至少一个第一神经网络的选取操作可以由服务器执行,也可以由客户端执行,前述两种方式的实现流程不同。进一步地,服务器或客户端上可以根据神经网络与数据集合之间的适配关系来执行前述选取操作,也即服务器或客户端先利用多个神经网络模块构建多个第二神经网络,再从多个第二神经网络中选取与一个客户端中存储的数据集合的数据特性适配的至少一个第一神经网络。或者,服务器或客户端也可以利用选择器(一种神经网络)来执行前述选取操作,也即服务器或客户端先利用选择器从多个神经网络模块中选取与一个客户端中存储的数据集合的数据特性适配的至少一个神经网络模块,再利用选取出的神经网络模块构建至少一个第一神经网络。前述两种方式的实现流程也有所不同,以下分别进行描述。
(1)由客户端根据第一适配关系选择与客户端存储的数据集合的数据特性适配的第一神经网络
具体的,请参阅图4,图4为本申请实施例提供的机器学习模型的训练方法的一种流程示意图,方法可以包括:
401、第一客户端接收服务器发送的多个神经网络模块,并根据多个神经网络模块构建至少两个第二神经网络。
本申请的一些实施例中,服务器会将存储的多个神经网络模块发送给第一客户端,对应的,第一客户端接收到多个神经网络模块,并根据多个神经网络模块构建至少两个第二神经网络。前述多个神经网络模块可以为预先训练过的神经网络模块,也可以为完全没训练过的神经网络模块。
其中,至少两个第二神经网络中的每个第二神经网络均可以被分为至少两个子模块,该多个神经网络模块包括与该至少两个子模块对应的至少两个组。不同组中可以包括相同数量的神经网络模块,也可以包括不同数量的神经网络模块;同一组中不同的神经网络模块的功能相同,作为示例,例如同一组的神经网络模块的功能均为特征提取,或者,同一组的神经网络模块的功能均为特征变换,或者,同一组的神经网络模块的功能均为分类等等,此处不做穷举。可选地,在多轮迭代训练的过程中,服务器可以将新的神经网络模块添加至该多个神经网络模块中,也可以对该多个神经网络模块执行删减操作等。
不同的神经网络模块可以表现为不同的神经网络,作为示例,例如第一组有3个神经网络模块,第一个神经网络模块采用3层的多层感知机(multilayer perceptron,MLP),第二个神经网络模块采用2层的MLP,第三个神经网络模块采用2层的卷积神经网络(convolutional neural networks,CNN)等。或者,不同的神经网络模块也可以为相同的神经网络,但权重参数不同。作为示例,例如第一组有3个神经网络模块,第一个神经网络模块采用2层的多层感知机(multilayer perceptron,MLP),第二个神经网络模块采用2层的MLP,第三个神经网络模块采用2层的卷积神经网络(convolutional neural networks,CNN),但第一个神经网络模块和第二个神经网络模块的权重参数不同,应理解,此处举例仅为方便理解本方案,不用于限定本方案
为更直观的理解本方案,请参阅图5和图6,图5和图6为本申请实施例提供的机器学习模型的训练方法中多个神经网络模块的两种示意图。图5和图6中以不同组中包括不同数量的神经网络模块,且多个神经网络模块组成的层状结构共有4层(也即对应将多个神经网络模块分为4个组),分别为SGL1、SGL2、SGL3、SGL4为例。第一层(也即第一组神经网络模块)中包括3个神经网络模块,第一个神经网络模块SGL1M1采用3层的MLP,第二个神经网络模块SGL1M2采用2层的MLP,第三个神经网络模块SGL1M3为2层的CNN。第二层(也即第二组神经网络模块)中包括4个神经网络模块,第一个神经网络模块SGL2M1采用3层的MLP,第二个神经网络模块SGL2M2采用2层的MLP,第三个神经网络模块SGL2M3为3层的CNN,第四个神经网络模块SGL2M4为2层的CNN。第三层(也即第三组神经网络模块)中包括2个神经网络模块,第一个神经网络模块SGL3M1采用2层的MLP,第二个神经网络模块SGL3M2采用2层的CNN。第四层(也即第四组 神经网络模块)中包括4个神经网络模块,第一个神经网络模块SGL4M1采用3层的MLP,第二个神经网络模块SGL4M2采用2层的MLP,第三个神经网络模块SGL4M3为1层的CNN+2层的MLP,第四个神经网络模块SGL4M4为1层的CNN+1层的MLP。
请继续参阅图6,多组神经网络模块层状结构主模型为二叉树树状结构。第一层(也即第一组神经网络模块)中包括的神经网络模块为SGL1M1;第二层(也即第二组神经网络模块)包括的神经网络模块中,从左到右依次为SGL2M1、SGL2M2;第三层(也即第三组神经网络模块)包括的神经网络模块中,从左到右依次为SGL3M1、SGL3M2、SGL3M3、SGL3M4;第四层(也即第四组神经网络模块)包括的神经网络模块中,从左到右依次为SGL4M1、SGL4M2、SGL4M3、SGL4M4、SGL4M5、SGL4M6、SGL4M7、SGL4M8。在利用图6中示出的多个神经网络模块构建多个第二神经网络时,可以从树状结构的多个神经网络模块中进行选择,作为示例,例如第二神经网络可以为SGL1M1+SGL2M1+SGL3M1+SGL4M1,作为另一示例,例如第二神经网络可以为SGL1M1+SGL2M2+SGL3M3+SGL4M5等,此处不做穷举。应理解,图5和图6中的示例均仅为方便理解本方案,不用于限定本方案。
针对多个神经网络模块构建至少两个第二神经网络的过程。第一客户端在接收到服务器发送的多个神经网络模块之后,可以从每组中只选择一个神经网络模块来构建第二神经网络,也即第二神经网络为单支路的。第一客户端也可以从每组中选择至少两个神经网络模块来构建第二神经网络,也即一个第二神经网络中包括多个支路。第一客户端还可以从某一组神经网络模块中不选择任何神经网络模块。
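To make this solution easier to understand, an example is given with reference to FIG. 5. Refer to FIG. 7 and FIG. 8, which are four schematic structural diagrams of the second neural network in the training method of the machine learning model provided in this embodiment of this application. In both FIG. 7 and FIG. 8, the second neural network is constructed on the basis of the multiple groups of neural network modules shown in FIG. 5. Refer first to FIG. 7, in which the neural network module selected in the first layer is SGL1M1, the module selected in the second layer is SGL2M1, the modules selected in the third layer are SGL3M1 and SGL3M2, and the module selected in the fourth layer is SGL4M1. In one implementation, as shown in sub-diagram (a) of FIG. 7, the output of SGL2M1 is used as the input of both SGL3M1 and SGL3M2, and the weighted average of the outputs of SGL3M1 and SGL3M2 is used as the input of SGL4M1. In another implementation, as shown in sub-diagram (b) of FIG. 7, a transformation layer TL is added between the third layer and the fourth layer of the second neural network: the output of SGL2M1 is used as the input of both SGL3M1 and SGL3M2, the outputs of SGL3M1 and SGL3M2 are used as the input of the transformation layer TL, and the output of TL is used as the input of SGL4M1. To make the second neural network shown in sub-diagram (b) of FIG. 7 easier to understand, its computation is given below:
$h_1 = \mathrm{SGL1M1}(x)$;
$h_2 = \mathrm{SGL2M1}(h_1)$;
$h_3^{1} = \mathrm{SGL3M1}(h_2)$;
$h_3^{2} = \mathrm{SGL3M2}(h_2)$;
$h_{TL} = \mathrm{TL}(h_3^{1}, h_3^{2})$;
$y = \mathrm{SGL4M1}(h_{TL})$;
where $x$ denotes the input data and $h_1$ denotes the output of SGL1M1. $h_1$ is input into SGL2M1 to obtain its output $h_2$; the output of SGL2M1 is input into SGL3M1 and SGL3M2 to obtain $h_3^{1}$ and $h_3^{2}$ respectively; $h_3^{1}$ and $h_3^{2}$ are input into the transformation layer TL to obtain its output $h_{TL}$; and $h_{TL}$ is input into SGL4M1 to obtain $y$, the prediction result for $x$ output by the whole second neural network.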
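In another implementation, as shown in sub-diagram (c) of FIG. 7, the output of SGL2M1 is used as the input of both SGL3M1 and SGL3M2; the outputs of SGL3M1 and SGL3M2 are combined by the transformation layer TL, which uses the output of SGL2M1 as a selection signal; and the output of TL is used as the input of SGL4M1. To make the second neural network shown in sub-diagram (c) of FIG. 7 easier to understand, its computation is given below:
$h_1 = \mathrm{SGL1M1}(x)$;
$h_2 = \mathrm{SGL2M1}(h_1)$;
$h_3^{1} = \mathrm{SGL3M1}(h_2)$;
$h_3^{2} = \mathrm{SGL3M2}(h_2)$;
$h_{TL} = h_3^{1} \cdot \mathrm{TL}(h_2) + h_3^{2} \cdot (1 - \mathrm{TL}(h_2))$;
$y = \mathrm{SGL4M1}(h_{TL})$;
where $x$, $h_1$, $h_2$, $h_3^{1}$ and $h_3^{2}$ have meanings similar to those in the previous implementation. The difference lies in how $h_{TL}$ is generated, which in this implementation follows the formula above; $y$ denotes the prediction result for $x$ output by the second neural network of this implementation. It should be understood that the examples in FIG. 7 are merely intended to facilitate understanding of this solution and are not intended to limit it.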
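The gated combination used in sub-diagram (c) of FIG. 7 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of this application: each module is modelled as a plain Python callable on lists of floats, and TL(h2) is assumed to return a single scalar gate in [0, 1]. The function name `forward_gated` and its parameters are hypothetical.

```python
def forward_gated(x, sgl1m1, sgl2m1, sgl3m1, sgl3m2, tl, sgl4m1):
    """Forward pass of a two-branch second neural network with a gating layer.

    Every module argument is a callable; `tl` is assumed to map h2 to a
    scalar gate in [0, 1] (an assumption made for this sketch).
    """
    h1 = sgl1m1(x)
    h2 = sgl2m1(h1)
    h3_1 = sgl3m1(h2)          # first branch of the third layer
    h3_2 = sgl3m2(h2)          # second branch of the third layer
    gate = tl(h2)              # selection signal derived from SGL2M1's output
    # h_TL = h3_1 * TL(h2) + h3_2 * (1 - TL(h2)), applied element-wise
    h_tl = [a * gate + b * (1.0 - gate) for a, b in zip(h3_1, h3_2)]
    return sgl4m1(h_tl)
```

With a gate of 1 only the first branch contributes, with a gate of 0 only the second, and intermediate values blend the two branches.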
请继续参阅图8,图8示出的第二神经网络中,第一层选取的神经网络模块为SGL1M1,第二层选取的神经网络模块为SGL2M1,第三层空缺,第四层选取的神经网络模块为SGL4M1,参阅图8的(a)子示意图,由于SGL2M1的输出维度和SGL4M1的输入维度不相等,因此不能直接相连。则可以在第二层和第三层之间增设中间转化层,将SGL2M1的输出作为中间转化层的输入,中间转化层的输出作为SGL4M1的输入,应理解,图8中的示例均仅为方便理解本方案,不用于限定本方案。
402、第一客户端从至少两个第二神经网络中选取至少一个第一神经网络。
本申请的一些实施例中,第一客户端在得到多个神经网络模块之后,在一种情况下,若第一客户端为初次选取第一神经网络,则第一客户端可以为从至少两个第二神经网络中随机选取至少两个第一神经网络。随机选取的第一神经网络的数量可以预先设定好,作为示例,例如4个、5个或6个等,此处不做限定。
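Optionally, the first client may also randomly select at least two first neural networks from the at least two second neural networks when the number of adaptation values in the first adaptation relationship (as a proportion of all entries) does not exceed a first threshold. For example, the first threshold may be ten percent, twelve percent, fifteen percent, and so on; this is not limited here.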
在另一种情况下,在第一客户端不是初次选取第一神经网络的情况下,第一客户端可以根据第一适配关系,从至少两个第二神经网络中选取与第一数据集合的适配值高的至少一个第一神经网络。本申请实施例中,在第一客户端上预先配置有第一适配关系,进而根据第一适配关系,从至少两个第二神经网络中选取与第一数据集合的适配值高的至少一个第一神经网络,以保证了选取的为与第一数据集合的数据特性适配的神经网络,保证了实现不同客户端的神经网络的个性化定制;此外,选取与第一数据集合的数据特性适配,有 利于提高训练后的神经网络的准确率。
其中,适配值高的至少一个第一神经网络可以为适配值最高的N个第一神经网络,N的取值为大于或等于1的整数,作为示例,例如N的取值可以为1、2、3、4、5、6或其他数值等等,此处不做限定。或者,适配值高的至少一个第一神经网络可以为大于第四阈值的至少一个第一神经网络,第四阈值的取值可以结合适配值的生成方式、取值范围等因素灵活确定,此处不做限定。
具体的,第一客户端可以根据第一适配关系得到第一适配矩阵,第一适配矩阵中的每个元素代表一个适配值,在第一适配关系中存在空值时,可以通过矩阵分解的方式对第一适配关系进行补全,补全后的第一适配关系中不再包括空值,从而可以根据补全后的第一适配关系选取与第一数据集合的适配值高的至少一个第一神经网络。
可选地,在第一客户端不是初次选取第一神经网络,且第一适配关系中适配值的数量大于第一阈值的情况下,第一客户端根据第一适配关系,从至少两个第二神经网络中选取与至少一个第一神经网络,该至少一个第一神经网络中包括与第一数据集合的适配值高的神经网络。
可选地,第一客户端选取的至少一个第一神经网络不仅可以包括适配值高的至少一个第一神经网络,还包括随机选取的至少一个第一神经网络。
403、第一客户端计算第一数据集合与第一神经网络之间的适配值。
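In some embodiments of this application, the first adaptation relationship may be stored on the first client in the form of a table, a matrix, an index, an array, or the like. The first adaptation relationship includes multiple adaptation values, each of which indicates the degree of adaptation between the first data set and a second neural network. The first adaptation relationship may also include the identification information of each second neural network, which uniquely identifies that second neural network. To make this solution easier to understand, the first adaptation relationship is shown below in the form of a table.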
表1
其中,ID为身份标识(Identity)的缩写,表1中以多个神经网络模块被分为4组,第一组神经网络模块中包括4个神经网络模块、第二组神经网络模块中包括3个神经网络模块,第三组神经网络模块中包括2个神经网络模块,第四组神经网络模块中包括4个神经网络模块,且第一客户端从每组中只选择一个神经网络模块来构建第二神经网络为例,则共可以构建96个第二神经网络,对应的,第一适配关系中有与96个第二神经网络一一对应的96个标识信息,需要说明的是,第一适配关系中不一定会包括第一数据集合与每个第二神经网络之间的适配值,第一客户端可以根据已有的适配值,通过矩阵分解等方法计算得到第一数据集合与每个第二神经网络之间的适配值,具体计算过程在后续步骤中进行说明。
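The count of 96 second neural networks follows from picking exactly one module per group: 4 × 3 × 2 × 4 = 96. The sketch below enumerates those combinations. The module naming pattern (`SGL{group}M{index}`) follows the figures, while the identifiers `ID1` to `ID96` and the function name are assumptions made only for illustration.

```python
from itertools import product

# Group sizes taken from the Table 1 example: 4, 3, 2 and 4 modules.
GROUP_SIZES = [4, 3, 2, 4]


def enumerate_second_networks(group_sizes):
    """Return one tuple of module names per single-branch second network."""
    groups = [
        [f"SGL{g}M{m}" for m in range(1, size + 1)]
        for g, size in enumerate(group_sizes, start=1)
    ]
    # One module from every group, in group order.
    return list(product(*groups))


networks = enumerate_second_networks(GROUP_SIZES)
# Hypothetical identifiers, one per candidate second neural network.
network_ids = {f"ID{i}": modules for i, modules in enumerate(networks, start=1)}
```

Multi-branch networks or networks that skip a group would enlarge this set; the enumeration above covers only the single-branch case described for Table 1.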
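The first data set is stored on the first client and includes multiple pieces of first training data and the correct result of each piece of first training data. After obtaining the at least one first neural network, the first client needs to calculate the adaptation value between the first data set and each first neural network, and write the adaptation value calculated in step 403 into the first adaptation relationship; that is, the first adaptation relationship is updated according to the adaptation value calculated in step 403. The ways of generating the adaptation value are described below.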
其中,第一数据集合与一个第一神经网络之间的适配值可通过如下两种方式计算得到。
(一)、通过计算损失函数的函数值得到适配值
本实施例中,第一数据集合与一个第一神经网络之间的适配值与第一损失函数的函数值对应。其中,第一损失函数指示第一数据的预测结果与第一数据的正确结果之间的相似度。第一数据的预测结果通过一个第一神经网络得到,第一数据和第一数据的正确结果基于第一数据集合得到。第一损失函数的函数值越大,第一数据集合与一个第一神经网络之间的适配值越小;第一损失函数的函数值越小,第一数据集合与一个第一神经网络之间的适配值越大。本申请实施例中,通过损失函数来计算第一数据集合与一个第一神经网络之间的适配值,方案简单,易于实现,且准确度高。
具体的,在一种实现方式中,第一客户端对第一数据集合进行聚类,得到至少一个数据子集合,第一数据子集合为第一数据集合的子集,第一数据子集合为至少一个数据子集合中的任一个。进而第一客户端根据第一数据子集合和第一损失函数,生成第一数据子集合与一个第一神经网络之间的适配值。其中,第一损失函数指示第一数据的预测结果与第一数据的正确结果之间的相似度;第一数据的预测结果通过一个第一神经网络得到,第一数据和第一数据的正确结果基于第一数据子集合得到。第一损失函数的函数值越小,第一数据子集合与一个第一神经网络之间的适配值越大。第一客户端对至少两个数据子集合中每个数据子集合均执行前述操作,以得到每个数据子集合与一个第一神经网络之间的适配值。第一客户端可以将多个数据子集合与该一个第一神经网络之间的适配值求平均值,以得到整个第一数据集合与该一个第一神经网络之间的适配值,并更新第一适配关系。
针对第一数据子集合与一个第一神经网络之间的适配值的生成过程。更具体的,在一种情况下,第一数据指的是第一数据子集合中的任一个数据,由于第一数据子集合中的数据会用来对第一神经网络执行训练操作(也即第一数据子集合中会包括第一训练数据),还可能用来测试训练后的第一神经网络的正确率(也即第一数据子集合中可能包括测试数据),还可能用来验证第一神经网络中超参数的正确性(也即第一数据子集合中还可能包括验证数据),则第一数据可以是用来训练的数据,也可以是用来测试的数据,还可以是用来验证的数据。第一客户端可以将第一数据子集合中的每个第一数据输入至该一个第一神经网络,得到该一个第一神经网络输出的第一数据的预测结果,并根据第一数据的预测结果和第一数据的正确结果计算得到第一损失函数的函数值,第一客户端对第一数据子集合中的每个第一数据均执行前述操作,得到多个损失函数的函数值,对前述多个损失函数的函数值求平均值,以得到整个第一数据子集合与该一个第一神经网络之间的适配值。进一步地,第一客户端可以将多个损失函数的函数值的平均值的倒数,确定为整个第一数据子集合与该一个第一神经网络之间的适配值。
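The loss-based adaptation value described above (mean loss over a data subset, then its reciprocal) can be sketched as follows. This is a minimal illustration, assuming a classification task with a cross-entropy loss and a network modelled as a callable that returns class probabilities; the function names and the small clamp constant are assumptions, not part of this application.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one sample; probabilities are clamped to avoid log(0)."""
    return -math.log(max(probs[label], 1e-12))


def adaptation_value(network, samples, loss_fn=cross_entropy):
    """Adaptation value of a data subset: reciprocal of the mean loss.

    `network` maps an input to class probabilities; `samples` is a list of
    (input, correct_label) pairs drawn from the first data subset.
    """
    losses = [loss_fn(network(x), y) for x, y in samples]
    mean_loss = sum(losses) / len(losses)
    return 1.0 / max(mean_loss, 1e-12)
```

A network that fits the subset well produces a small mean loss and therefore a large adaptation value, which matches the monotonic relationship stated in the text.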
在另一种情况下,第一数据指的是第一数据子集合的类中心,第一客户端还可以根据第一数据子集合,计算第一数据子集合中所有数据的类中心,将该类中心输入该一个第一 神经网络,得到该一个第一神经网络输出的第一数据的预测结果;第一客户端对第一数据子集合中所有的数据的正确结果进行求平均值,得到一个第一数据的正确结果,进而计算第一损失函数的函数值,以得到整个第一数据子集合与该一个第一神经网络之间的适配值。进一步地,第一客户端可以前述一个损失函数的函数值取倒数,并将该倒数确定为整个第一数据子集合与该一个第一神经网络之间的适配值。
可选地,结合步骤404中的描述,在利用第一数据集合对第一神经网络执行训练操作的过程中,可以将第一数据子集合与一个第一神经网络之间的适配值被确定为第一数据子集合中每个训练数据与一个第一神经网络之间的适配值,不同数据子集合中的训练数据与该一个第一神经网络之间的适配值不同。本申请实施例中,对第一数据集合进行聚类,得到至少两个数据子集合,同一数据子集合中不同训练数据与一个第一神经网络之间的适配值相同,也即同一类的训练数据对一个第一神经网络的修改能力相同,以满足同一客户端中存在至少两个不同数据特性的数据子集合的情况,以进一步提高神经网络的个性化定制能力,有利于提高训练后的神经网络的准确率。
在另一种实现方式中,第一数据指的是第一数据集合中的任一个数据,由于第一数据集合中的数据会用来对第一神经网络执行训练操作(也即第一数据集合中会包括第一训练数据),还可能用来测试训练后的第一神经网络的正确率(也即第一数据集合中可能包括测试数据),还可能用来验证第一神经网络中超参数的正确性(也即第一数据集合中还可能包括验证数据),则第一数据可以是用来训练的数据,也可以是用来测试的数据,还可以是用来验证的数据。第一客户端可以将第一数据集合中的每个第一数据逐次输入到该一个第一神经网络,并得到与每个第一数据对应的损失函数的函数值,将多个损失函数的函数值进行求平均值,以得到与整个第一数据集合对应的一个损失函数的函数值,进而根据前述与整个第一数据集合对应的一个损失函数的函数值,生成整个第一数据集合与该一个第一神经网络之间的适配值。
可选地,结合步骤404中的描述,在利用第一数据集合对第一神经网络执行训练操作的过程中,将第一数据集合与一个第一神经网络之间的适配值被确定为第一数据集合中每个第一数据与该一个第一神经网络之间的适配值,也即第一数据集合中所有第一训练数据与该一个第一神经网络之间的适配值均相同。
在另一种实现方式中,第一客户端可以将第一数据集合中的每个第一数据逐次输入到该一个第一神经网络,并得到与每个第一数据对应的损失函数的函数值,并生成每个第一数据与该一个第一神经网络之间的适配值,进而对所有第一数据与该一个第一神经网络之间的适配值求平均值,以得到整个第一数据集合与该一个第一神经网络之间的适配值。可选地,结合步骤404中的描述,在利用第一数据集合对第一神经网络执行训练操作的过程中,每个第一数据均有一个与该一个第一神经网络之间的适配值。
(二)、通过计算第一神经网络与第三神经网络之间的相似度得到适配值
本实施例中,第一数据集合与一个第一神经网络之间的适配值与第一相似度对应。其中,第一相似度越大,第一数据集合与一个第一神经网络之间的适配值越大;第一相似度越小,第一数据集合与一个第一神经网络之间的适配值越小。第一相似度指的是一个第一 神经网络和第三神经网络之间的相似度。第三神经网络为上一轮迭代中输出预测结果的准确率最高的神经网络;或者,若此轮迭代不是第一轮迭代过程,第三神经网络还可以为与第一神经网络网络结构相同的神经网络,也即第一神经网络与第三神经网络对应于相同的标识信息。第三神经网络与第一神经网络的区别在于,第三神经网络是第一客户端上一次利用第一数据集合对第三神经网络执行训练操作得到的训练后的神经网络。
本申请实施例中,由于第三神经网络为上一轮迭代中输出预测结果的准确率最高的神经网络,且第三神经网络为已经利用第一数据集合训练过的神经网络,也即第三神经网络与第一数据集合的适配程度较高,若一个第一神经网络与该第三神经网络的相似度高,则证明该一个第一神经网络与第一数据集合之间的适配程度高,则适配值就会大;提供了适配值计算的另一种实现方案,提高了本方案的实现灵活性。
具体的,一个第一神经网络和第三神经网络之间的相似度通过以下任一种方式确定:
在一种实现方式中,第一客户端将相同数据分别输入至一个第一神经网络和第三神经网络,并对比一个第一神经网络的输出数据与第三神经网络的输出数据之间的相似度。其中,该相似度可以通过计算两者之间的欧式距离、马氏距离、余弦距离、交叉熵或其他方式获得。
进一步地,一个第一神经网络的输出数据与第三神经网络的输出数据之间的相似度指的可以为整个第一神经网络的输出数据与整个第三神经网络的输出数据之间的第一相似度,则可以将该第一相似度,直接确定为第一神经网络与第三神经网络之间的相似度;或者,将该第一相似度进行转换后,得到第一神经网络与第三神经网络之间的相似度。
一个第一神经网络的输出数据与第三神经网络的输出数据之间的相似度指的还可以为第一神经网络中各个模块的输出数据与第三神经网络中各个模块的输出数据之间的相似度,计算各个模块的输出数据之间相似度的乘积,得到整个第一神经网络的输出数据与整个第三神经网络的输出数据之间的相似度,进而可以得到第一神经网络与第三神经网络之间的相似度。
在另一种实现方式中,若构建该一个第一神经网络与第三神经网络的神经网络模块为相同的神经网络,则第一客户端还可以通过计算一个第一神经网络的权重参数矩阵和第三神经网络的权重参数矩阵之间的第二相似度,进而可以将第二相似度确定为该一个第一神经网络与第三神经网络之间的相似度;或者,将该第二相似度进行转换后,得到该一个第一神经网络与第三神经网络之间的相似度。其中,该第二相似度可以通过计算两者之间的欧式距离、马氏距离、余弦距离、交叉熵或其他方式获得。
本申请实施例,提供了一个第一神经网络和第三神经网络之间的相似度的两种计算方式,提高了本方案的实现灵活性。
需要说明的是,若第三神经网络与第一神经网络对应相同的标识信息,则还可以给第三神经网络增加不置信度,第三神经网络与第一神经网络的间隔时间越久,不置信度越大。可以将不置信度与计算得到的适配值共同决定最终的适配值,可以采用相加或者相乘的方式。
404、第一客户端利用第一数据集合对第一神经网络执行训练操作,得到训练后的第一 神经网络。
本申请实施例中,第一客户端在得到至少一个第一神经网络之后,会利用第一数据集合对第一神经网络执行训练操作,以得到训练后的第一神经网络。具体的,第一数据集合中包括多个第一训练数据以及每个第一训练数据的正确结果,第一客户端将一个第一训练数据输入一个第一神经网络中,得到该一个第一神经网络输出的第一训练数据的预测结果。进而根据第一训练数据的预测结果和第一训练数据的正确结果,生成第四损失函数的函数值,根据第四损失函数的函数值进行梯度求导,以反向更新该一个第一神经网络的权重参数,以完成对该一个第一神经网络的一次训练操作,第一客户端对该一个第一神经网络进行迭代训练,直至满足预设条件,得到训练后的一个第一神经网络。
其中,第四损失函数指示第一训练数据的预测结果和第一训练数据的正确结果,第四损失函数的类型与第一神经网络的任务类型相关,作为示例,例如第一神经网络的任务为分类,则第四损失函数可以为交叉熵损失函数、0-1损失函数或其他损失函数等,此处不做限定。第一客户端对该一个第一神经网络进行迭代训练的目标为拉近第一训练数据的预测结果和第一训练数据的正确结果之间的相似度;预设条件可以为满足第四损失函数的收敛条件,也可以为迭代次数达到预设次数。
为更直观地理解本方案,如下公开了第四损失函数的一个示例:
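where $LossM_1$ denotes the fourth loss function; the first data set on the first client is $d_{ij} = \{x_{ij}, y_{ij}\}$, with $j$ ranging from 1 to $J_i$; and $M_k$ denotes a first neural network. It should be understood that the example in formula (1) is merely intended to facilitate understanding of this solution and is not intended to limit it.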
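Further, after obtaining the at least one first neural network and before performing the training operation on the first neural network by using the first data set, the first client also needs to initialize the parameters of the first neural network. In one approach, the first client may directly use the parameters that the first neural network had when the server sent it to the first client. In another approach, the first client may initialize the weight parameters of the first neural network with the weight parameters obtained the last time the first client trained that first neural network. In another approach, the first client may take a weighted average of the parameters that the first neural network had when the server sent it and the weight parameters obtained the last time the first client trained it, and use the result to initialize the weight parameters of the first neural network. In yet another approach, the first client may randomly initialize the parameters of the first neural network. This is not limited here.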
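Optionally, step 404 may include: the first client performs the training operation on the first neural network by using the first data set according to a second loss function. The second loss function includes a first term and a second term. The first term indicates the similarity between a first prediction result and the correct result of the first training data; the second term indicates the similarity between the first prediction result and a second prediction result, and may be referred to as a penalty term or a constraint term. Further, the first prediction result is the prediction result of the first training data output by the first neural network after the first training data is input into the first neural network; the second prediction result is the prediction result of the first training data output by a fourth neural network after the first training data is input into the fourth neural network. The fourth neural network is the first neural network on which no training operation has been performed; that is, the initial states of the fourth neural network and the first neural network are identical, but the weight parameters of the fourth neural network are never updated while the first neural network is being trained. In this embodiment of this application, because the first data set on the first client does not necessarily match the first neural network, the second loss function also indicates the similarity between the first prediction result and the second prediction result while the first neural network is trained with the first data set, which prevents the first neural network from being changed too much during training.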
也即第二损失函数在第四损失函数的基础上加入了惩罚项,加入惩罚项后的目的为拉近第一神经网络输出的第一训练数据的预测结果与第四神经网络输出的第一训练数据的预测结果之间的相似度。为更直观地理解本方案,如下公开了第二损失函数的一个示例:
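where $LossM_2$ denotes the second loss function, $\gamma_1$ is a hyperparameter, and $y'_{ij}$ denotes the prediction result of the first training data output by the fourth neural network after the first training data is input into the fourth neural network. For the meanings of the other symbols in formula (2), refer to the description of formula (1) above; details are not repeated here. It should be understood that the example in formula (2) is merely intended to facilitate understanding of this solution and is not intended to limit it.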
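A loss with a task term and a penalty term that keeps the trained network close to a frozen copy can be sketched as below. The concrete choices here are assumptions made for illustration only: cross-entropy for the first term and squared Euclidean distance between the two predictions for the second term. The function name `second_loss` and its parameters are hypothetical, and the sketch handles a single sample.

```python
import math


def second_loss(pred, frozen_pred, label, gamma1):
    """Task loss plus a penalty on drifting away from the frozen network.

    pred        -- class probabilities from the network being trained
    frozen_pred -- class probabilities from the untrained (frozen) copy
    label       -- index of the correct class
    gamma1      -- penalty coefficient (hyperparameter)
    """
    # First term: how far the prediction is from the correct result.
    first_term = -math.log(max(pred[label], 1e-12))
    # Second term: how far the prediction is from the frozen network's output.
    second_term = sum((p - q) ** 2 for p, q in zip(pred, frozen_pred))
    return first_term + gamma1 * second_term
```

When both networks produce the same prediction the penalty vanishes, so a larger `gamma1` only matters once training starts to move the network away from its initial state.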
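Optionally, step 404 may also include: the first client performs the training operation on the first neural network by using the first data set according to a fifth loss function. The fifth loss function indicates the similarity between the first prediction result and the correct result of the first training data, and also indicates the similarity between the first neural network and the fourth neural network. That is, the fifth loss function adds a penalty term to the fourth loss function, and the purpose of the penalty term is to increase the similarity between the first neural network and the fourth neural network. To make this easier to understand, an example of the fifth loss function is given below: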
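where $LossM_3$ denotes the fifth loss function, $\gamma_2$ is a hyperparameter, and $M_0$ denotes the fourth neural network. For the meanings of the other symbols in formula (3), refer to the description of formula (1) above; details are not repeated here. It should be understood that the example in formula (3) is merely intended to facilitate understanding of this solution and is not intended to limit it.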
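Optionally, the higher the adaptation value between the first training data and the first neural network, the greater the modification of the weight parameters of the first neural network when the first neural network is trained once with that first training data. Further, the ways of adjusting the degree to which the weight parameters of the first neural network are modified in one training pass include adjusting the learning rate, adjusting the coefficient of the penalty term, or other ways. In this embodiment of this application, different training data on the same client adapt to a given first neural network to different degrees, so it is unreasonable for all training data to modify the weight parameters of that first neural network with a fixed strength. The higher the adaptation value between a piece of first training data and the first neural network, the more that first neural network ought to handle that piece of first training data; allowing that piece of first training data to modify the weight parameters more strongly in one training pass helps improve the training efficiency of the first neural network.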
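The larger the learning rate, the greater the modification of the weight parameters of the first neural network in one training pass; the smaller the learning rate, the smaller the modification of the first neural network in one training pass. In other words, the higher the adaptation value between the first training data and the first neural network, the larger the learning rate used when the first neural network is trained once with that first training data. To make this easier to understand, an example is given with reference to formulas (1) to (3):
$\eta_i = \eta \cdot E$;
where $M_{k+1}$ denotes the first neural network obtained after one training operation is performed on $M_k$, $\eta_i$ denotes the learning rate, $\eta$ is a hyperparameter, $E$ denotes the adaptation value between the first training data and the first neural network being trained, and $LossM$ denotes any one of $LossM_1$, $LossM_2$ and $LossM_3$. It should be understood that the above example is merely intended to facilitate understanding of this solution and is not intended to limit it.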
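The smaller the coefficient of the penalty term, the greater the modification of the first neural network in one training pass; the larger the coefficient of the penalty term, the smaller the modification of the first neural network in one training pass. In other words, the higher the adaptation value between the first training data and the first neural network, the smaller the penalty coefficient used when the first neural network is trained once with that first training data. Taking formula (2) and formula (3) as examples, both $\gamma_1$ and $\gamma_2$ may be set to $1/E$, that is, to the reciprocal of the adaptation value between the first training data and the first neural network being trained.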
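The two adjustment mechanisms above can be sketched together. The relations η_i = η·E and γ = 1/E come from the text; the plain gradient-descent update used in `training_step` is an assumption made for this sketch, because the update formula itself is not reproduced in this text. All function names are hypothetical.

```python
def scaled_learning_rate(eta, fit):
    """eta_i = eta * E: a higher adaptation value gives a larger step."""
    return eta * fit


def penalty_coefficient(fit):
    """gamma = 1 / E: a higher adaptation value gives a weaker penalty."""
    return 1.0 / fit


def training_step(weights, grads, eta, fit):
    """One assumed gradient-descent step with an adaptation-scaled rate.

    weights, grads -- flat lists of parameters and loss gradients
    eta            -- base learning rate (hyperparameter)
    fit            -- adaptation value E of the current training data
    """
    lr = scaled_learning_rate(eta, fit)
    return [w - lr * g for w, g in zip(weights, grads)]
```

Training data that fit the network well therefore move its weights further per step and are held back less by the penalty term.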
进一步地,在一种实现方式中,第一数据集合中不同的第一训练数据与该一个第一神经网络之间的适配值可以不同。可以为对第一数据集合进行聚类后得到至少两个数据子集合,同一数据子集合中的训练数据与该一个第一神经网络之间的适配值相同,不同数据子集合中的训练数据与该一个第一神经网络之间的适配值不同。也可以为第一数据集合中每个第一训练数据与该一个第一神经网络之间的适配值均不同。在另一种实现方式中,也可以为将整个第一数据集合视为一个整体,第一数据集合中所有第一训练数据与该一个第一神经网络之间的适配值均相同。
需要说明的是,由于第一客户端会选取出一个或多个第一神经网络,第一客户端需要计算第一训练数据与一个或多个第一神经网络中每个第一神经网络之间的适配值,并对一个或多个第一神经网络中每个第一神经网络执行训练操作。则可以为在每次执行步骤403和404时的执行对象仅为一个或多个第一神经网络中的一个第一神经网络,则第一客户端需要重复执行步骤403和404多次。或者,第一客户端可以先通过步骤403分别计算第一训练数据与一个或多个第一神经网络中所有第一神经网络之间的适配值,之后再通过步骤404分别对每个第一神经网络执行迭代操作。
此外,在步骤404所描述的整个训练过程中,若训练后的至少一个第一神经网络中所有第一神经网络的准确率均没有达到第二阈值,第一客户端可以直接生成新的神经网络模块,并结合接收到的多个神经网络模块构建新的第一神经网络。可选地,在通过步骤404训练结束后,可以对比包括新增的神经网络模块的第一神经网络与不包括新增的神经网络模块的第一神经网络的准确率,如果准确率增益没有超过第三阈值则不保留新增的神经网络模块。
405、第一客户端将至少一个训练后的第一神经网络包括的至少一个更新后的神经网络模块发送给服务器。
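In this embodiment of this application, after obtaining the at least one trained first neural network, the first client sends the at least one updated neural network module included in the at least one trained first neural network to the server; correspondingly, the server receives the at least one updated neural network module sent by the first client. Because the first client is any one of the multiple clients, the server receives at least one updated neural network module from each of the multiple clients.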
可选地,若第一客户端还将新增的神经网络模块发送给服务器,则服务器还可以接收到新增的神经网络模块。
406、服务器更新存储的神经网络模块的权重参数。
本申请实施例中,服务器在接收到多个客户端中每个客户端发送的至少一个更新后的神经网络模块之后,服务器需要根据接收到的多个更新后的神经网络模块,更新存储的神经网络的权重参数,以完成多轮迭代中的一轮迭代。
具体的,在一种实现方式中,由于不同客户端发送的可以存在相同的神经网络模块,则将不同客户端发送的相同的神经网络模块的权重参数进行加权平均,作为服务器中该神经网络模块的权重参数。对于不同客户端中没有重合的神经网络模块,则直接将客户端发送的神经网络模块的参数作为服务器中该神经网络模块的权重参数。其中,相同的神经网络模块指的是具体的神经网络相同,且位于相同的分组中。
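The server-side aggregation described above can be sketched as follows. The text only says that identical modules are combined by weighted averaging; weighting by a per-client value such as the local sample count is an assumption of this sketch, as are the function name and the flat-list representation of module weights.

```python
def aggregate_modules(client_updates, stored):
    """Weighted-average modules reported by several clients.

    client_updates -- list of (client_weight, {module_id: [params]}) pairs;
                      client_weight is assumed to be, e.g., a sample count
    stored         -- {module_id: [params]} currently held by the server
    Modules reported by one client only are taken over unchanged; modules
    reported by nobody keep their stored parameters.
    """
    sums, totals = {}, {}
    for weight, modules in client_updates:
        for module_id, params in modules.items():
            acc = sums.setdefault(module_id, [0.0] * len(params))
            for i, p in enumerate(params):
                acc[i] += weight * p
            totals[module_id] = totals.get(module_id, 0.0) + weight
    updated = dict(stored)
    for module_id, acc in sums.items():
        updated[module_id] = [v / totals[module_id] for v in acc]
    return updated
```

Two modules count as "the same" here when they share a module identifier, which stands in for "same network and same group" in the text.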
可选地,若服务器接收到新增的神经网络模块,则可以将新增的神经网络模块放入到对应的分组中。进一步可选地,为增加各个客户端的隐私性,若多个客户端在同一分组中均加入了新增的神经网络模块,则可以将同一分组中所有新增的神经网络模块加权平均为一个神经网络模块之后,再放入该分组中。
在另一种实现方式中,若服务器中存在训练数据,则还可以使用模型蒸馏的方法,利用多个客户端发送的多个更新后的神经网络模块,来更新服务器存储的神经网络模块的权重参数。也即使用服务器中存储的训练数据重新训练服务器中存储的多个神经网络模块,训练的目的为拉近服务器中存储的神经网络模块的输出数据与客户端发送的更新后的神经网络模块的输出数据之间的相似度。
一个神经网络被分为至少两个子模块,服务器存储的多个神经网络模块被分为与至少两个子模块对应的至少两个组,同一组中不同的神经网络模块的功能相同。则可选地,服务器在更新存储的神经网络模块的权重参数之后,还会计算同一组包括的至少两个神经网络模块中不同的神经网络模块之间的相似度,并将相似度大于预设阈值的两个神经网络模块进行合并。本申请实施例中,将相似度大于预设阈值的两个神经网络模块进行合并,也即将冗余的两个神经网络模块进行合并,不仅降低服务器对多个神经网络模块的管理难度;且避免客户端对相似度大于预设阈值的两个神经网络模块进行重复训练,以减少对客户端计算机资源的浪费。
具体的,针对相似度判断的过程。同一组中不同的神经网络模块包括第二神经网络模块和第一神经网络模块,第二神经网络模块和第一神经网络模块之间的相似度通过以下任一种方式确定:
在一种实现方式中,服务器将相同数据分别输入至第二神经网络模块和第一神经网络模块,并对比第二神经网络模块的输出数据与第一神经网络模块的输出数据之间的相似度。该相似度的计算方式包括但不限于:计算两者之间的欧氏距离、马氏距离、余弦距离或者交叉熵等等,此处不做穷举。
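In another implementation, if the second neural network module and the first neural network module are the same neural network and differ only in their weight parameters, the similarity between the weight parameter matrix of the second neural network module and the weight parameter matrix of the first neural network module may be calculated. The similarity is calculated in a way similar to the previous implementation. Taking FIG. 5 as an example and starting with the first layer (that is, the first group), the server needs to calculate the pairwise similarity $D_{mn} = \langle \mathrm{SGL1M}m, \mathrm{SGL1M}n \rangle$ among SGL1M1, SGL1M2 and SGL1M3. If $D_{mn}$ is greater than the preset threshold, one of SGL1M$m$ and SGL1M$n$ is randomly selected; for example, if $D_{12}$ is greater than the preset threshold, SGL1M1 is randomly selected and SGL1M2 is deleted. The neural network modules in the second layer (second group), the third layer (third group) and the fourth layer (fourth group) are processed in the same way.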
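Regarding the process of merging two neural network modules, the server may randomly keep one of the two neural network modules. If the second neural network module and the first neural network module are the same neural network and differ only in their weight parameters, the server may also average the weight parameters of the second neural network module and the first neural network module to generate the weight parameters of the merged neural network module.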
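The similarity check and merge within one group can be sketched as below. This is an illustration under assumptions: cosine similarity on flattened weight vectors is one of the measures the text lists, all modules in the group are assumed to share the same architecture so their weights can be averaged, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two flat weight vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_group(group, threshold):
    """Merge modules of one group whose similarity exceeds the threshold.

    group -- {module_name: flat weight list}, same architecture assumed.
    A merged pair keeps the first name and the average of both weights.
    """
    merged = dict(group)
    names = sorted(merged)
    for i, m in enumerate(names):
        for n in names[i + 1:]:
            if m in merged and n in merged and \
                    cosine_similarity(merged[m], merged[n]) > threshold:
                merged[m] = [(x + y) / 2 for x, y in zip(merged[m], merged[n])]
                del merged[n]
    return merged
```

Keeping one module at random, the other option in the text, would replace the averaging line with a random choice between the two weight lists.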
本申请实施例中,提供了计算两个不同的神经网络模块之间相似度的两种具体实现方式,则用户可以结合实际情况灵活选择,提高了本方案的实现灵活性。
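It should be noted that, after updating the weight parameters of the stored neural network modules, the server re-enters step 401 to perform steps 401 to 406 again, that is, to perform the next round of iteration.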
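To make this solution easier to understand, refer to FIG. 9, which is a schematic flowchart of the training method of the machine learning model provided in this embodiment of this application. As shown in FIG. 9, multiple neural network modules are stored in the server, and the server sends the stored neural network modules to each client. After obtaining the neural network modules, any one of the clients (for example, the first client) first constructs multiple second neural networks from the neural network modules, selects from the second neural networks at least one first neural network adapted to the data characteristics of the first data set stored on the first client, calculates the adaptation value between the first data set and each first neural network, and performs the training operation on each first neural network by using the first data set to obtain multiple updated neural network modules used to construct the first neural networks. The first client then sends the updated neural network modules to the server. After receiving the updated neural network modules sent by each of the clients, the server updates the weight parameters of the stored neural network modules according to the updated neural network modules sent by all clients, thereby completing one round of the multiple rounds of iteration. Although FIG. 9 shows only two clients, in practice the server can establish communication connections with more clients; the example in FIG. 9 is merely intended to facilitate understanding of this solution and is not intended to limit it.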
(2)由客户端利用选择器选择与客户端存储的数据集合的数据特性适配的第一神经网络
具体的,请参阅图10,图10为本申请实施例提供的机器学习模型的训练方法的一种流程示意图,方法可以包括:
1001、第一客户端接收服务器发送的选择器。
本申请的一些实施例中,服务器向第一客户端发送选择器,对应的,第一客户端接收服务器发送的选择器,选择器为用于从多个神经网络模块中选取与第一数据集合的数据特征匹配的至少一个神经网络模块的神经网络。服务器还可以向第一客户端发送服务器存储的多个神经网络模块中每个神经网络模块的标识信息。
1002、第一客户端根据第一数据集合,将训练数据输入选择器,得到选择器输出的指示信息,指示信息包括多个神经网络模块中每个神经网络模块被选中的概率,用于指示构建至少一个第一神经网络的神经网络模块。
本申请的一些实施例中,第一客户端根据第一数据集合,将训练数据输入选择器,得到选择器输出的指示信息;其中,指示信息包括多个神经网络模块中每个神经网络模块被选中的概率,若多个神经网络模块中共包括Z个神经网络模块,则指示信息具体可以表现为包括Z个元素的向量,每个元素指代一个神经网络模块被选中的概率。结合图5进行举例,图5中共包括18个神经网络模块,则Z的取值为18,应理解,此处举例仅为方便理解本方案,不用于限定本方案。
针对将训练数据输入选择器的过程。在一种实现方式中,第一客户端可以将第一数据集合中的每个第一训练数据(训练数据的一种示例)均分别输入选择器一次,以得到与每个第一训练数据对应的指示信息。在另一种实现方式中,第一客户端也可以对第一数据集合执行聚类操作,并分别将聚类后的几个类中心(训练数据的一种示例)输入选择器,以得到与每个类中心对应的指示信息。在另一种实现方式中,第一客户端也可以对第一数据集合执行聚类操作,并分别从聚类后的几个数据子集合中抽样几个第一训练数据,将抽样得到的第一训练数据(训练数据的一种示例)分别输入选择器,以得到与每个抽样得到的第一训练数据对应的指示信息等,第一客户端还可以通过其他方式生成指示信息,此处不做穷举。
针对根据指示信息确定构建至少一个第一神经网络的神经网络模块的过程。在一种实现方式中,第一客户端可以初始化一个用于指示每个神经网络模块被选中次数的数组,初始化值为0,该数组也可以为表格、矩阵等其他形式,此次不做穷举。第一客户端在得到至少一个指示信息之后,针对每一个指示信息,对于选中概率大于第五阈值的神经网络模块,则数组中与该神经网络模块对应的次数加一,在遍历所有指示信息之后,第一客户端根据该数组统计被选中次数大于第六阈值的至少一个神经网络模块,并将前述至少一个神经网络模块确定为用于构建至少一个第一神经网络的神经网络模块,第五阈值和第六阈值的取值均可结合实际情况设定,此处不做限定。
在另一种实现方式中,第一客户端在得到多个指示信息之后,还可以对多个指示信息求平均值,得到一个包括Z个元素的向量,向量中每个元素指示一个神经网络模块被选中的概率,进而从Z个元素中获取平均值最大的H个元素,并将前述H个元素指向的H个神经网络模块确定为用于构建至少一个第一神经网络的神经网络模块,Z为大于1的整数,H为大于或等于1的整数,对于Z和H的取值均可以结合实际情况灵活设定,此处不做限定。
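Both ways of turning the selector's indication information into a set of modules can be sketched in a few lines. Each indication is modelled as a list of Z selection probabilities; the function names, and the use of module indices rather than identifiers, are assumptions of this sketch.

```python
def select_by_votes(indications, prob_threshold, count_threshold):
    """Count how often each module's probability exceeds prob_threshold
    (the fifth threshold) and keep modules counted more than
    count_threshold (the sixth threshold) times."""
    counts = [0] * len(indications[0])
    for indication in indications:
        for idx, prob in enumerate(indication):
            if prob > prob_threshold:
                counts[idx] += 1
    return [idx for idx, count in enumerate(counts) if count > count_threshold]


def select_top_h(indications, h):
    """Average the indications and keep the H modules with the largest mean."""
    z = len(indications[0])
    means = [sum(ind[i] for ind in indications) / len(indications)
             for i in range(z)]
    ranked = sorted(range(z), key=lambda i: means[i], reverse=True)
    return sorted(ranked[:h])
```

The first method yields a variable number of modules depending on the thresholds, whereas the second always yields exactly H modules.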
1003、第一客户端向服务器发送第一标识信息,第一标识信息为构建第一神经网络的神经网络模块的标识信息。
本申请的一些实施例中,第一客户端上还可以存储有每个神经网络模块的标识信息,在第一客户端确定用于构建第一神经网络的多个神经网络模块之后,还会获取前述多个神经网络模块的标识信息,以组成第一标识信息,第一标识信息中包括构建第一神经网络的 所有神经网络模块的标识信息。
1004、服务器向第一客户端发送第一标识信息指向的构建第一神经网络的神经网络模块。
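In some embodiments of this application, after receiving the first identification information, the server obtains, from all the stored neural network modules (that is, the Z neural network modules), all the neural network modules to which the first identification information points, and sends to the first client the neural network modules, to which the first identification information points, that are used to construct the first neural network.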
1005、第一客户端将第一训练数据输入选择器,得到选择器输出的指示信息。
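In some embodiments of this application, the first client inputs one piece of first training data into the selector to obtain one piece of indication information output by the selector. The indication information may take the form of a vector of Z elements that indicates the probability that each of the Z neural network modules is selected, and is used to indicate the neural network modules that construct the first neural network. Taking FIG. 5 as an example, the indication information may be $[M_{SGL1M1}, M_{SGL1M2}, M_{SGL1M3}, M_{SGL2M1}, \ldots, M_{SGL4M3}, M_{SGL4M4}]$, where $M_{SGL1M1}$ denotes the probability that the first neural network module in the first group is selected; the remaining elements can be understood by analogy and are not described again here.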
本申请实施例中,根据第一数据集合,将训练数据输入至选择器,得到选择器输出的指示信息,并根据该指示信息选取用于构建第一神经网络的神经网络模块,选择器为用于从多个神经网络模块中选取与第一数据集合的数据特征匹配的神经网络模块的神经网络,提供了选取构建第一神经网络的神经网络模块的又一种实现方式,提高了本方案的实现灵活性;且通过神经网络来选取,有利于提高神经网络模块的选取过程的准确率。
1006、第一客户端根据接收到的多个神经网络模块、指示信息和第一训练数据,得到第一神经网络输出的第一训练数据的预测结果。
本申请的一些实施例中,第一客户端在通过步骤1005得到一个指示信息之后,可以根据接收到的多个神经网络模块、指示信息和第一训练数据,得到第一神经网络输出的第一训练数据的预测结果。为更直观地理解本方案,结合图5进行举例,如下公开了计算第一训练数据的预测结果的公式的一个示例:
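where $M_{SGL1Mq}$, $M_{SGL2Mq}$, $M_{SGL3Mq}$ and $M_{SGL4Mq}$ all come from the indication information output by the selector; SGL1Mq denotes one neural network module in the first group of neural network modules; SGL1Mq(x) denotes the output of that neural network module after the first training data is input into it; if the first client has not obtained a given neural network module from the server, the output of that neural network module is treated as 0; $h_1$ denotes the output data of the whole first group; the other expressions in the formula can be understood by analogy; and $y$ denotes the output of the whole first neural network, that is, the prediction result of the first training data. It should be understood that the example here is merely intended to facilitate understanding of this solution and is not intended to limit it.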
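The where-clause above suggests that each group's output is the sum of its modules' outputs weighted by the selector's probabilities, with missing modules contributing zero. The sketch below implements that reading; because the formula itself is not reproduced in this text, the weighted-sum form is an assumption, and the function names and the callable-based module representation are hypothetical.

```python
def group_output(x, modules, probs):
    """Probability-weighted sum of the outputs of one group's modules.

    modules -- {name: callable} for the modules the client actually holds
    probs   -- {name: probability} taken from the indication information
    A module named in `probs` but absent from `modules` contributes 0.
    """
    out = None
    for name, prob in probs.items():
        module = modules.get(name)
        if module is None:
            continue  # not downloaded: its output is treated as zero
        y = module(x)
        if out is None:
            out = [prob * v for v in y]
        else:
            out = [o + prob * v for o, v in zip(out, y)]
    if out is None:
        raise ValueError("no module of this group is available")
    return out


def forward(x, groups, probs):
    """Chain the groups: the output of one group feeds the next."""
    h = x
    for modules in groups:
        h = group_output(h, modules, probs)
    return h
```

Because the selector's probabilities enter the output directly, gradients of the loss reach the selector as well as the modules, which is what allows both to be trained in the same pass.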
1007、第一客户端根据第三损失函数,对第一神经网络和选择器执行训练操作,其中,第三损失函数指示第一训练数据的预测结果与正确结果之间的相似度,还指示该指示信息的离散程度。
本申请的一些实施例中,第一客户端在生成第一训练数据的预测结果之后,会根据第一训练数据的预测结果与第一训练数据的正确结果以及选择器生成的指示信息,生成第三损失函数的函数值,并根据第三损失函数的函数值进行梯度求导,以反向更新第一神经网络(也即更新接收到的多个神经网络模块)和选择器的权重参数,以完成对接收到的多个神经网络模块和选择器的一次训练。训练的目的为拉近第一训练数据的预测结果与正确结果之间的相似度,且增大选择器输出的指示信息的离散程度。
其中,第三损失函数包括第三项和第四项,第三项指示第一训练数据的预测结果与正确结果之间的相似度,第四项指示该指示信息的离散程度。第三项可以为基于第一训练数据的预测结果与正确结果之间的交叉熵距离、一阶距离、二阶距离等得到,第四项可以为对该指示信息进行正则化处理,例如对该指示信息进行L1正则化、进行LP正则化等等,此处均不做限定。为更直观地理解本方案,以下公开第三损失函数的一个示例:
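where $LossM_4$ denotes the third loss function; for the meanings of the symbols shared with formula (1), refer to the description of formula (1) in the embodiment corresponding to FIG. 4, which is not repeated here; $MS(x)$ denotes the indication information output by the selector; and $\gamma_3$ is a hyperparameter. It should be understood that the example in formula (4) is merely intended to facilitate understanding of this solution and is not intended to limit it.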
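A loss that combines a prediction term with a regularizer on the selector's output can be sketched as follows. The choices here are assumptions for illustration: cross-entropy as the third term and an L1 norm of the indication information as the fourth term (L1 regularization is one of the options named in the text). The function name and parameters are hypothetical, and the sketch handles a single sample.

```python
import math


def third_loss(pred, label, indication, gamma3):
    """Prediction loss plus an L1 regularizer on the indication information.

    pred       -- class probabilities output by the first neural network
    label      -- index of the correct class
    indication -- selection probabilities MS(x) output by the selector
    gamma3     -- weight of the regularization term (hyperparameter)
    """
    third_term = -math.log(max(pred[label], 1e-12))
    fourth_term = sum(abs(p) for p in indication)
    return third_term + gamma3 * fourth_term
```

With equal prediction quality, an indication vector that concentrates on a few modules scores a lower loss than one that spreads probability over many.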
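The first client repeats steps 1005 to 1007 until a preset condition is met, and obtains the multiple updated neural network modules to which the first identification information points and the trained selector. The preset condition may be that the number of training iterations reaches a preset number, or that the third loss function meets a convergence condition.
1008. The first client sends the at least one updated neural network module and the trained selector to the server.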
1009、服务器更新存储的神经网络模块的权重参数。
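In this embodiment of this application, after receiving the multiple updated neural network modules sent by the multiple clients (including the first client), the server needs to update the weight parameters of the Z stored neural network modules. For the specific implementation, refer to the description of step 406 in the embodiment corresponding to FIG. 4; details are not repeated here.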
1010、服务器更新选择器的权重参数。
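In some embodiments of this application, the server may receive the trained selectors sent by the multiple clients and average the weight parameters at corresponding positions across the trained selectors to update the weight parameters of the selector stored on the server, thereby completing one round of the multiple rounds of iteration. It should be noted that, after step 1010 is performed, step 1001 may be entered again to start the next round of iteration.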
本申请实施例中,在训练构建第一神经网络的神经网络模块的同时,训练选择器,节约了计算机资源;用选择器处理需要处理的数据来训练选择器,有利于提高选择器输出的指示信息的准确率。
本申请实施例中,通过上述方式,能够对不同数据特性的训练数据分配不同的神经网络,也即实现了神经网络与数据特性之间的个性化匹配;此外,由于第一客户端为多个客 户端中的任一客户端,对多个客户端中的每个客户端均根据客户端存储的训练数据集合的数据特性分配并训练神经网络,能够利用相同数据特性的训练数据训练相同的神经网络,不同数据特性的训练数据训练不同的神经网络,从而不仅实现了神经网络与数据特性之间的个性化匹配,而且有利于提高训练后神经网络的准确率。
(3)由服务器根据第二适配关系选择与第一客户端存储的数据集合的数据特性适配的第一神经网络
具体的,请参阅图11,图11为本申请实施例提供的机器学习模型的训练方法的一种流程示意图,方法可以包括:
1101、服务器获取与第一客户端对应的至少一个第一神经网络。
本申请的一些实施例中,服务器中可以配置有多个神经网络模块,服务器会根据存储的多个神经网络模块构建多个第二神经网络,对于多个神经网络模块以及多个第二神经网络的描述可以参阅图4对应实施例中步骤401中的描述,此处不做赘述。
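When the server is to allocate at least one first neural network to the first client, it needs to obtain the at least one first neural network corresponding to the first client. Specifically, similar to the description of step 402 in the embodiment corresponding to FIG. 4, the server may randomly select at least one first neural network from the multiple second neural networks, or may select, according to the second adaptation relationship, at least one first neural network with a high adaptation value for the first data set from the at least two second neural networks. For the situations in which each of the two approaches applies, refer to the description of step 402 in the embodiment corresponding to FIG. 4; details are not repeated here.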
与图4对应的实施例中步骤402不同的地方在于,服务器中存储的为第二适配关系,第二适配关系中包括多个适配值,适配值用于表示多个客户端中存储的训练数据与第二神经网络之间的适配程度。结合图5进行举例。
表2
| | Neural network ID1 | Neural network ID2 | …… | Neural network ID96 |
| Client ID1 | E1_1 | E1_2 | …… | E1_96 |
| Client ID2 | E2_1 | Null | …… | Null |
| …… | …… | …… | …… | …… |
| Client ID100 | Null | E100_2 | …… | E100_96 |
其中,表2中以共可以构建96个第二神经网络,共有100个客户端为例,E1_1、E1_2等代表适配值,如表2所示,第二适配关系中可以包括空值,应理解,表2中的示例仅为方便理解本方案,不用于限定本方案。
在服务器为非初次执行第一神经网络的分配操作,或者,在第二适配关系包括的适配值的数量所占比例大于第一阈值的情况下,服务器会根据第二适配关系,从至少两个第二神经网络中选取与第一数据集合(也即第一客户端)的适配值高的至少一个第一神经网络。
具体的,服务器可以得到与第二适配关系对应的第二适配矩阵,对第二适配矩阵进行矩阵分解,以得到分解后的神经网络的相似性矩阵和用户的相似性矩阵,神经网络的相似性矩阵和用户的相似性矩阵的乘积与第二适配关系中对应位置的值需要相似。进而将神经 网络的相似性矩阵和用户的相似性矩阵相乘,得到第二补全矩阵,进而根据第二补全矩阵选择与第一数据集合(也即第一客户端)的适配值高的至少一个第一神经网络。
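Completing the second adaptation matrix by matrix factorization can be sketched in pure Python. This is a minimal illustration, not the method of this application: the stochastic-gradient fitting procedure, the rank, the learning rate and the function name are all assumptions. Missing adaptation values are represented by `None`, rows are clients and columns are neural networks.

```python
import random


def complete_adaptation_matrix(matrix, rank=2, epochs=5000, lr=0.05, seed=0):
    """Fill the empty entries of an adaptation matrix.

    Two low-rank factors (one row per client, one row per network) are
    fitted so that their product matches the known entries; the full
    product is then returned as the completed matrix.
    """
    rng = random.Random(seed)
    n_clients, n_nets = len(matrix), len(matrix[0])
    users = [[rng.uniform(0.1, 0.9) for _ in range(rank)] for _ in range(n_clients)]
    nets = [[rng.uniform(0.1, 0.9) for _ in range(rank)] for _ in range(n_nets)]
    known = [(i, j, matrix[i][j]) for i in range(n_clients)
             for j in range(n_nets) if matrix[i][j] is not None]
    for _ in range(epochs):
        for i, j, value in known:
            pred = sum(users[i][k] * nets[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                u, v = users[i][k], nets[j][k]
                users[i][k] += lr * err * v   # gradient step on the client factor
                nets[j][k] += lr * err * u    # gradient step on the network factor
    return [[sum(users[i][k] * nets[j][k] for k in range(rank))
             for j in range(n_nets)] for i in range(n_clients)]
```

The server could then pick, for a given client row, the networks with the largest completed values, optionally adding a few randomly chosen networks as the text describes.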
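Optionally, the at least one first neural network selected by the server for the first client may include not only at least one first neural network with a high adaptation value, but also at least one randomly selected first neural network.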
1102、服务器将选取的至少一个第一神经网络发送给第一客户端。
1103、第一客户端计算第一数据集合与第一神经网络之间的适配值。
1104、第一客户端利用第一数据集合对第一神经网络执行训练操作,得到训练后的第一神经网络。
1105、第一客户端将至少一个训练后的第一神经网络包括的至少一个更新后的神经网络模块发送给服务器。
本申请实施例中,步骤1103至1105的具体实现方式可参阅图4对应实施例中步骤403至405中的描述,此处不做赘述。
1106、第一客户端将第一数据集合与每个第一神经网络之间的适配值发送给服务器。
本申请的一些实施例中,第一客户端还会将通过步骤1103计算得到的每个第一神经网络与第一数据集合(也即第一客户端)之间的适配值发送给服务器。其中包括神经网络的标识信息和第一客户端的标识信息,用于指示服务器第一客户端与哪几个神经网络之间的适配值。应理解,步骤1106可以与步骤1105一起执行,也可以在步骤1104和1105任一步骤的之前或之后执行,此处不限定步骤1106的执行顺序。
1107、服务器更新第二适配关系。
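In some embodiments of this application, because the first client is any one of the multiple clients, the server can obtain the adaptation values sent by each client; that is, the server can obtain multiple adaptation entries, each of which is the adaptation value between one client identifier and one neural network identifier. The server can then update the second adaptation relationship according to the received adaptation values. The server may also delete from the second adaptation relationship the adaptation values that have not been updated for a long time, where "not updated for a long time" means not updated for more than 20 rounds.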
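Updating the second adaptation relationship and pruning stale entries can be sketched as follows. The dictionary layout, which stores each adaptation value together with the round in which it was last updated, and the function name are assumptions of this sketch; the 20-round limit comes from the text.

```python
def update_relation(relation, reports, current_round, max_stale_rounds=20):
    """Apply reported adaptation values and drop entries that are too old.

    relation -- {(client_id, network_id): (value, last_updated_round)}
    reports  -- iterable of (client_id, network_id, value) from clients
    An entry is removed once it has gone more than max_stale_rounds
    rounds without an update.
    """
    updated = dict(relation)
    for client_id, network_id, value in reports:
        updated[(client_id, network_id)] = (value, current_round)
    return {
        key: (value, last_round)
        for key, (value, last_round) in updated.items()
        if current_round - last_round <= max_stale_rounds
    }
```

Entries removed this way simply become empty positions again and can be re-estimated by the matrix completion step in a later round.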
1108、服务器更新存储的神经网络模块的权重参数。
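In this embodiment of this application, for the specific implementation of step 1108, refer to the description of step 406 in the embodiment corresponding to FIG. 4; details are not repeated here. Optionally, the server may also delete neural network modules whose degree of adaptation to all clients is very low, or delete neural networks that have not been selected for a long time.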
在执行完步骤1108之后,还可以更新第二适配关系,以将与删除掉的神经网络模块所对应的信息从第二适配关系中删除,需要说明的是,可以先执行步骤1107,再执行步骤1108,也可以先执行步骤1108,再执行步骤1107,此处不做限定。
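In this embodiment of this application, the second adaptation relationship is configured on the server side, the client generates the adaptation values and sends them to the server, and the server selects the first neural network adapted to the first client according to the second adaptation relationship. This avoids occupying the client's computing resources and also avoids leaking the client's data.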
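(4) The server uses a selector to choose the first neural network that fits the data characteristics of the data set stored on the first client
Specifically, refer to FIG. 12, which is a schematic flowchart of the machine learning model training method provided in an embodiment of this application. The method may include:
1201. The first client performs a clustering operation on the first data set to obtain at least one data subset, and generates at least one first class center in one-to-one correspondence with the at least one data subset.
In some embodiments of this application, after clustering the first data set into at least one data subset, the first client generates a first class center for each data subset, thereby obtaining at least one first class center in one-to-one correspondence with the at least one data subset.
1202. The server receives the at least one first class center sent by the first client.
In some embodiments of this application, after generating the at least one first class center, the first client sends it to the server; correspondingly, the server receives the at least one first class center sent by the first client.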
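Steps 1201 and 1202 can be illustrated with a small clustering sketch. The text does not name a clustering algorithm; plain k-means with the first `k` points as initial centers is assumed here purely for illustration, and `kmeans_centers` is a hypothetical name.

```python
def kmeans_centers(points, k, iterations=10):
    """Cluster points (sequences of numbers) and return one center per cluster."""
    centers = [list(p) for p in points[:k]]  # assumed initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest center (squared Euclidean distance).
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```

Only the returned centers, not the underlying records, would be sent to the server, which is what limits the exposure of client data in this variant.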
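1203. The server inputs each of the at least one first class center into the selector to obtain the indication information output by the selector, and determines, according to the indication information, the neural network modules used to construct the at least one first neural network.
In some embodiments of this application, the server inputs each of the at least one first class center into the selector and obtains at least one piece of indication information corresponding to the at least one first class center. It then selects, according to the at least one piece of indication information, the neural network modules used to construct the at least one first neural network. For the selection process, refer to the description of step 1002 in the embodiment corresponding to FIG. 10; details are not repeated here.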
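1204. The server sends the selector and the neural network modules used to construct the at least one first neural network to the first client.
1205. The first client inputs the first training data into the selector to obtain the indication information output by the selector. The indication information includes, for each of the multiple neural network modules, the probability that the module is selected, and indicates the neural network modules used to construct the first neural network.
1206. The first client obtains, according to the received neural network modules, the indication information, and the first training data, the prediction result for the first training data output by the first neural network.
1207. The first client performs a training operation on the first neural network and the selector according to a third loss function, where the third loss function indicates the similarity between the prediction result for the first training data and the correct result, and also indicates the degree of dispersion of the indication information.
1208. The first client sends at least one updated neural network module and the trained selector to the server.
1209. The server updates the weight parameters of the stored neural network modules.
1210. The server updates the weight parameters of the selector.
In this embodiment of this application, for the specific implementation of steps 1205 to 1210, refer to the description of steps 1005 to 1010 in the embodiment corresponding to FIG. 10; details are not repeated here. Note that the server may repeat steps 1201 to 1208 to interact with each of the multiple clients, and then perform steps 1209 and 1210 to complete one of the multiple rounds of iteration. After performing step 1210, the server returns to step 1201 to begin the next round.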
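The third loss function of step 1207 combines a prediction term with a term on the dispersion of the indication information. The text does not give its formula, so the sketch below is only one possible reading: it assumes cross-entropy for the prediction term and the mean binary entropy of the per-module selection probabilities for the dispersion term, with a hypothetical weighting factor `weight`.

```python
import math

def third_loss(pred_probs, true_class, select_probs, weight=0.1):
    """Prediction loss plus a penalty on indecisive module-selection probabilities.

    pred_probs   -- class probabilities output by the first neural network
    true_class   -- index of the correct class
    select_probs -- per-module selection probabilities output by the selector
    """
    eps = 1e-12  # guards against log(0)
    prediction_term = -math.log(pred_probs[true_class] + eps)
    # Binary entropy is largest at 0.5 and approaches 0 near 0 or 1, so
    # minimizing it pushes the indication information toward discrete choices.
    dispersion_term = -sum(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps)
                           for p in select_probs) / len(select_probs)
    return prediction_term + weight * dispersion_term
```

Under these assumptions, a selector that outputs probabilities near 0 or 1 incurs a smaller loss than one that outputs 0.5 for every module, given the same prediction quality.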
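In this embodiment of this application, using a selector to choose the neural network modules helps improve the accuracy of the selection. Having the server perform the selection frees storage space on the client and avoids occupying the client's computing resources, and because only the class centers are sent to the server, leakage of client information is kept to a minimum.
In this embodiment of this application, the foregoing approach assigns different neural networks to training data with different data characteristics, that is, it achieves personalized matching between neural networks and data characteristics. Because the first client is any one of the multiple clients, each client is assigned neural networks according to the data characteristics of the training data set it stores and trains them: training data with the same characteristics trains the same neural network, and training data with different characteristics trains different neural networks. This not only achieves personalized matching between neural networks and data characteristics but also helps improve the accuracy of the trained neural networks. Because the server selects the neural networks that fit each client, not all neural network modules have to be sent to the client, which reduces the waste of client storage resources, and the client's computing resources are not occupied, which helps improve user experience.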
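II. Inference phase
Specifically, refer to FIG. 13, which is a schematic flowchart of the data processing method provided in an embodiment of this application. The method may include:
1301. The server obtains at least one third neural network corresponding to the data characteristics of a second data set stored on a second client.
In this embodiment of this application, the second client may be any one of the multiple clients connected to the server, or a client that has newly established a connection with the server.
Specifically, in one case, the second client selects the at least one third neural network corresponding to the data characteristics of the second data set. The server then receives second identification information sent by the second client, where the second identification information is the identification information of the third neural network or the identification information of the neural network modules used to construct the third neural network. Correspondingly, the server obtains the one or more third neural networks pointed to by the second identification information, or obtains the neural network modules, pointed to by the second identification information, that are used to construct the one or more third neural networks.
In another case, the server selects the at least one third neural network corresponding to the data characteristics of the second data set. In one implementation, after obtaining the identification information of the second client, the server obtains, according to the second adaptation relationship, at least one third neural network that fits the identification information of the second client.
In another case, the second client performs a clustering operation on the second data set to obtain at least one second data subset and generates at least one second class center corresponding to the at least one second data subset. After receiving the at least one second class center, the server inputs each of them into the selector to obtain the neural network modules used to construct the at least one third neural network.
In another case, the server selects at least one third neural network from the at least two second neural networks according to the identification information of the second client and the second adaptation relationship, where the at least one third neural network includes a neural network with a high degree of fit to the second data set.
In another case, the server randomly selects at least one third neural network from the multiple second neural networks.
1302. The server sends the at least one third neural network to the second client.
In this embodiment of this application, the server sends to the second client either the at least one third neural network or the neural network modules used to construct it.
1303. The second client generates a prediction result for the to-be-processed data through the at least one third neural network.
In this embodiment of this application, the second client may randomly select one third neural network from the at least one third neural network, or may select, based on the second data set, the third neural network with the highest degree of fit to the second data set, and so on. It then generates the prediction result for the to-be-processed data through the selected third neural network.
In this embodiment of this application, the training operation on the neural networks takes into account the data characteristics of the data set stored on each client, so neural networks are personalized in the training phase; in the inference phase, neural networks are likewise assigned in a personalized way. This keeps the training and inference phases consistent and helps improve accuracy in the inference phase.
On the basis of the foregoing embodiments, because data on the clients is needed to train the neural networks, the embodiments of this application further provide, in order to improve the security of user data, methods for encrypting the data on the clients before the training operation is performed. See the following description.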
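Embodiment 1: Gradient-based module pre-training scheme with packing
Client A holds the feature set $F_A=\{f_1,f_2,\dots,f_N\}$, and client B holds the feature set $F_B=\{f_{N+1},f_{N+2},\dots,f_{N+M}\}$. The data of client A is $D_A=\{d_{1A},d_{2A},d_{3A},\dots,d_{PA}\}$, and client B holds the data $D_B=\{d_{1B},d_{2B},d_{3B},\dots,d_{PB}\}$. The features of $d_{pA}$ are $F_A$, the features of $d_{pB}$ are $F_B$, and $d_p=[d_{pA},d_{pB}]$ denotes all feature values of the $p$-th data record. Client B holds the user data labels $L=\{l_1,l_2,l_3,\dots,l_P\}$. The model parameters held by client A are $W_A$ and those held by client B are $W_B$; the model gradient corresponding to client A is $G_A$ and that corresponding to client B is $G_B$.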
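Training process:
Step 1: Client A generates a public key $pk_A$ and a private key $sk_A$ for semi-homomorphic or fully homomorphic encryption.
Step 2: Client B generates a public key $pk_B$ and a private key $sk_B$ for fully homomorphic encryption.
Step 3: Client A sends its public key $pk_A$ to client B, and client B sends its public key $pk_B$ to client A.
Step 4: Client A computes $U_A$ from its model parameters $W_A$ and its data $D_A$. Client A packs $U_A$ to obtain $DU_A$. Client A homomorphically encrypts $DU_A$ with its own public key $pk_A$ to obtain $[[DU_A]]_{pk_A}$ and sends it to client B.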
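Packing means splitting the data $U_A=[u_{A1},u_{A2},u_{A3},\dots,u_{AP}]$ into small packets of a specified packet length $L$:
$DU_A=[Du_{A1},Du_{A2},\dots,Du_{A,P/L}]$, where
$Du_{A1}=[u_{A1},u_{A2},\dots,u_{AL}]$ and $Du_{A2}=[u_{A,L+1},u_{A,L+2},\dots,u_{A,L+L}]$.
Homomorphically encrypting $DU_A$ with the public key $pk_A$ means encrypting each of $Du_{A1},Du_{A2},\dots,Du_{A,P/L}$ separately with $pk_A$.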
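The packing operation defined above amounts to splitting a list into consecutive fixed-length chunks, each of which is then encrypted on its own. A minimal sketch follows; `pack` and `encrypt_packets` are hypothetical names, and the encryption function is passed in as a parameter because no particular encryption library is assumed here.

```python
def pack(values, packet_len):
    """Split values into consecutive packets of packet_len elements.

    If len(values) is not a multiple of packet_len, the last packet is shorter.
    """
    return [values[i:i + packet_len] for i in range(0, len(values), packet_len)]

def encrypt_packets(packets, encrypt):
    """Apply a caller-supplied encryption function to each packet separately."""
    return [encrypt(packet) for packet in packets]
```

For example, `pack([1, 2, 3, 4, 5, 6], 2)` yields three packets of two values each, mirroring $Du_{A1}, Du_{A2}, \dots$ above.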
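Step 5: Client B computes $U_B-L$ from its model parameters $W_B$, its data $D_B$, and the labels $L$. Client B packs $U_B-L$ to obtain $DU_{B-L}$, encrypts the packed result with its own public key $pk_B$ to obtain $[[DU_{B-L}]]_{pk_B}$, and sends it to client A.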
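Step 6: Client A encrypts its own $DU_A$ with client B's public key $pk_B$ to obtain $[[DU_A]]_{pk_B}$, adds it to $[[DU_{B-L}]]_{pk_B}$ (the $DU_{B-L}$ encrypted under client B's public key and received from client B), and multiplies the sum by the encoded data set $D_A$ to obtain the homomorphically encrypted model gradient $[[G_A]]$. Client A generates and stores noise $W_{A\_Noise}$ whose size equals the dimension of $W_A$ multiplied by the packing length. It packs $W_{A\_Noise}$ to obtain $DW_{A\_Noise}$, encrypts the packed $DW_{A\_Noise}$ with client B's public key to obtain $[[DW_{A\_Noise}]]$, and adds $[[DW_{A\_Noise}]]$ to the homomorphically encrypted model gradient obtained above, yielding a noisy, homomorphically encrypted model gradient. Client A sends this to client B, which decrypts it with client B's private key and sends the result back to client A. Client A subtracts the stored noise $W_{A\_Noise}$ from the decrypted noisy gradient values and accumulates along the packing dimension to obtain the true model gradient, with which it updates the model parameters $W_A$.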
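Step 7: Client B encrypts its own $DU_{B-L}$ with client A's public key $pk_A$ to obtain $[[DU_{B-L}]]_{pk_A}$, adds it to $[[DU_A]]_{pk_A}$ (the $DU_A$ encrypted under client A's public key and received from client A), and multiplies the sum by the encoded $D_B$ to obtain the homomorphically encrypted gradient corresponding to the model $W_B$. Client B generates and stores noise $W_{B\_Noise}$ whose size equals the dimension of $W_B$ multiplied by the packing length. It packs $W_{B\_Noise}$ to obtain $DW_{B\_Noise}$, encrypts $DW_{B\_Noise}$ with client A's public key to obtain $[[DW_{B\_Noise}]]$, and adds $[[DW_{B\_Noise}]]$ to the homomorphically encrypted model gradient obtained above, yielding the noisy, homomorphically encrypted model gradient $[[G_B]]$. Client B sends this to client A, which decrypts it with client A's private key and sends the result back to client B. Client B subtracts the stored noise $W_{B\_Noise}$ from the decrypted noisy gradient values and accumulates along the packing dimension to obtain the true model gradient, with which it updates the model parameters $W_B$.
Step 8: Determine whether the convergence condition is met. If it is, the training process ends; otherwise, return to step 4 and continue.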
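The noise-masking exchange in steps 6 and 7 can be illustrated with a toy additively homomorphic scheme. The sketch below assumes a textbook Paillier construction with tiny fixed primes, which is insecure and chosen only so the example runs with the standard library. The text does not name a concrete cryptosystem, so this is a stand-in, and all function names are hypothetical. Packing and the accumulation along the packing dimension are omitted, and gradients are small non-negative integers.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (illustration only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1 (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Enc(m) = (1 + m*n) * r^n mod n^2, with r coprime to n."""
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: the product of two ciphertexts encrypts the sum."""
    return c1 * c2 % (n * n)

def masked_gradient_roundtrip(gradient, seed=0):
    rng = random.Random(seed)
    n_b, sk_b = keygen()                                  # client B's key pair
    # Stand-in for [[G_A]]: in the scheme A only ever holds these ciphertexts.
    enc_grad = [encrypt(n_b, g, rng) for g in gradient]
    noise = [rng.randrange(1000) for _ in gradient]       # stored by client A
    masked = [add(n_b, c, encrypt(n_b, z, rng))           # A adds encrypted noise
              for c, z in zip(enc_grad, noise)]
    decrypted = [decrypt(n_b, sk_b, c) for c in masked]   # B decrypts, sees noise only
    return [d - z for d, z in zip(decrypted, noise)]      # A removes its noise
```

The point of the design is that client B, who holds the private key, only ever decrypts gradient-plus-noise, while client A, who knows the noise, never holds the private key.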
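Inference process:
For regression (fitting) and classification problems, the following can be used:
Clients A and B compute $U_A$ and $U_B$ respectively, and one of them computes the value of $U_A+U_B$.
For classification problems, the following can be used:
Clients A and B compute $U_A$ and $-U_B$ respectively.
Clients A and B pad their data with leading zeros to a preset fixed number of digits before the decimal point, obtaining $IU_A$ and $-IU_B$.
For example, with $U_A=1234.5678$, $-U_B=12.3456$, and a preset of 6 digits before the decimal point,
$IU_A=001234.5678$ and $-IU_B=000012.3456$.
Starting from the most significant digit, clients A and B each take a preset number of digits at a time and compare them. If the digits are equal, they compare the next group of the preset number of digits. As soon as an order can be determined, they stop and decide the order of $U_A$ and $-U_B$ from that comparison. If the specified number of groups has been compared without finding a difference, they stop and determine that $U_A=-U_B$. For example, with a preset of 2 digits, client A takes 00 from $IU_A$ and client B takes 00 from $-IU_B$; because these are equal, client A then takes 12 from $IU_A$ and client B takes 00 from $-IU_B$. Because 12 is greater than 00, $U_A$ is greater than $-U_B$.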
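The zero-padding and group-by-group comparison described above can be sketched in plaintext as follows. The sketch assumes non-negative inputs, 6 digits before and 4 digits after the decimal point, and groups of 2 digits, matching the worked example; the secure exchange that hides each group from the other party is left out, and the function names are hypothetical.

```python
def pad_fixed(value, int_digits=6, frac_digits=4):
    """Zero-pad a non-negative number to a fixed width, e.g. 001234.5678."""
    width = int_digits + 1 + frac_digits  # +1 for the decimal point
    return format(value, f"0{width}.{frac_digits}f")

def compare_chunked(u_a, neg_u_b, int_digits=6, frac_digits=4, chunk=2):
    """Compare two padded numbers group by group from the most significant digits.

    Returns 1 if u_a > neg_u_b, -1 if u_a < neg_u_b, and 0 if all groups match.
    """
    a = pad_fixed(u_a, int_digits, frac_digits).replace(".", "")
    b = pad_fixed(neg_u_b, int_digits, frac_digits).replace(".", "")
    for start in range(0, len(a), chunk):
        x = int(a[start:start + chunk])
        y = int(b[start:start + chunk])
        if x != y:  # the first differing group decides the order
            return 1 if x > y else -1
    return 0
```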
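The digit-group comparison proceeds as follows:
Client A extracts its digit group $I_a$, and client B extracts its digit group $I_b$.
Client B generates a public/private key pair and sends the public key to client A.
Client A generates a random integer RIntX and encrypts it with the public key received from client B to obtain $[[RIntX]]$. Client A sends $[[RIntX]]-I_a$ to client B.
Client B adds each of 0 to 99 to the received $[[RIntX]]-I_a$ and decrypts the results to obtain $DRIntX=[DRIntX_0,DRIntX_1,DRIntX_2,\dots,DRIntX_{99}]$. Client B then subtracts 1 from the entry of $DRIntX$ at position $I_b$, subtracts 2 from every entry at a position greater than $I_b$, reduces every entry of $DRIntX$ modulo a preset modulus, and sends the result to client A.
Client A reduces its own RIntX modulo the same preset modulus as client B and compares it with the entry at position $I_a$ of the received $DRIntX$. If they are equal, then $I_a<I_b$; if they differ by 1, then $I_a=I_b$; if they differ by 2, then $I_a>I_b$.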
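The arithmetic of this comparison can be checked with a plaintext simulation. In the sketch below the encryption layer is deliberately omitted (the masked value is passed around in the clear), so it demonstrates only why the differences 0, 1, and 2 encode the three outcomes; the modulus, the range of the random integer, and the function name are assumptions.

```python
import random

def compare_groups(i_a, i_b, modulus=1_000_003, rng=random):
    """Simulate the comparison of two digit groups in the range 0..99.

    Returns "less", "equal" or "greater" for i_a relative to i_b.
    """
    r = rng.randrange(10**6, 10**9)   # client A's random integer RIntX
    masked = r - i_a                  # sent to B (encrypted in the real scheme)
    # Client B: add 0..99, so that the entry at position i_a equals r.
    table = [masked + j for j in range(100)]
    # Client B: subtract 1 at position i_b and 2 at every later position.
    table = [v - 1 if j == i_b else v - 2 if j > i_b else v
             for j, v in enumerate(table)]
    table = [v % modulus for v in table]
    # Client A: compare r (mod modulus) with the entry at its own position.
    diff = (r % modulus - table[i_a]) % modulus
    return {0: "less", 1: "equal", 2: "greater"}[diff]
```

Position $I_a$ lies before $I_b$ exactly when $I_a<I_b$, in which case the entry is untouched and the difference is 0; otherwise the subtraction of 1 or 2 shows up as the difference.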
If $U_A$ is greater than $-U_B$, then $U_A+U_B>0$; if $U_A$ is less than $-U_B$, then $U_A+U_B<0$; if $U_A$ is equal to $-U_B$, then $U_A+U_B=0$.
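Embodiment 2: Classification-tree-based module pre-training scheme
The module can be implemented by pre-training, and joint learning over different features held by multiple users can be carried out at the same time.
Client A holds the feature set $F_A=\{f_1,f_2,\dots,f_N\}$, and client B holds the feature set $F_B=\{f_{N+1},f_{N+2},\dots,f_{N+M}\}$. The data of client A is $D_A=\{d_{1A},d_{2A},d_{3A},\dots,d_{PA}\}$, and client B holds the data $D_B=\{d_{1B},d_{2B},d_{3B},\dots,d_{PB}\}$. The features of $d_{pA}$ are $F_A$, the features of $d_{pB}$ are $F_B$, and $d_p=[d_{pA},d_{pB}]$ denotes all feature values of the $p$-th data record. Client B holds the user data labels $L=\{l_1,l_2,l_3,\dots,l_P\}$, where $l_p=0$ denotes class 0 and $l_p=1$ denotes class 1.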
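Training process:
Step 1: Client B generates a public key $pk_B$ and a private key $sk_B$ for semi-homomorphic (or fully homomorphic) encryption, and encrypts the data labels $L$ with the public key $pk_B$ to obtain the encrypted data $pk_B(L)=\{pk_B(l_1),pk_B(l_2),pk_B(l_3),\dots,pk_B(l_P)\}$.
Step 2: Client B sends the public key $pk_B$ and the encrypted labels $pk_B(L)$ to client A. The node number is set to $h=0$, and all data belongs to node $h$. B initializes the inference tree as empty, and A and B initialize the split tree as empty.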
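Step 3: Client A generates a set of feature split schemes $S_A=\{s_{1A},s_{2A},s_{3A},\dots,s_{IA}\}$ from its local data and, according to split strategy $s_{iA}$, divides the data belonging to node $h$ into a left child node $2h$ and a right child node $2h+1$. It computes the sum of the encrypted data labels in child nodes $2h$ and $2h+1$, as well as the number of data records in each of the two sets.
Client B generates a set of feature split schemes $S_B=\{s_{1B},s_{2B},s_{3B},\dots,s_{IB}\}$ from its local data and, according to split strategy $s_{iB}$, divides the data belonging to node $h$ into a left child node $2h$ and a right child node $2h+1$. It computes the sum of the data labels in child nodes $2h$ and $2h+1$, as well as the number of data records in each of the two sets.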
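Step 6: Client A uses the label sums and record counts of the two child nodes to compute the Gini coefficient under each split, and selects the split scheme with the smallest Gini coefficient, denoted $s_{minA}$, with Gini coefficient value $gini_{minA}$. Client B likewise uses the label sums and record counts of the two child nodes to compute the Gini coefficient under each split, and selects the split scheme with the smallest Gini coefficient, denoted $s_{minB}$, with Gini coefficient value $gini_{minB}$.
Step 7: Client A sends $gini_{minA}$ to client B. Client B compares the two values and returns the comparison result to A. The $h$-th node of B's inference tree is marked with the number of the party whose Gini value is smaller.
Step 8: The party with the smaller Gini value splits the data according to its split scheme and sends the split result to the other party. It also writes the split strategy into the $h$-th node of the split tree.
Step 9: Set $h=h+1$ and repeat steps 3 to 7 until the specified number of repetitions is reached.
Step 10: B counts which class is in the majority at each leaf node and marks the leaf node as that class.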
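With binary labels, the label sum of a child node is the number of class-1 records, so the Gini coefficient of a split follows from the sums and counts alone. The sketch below works on plaintext sums; how the encrypted sums are handled between the two clients is outside its scope, and the function names and the candidate format are hypothetical.

```python
def node_gini(label_sum, count):
    """Gini impurity of a node with binary labels (label_sum = number of class 1)."""
    if count == 0:
        return 0.0
    p1 = label_sum / count
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def split_gini(left_sum, left_count, right_sum, right_count):
    """Size-weighted Gini impurity of the two child nodes 2h and 2h+1."""
    total = left_count + right_count
    return (left_count / total) * node_gini(left_sum, left_count) \
        + (right_count / total) * node_gini(right_sum, right_count)

def best_split(candidates):
    """candidates maps a scheme name to (left_sum, left_count, right_sum, right_count).

    Returns the name and Gini value of the scheme with the smallest Gini value.
    """
    name = min(candidates, key=lambda s: split_gini(*candidates[s]))
    return name, split_gini(*candidates[name])
```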
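Inference process:
Step 1: According to the inference tree, determine whether the processing party is A or B.
Step 2: According to the split strategy in the split tree, select the position of the next node.
Repeat steps 1 and 2 until a leaf node is reached; the classification result is the class with which that leaf node is marked.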
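Embodiment 3: Regression-tree-based module pre-training scheme
The module can be implemented by pre-training, and joint learning over different features held by multiple users can be carried out at the same time.
Client A holds the feature set $F_A=\{f_1,f_2,\dots,f_N\}$, and client B holds the feature set $F_B=\{f_{N+1},f_{N+2},\dots,f_{N+M}\}$. The data of client A is $D_A=\{d_{1A},d_{2A},d_{3A},\dots,d_{PA}\}$, and client B holds the data $D_B=\{d_{1B},d_{2B},d_{3B},\dots,d_{PB}\}$. The features of $d_{pA}$ are $F_A$, the features of $d_{pB}$ are $F_B$, and $d_p=[d_{pA},d_{pB}]$ denotes all feature values of the $p$-th data record. Client B holds the user data labels $L=\{l_1,l_2,l_3,\dots,l_P\}$.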
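Training process:
Step 1: Client B generates a public key $pk_B$ and a private key $sk_B$ for semi-homomorphic (or fully homomorphic) encryption, and encrypts with the public key $pk_B$ both the data labels $L$, obtaining $pk_B(L)=\{pk_B(l_1),pk_B(l_2),pk_B(l_3),\dots,pk_B(l_P)\}$, and the squared values of the data labels.
Step 2: Client B sends the public key $pk_B$ and the encrypted labels $pk_B(L)$ to client A. The node number is set to $h=0$, and all data belongs to node $h$. B initializes the inference tree as empty, and A and B initialize the split tree as empty.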
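Step 3: Client A generates a set of feature split schemes $S_A=\{s_{1A},s_{2A},s_{3A},\dots,s_{IA}\}$ from its local data and, according to split strategy $s_{iA}$, divides the data belonging to node $h$ into a left child node $2h$ and a right child node $2h+1$. It computes the sum of the encrypted data labels in child nodes $2h$ and $2h+1$, as well as the number of data records in each of the two sets.
Client B generates a set of feature split schemes $S_B=\{s_{1B},s_{2B},s_{3B},\dots,s_{IB}\}$ from its local data and, according to split strategy $s_{iB}$, divides the data belonging to node $h$ into a left child node $2h$ and a right child node $2h+1$. It computes the sum of the data labels in child nodes $2h$ and $2h+1$ and the number of data records in each of the two sets, and computes the variance values.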
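Step 10: Clients A and B each select the split with the smallest variance, denoted $s_{minA}$ with variance value $var_{minA}$ and $s_{minB}$ with variance value $var_{minB}$, respectively.
Step 11: Client A sends $var_{minA}$ to client B. Client B compares the two values and returns the comparison result to A. The $h$-th node of B's inference tree is marked with the number of the party whose variance value is smaller.
Step 12: The party with the smaller variance splits the data according to its split scheme and sends the split result to the other party, and writes the split strategy into the $h$-th node of the split tree.
Step 13: Set $h=h+1$ and repeat steps 3 to 7 until the specified number of repetitions is reached.
Step 14: B counts which class is in the majority at each leaf node and marks the leaf node as that class.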
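The variance used to rank splits can be obtained from three aggregates per child node: the label sum, the sum of squared labels, and the record count, which is presumably why the squared labels are prepared in step 1. The sketch below uses the identity $\mathrm{Var}=\frac{1}{n}\sum l^2-\left(\frac{1}{n}\sum l\right)^2$ on plaintext aggregates; the size-weighted combination of the two children is an assumption, as the text does not state how the child variances are combined.

```python
def variance_from_sums(label_sum, label_sq_sum, count):
    """Population variance of the labels from their sum, squared sum and count."""
    if count == 0:
        return 0.0
    mean = label_sum / count
    return label_sq_sum / count - mean ** 2

def split_variance(left, right):
    """Size-weighted variance of two child nodes.

    left and right are (label_sum, label_sq_sum, count) tuples.
    """
    total = left[2] + right[2]
    return (left[2] / total) * variance_from_sums(*left) \
        + (right[2] / total) * variance_from_sums(*right)
```

For labels `[1, 2, 3]` the aggregates are `(6, 14, 3)` and the variance is 2/3; for labels `[2, 2]` they are `(4, 8, 2)` and the variance is 0.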
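Inference process:
Step 1: According to the inference tree, determine whether the processing party is A or B.
Step 2: According to the split strategy in the split tree, select the position of the next node.
Repeat steps 1 and 2 until a leaf node is reached; the result is the value with which that leaf node is marked.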
在图1至图13所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图14,图14为本申请实施例提供的机器学习模型的训练装置的一种结构示意图。机器学习模型的训练装置1400应用于第一客户端,多个客户端与服务器通信连接,服务器中存储有多个模块,多个模块用于构建机器学习模型,第一客户端为多个客户端中的任一客户端,机器学习模型的训练装置1400用于执行多轮迭代,机器学习模型的训练装置1400包括:获取单元1401、训练单元1402和发送单元1403,在多轮迭代中的一轮迭代中,获取单元1401,用于获取至少一个第一机器学习模型,至少一个第一机器学习模型为根据机器学习模型的训练装置存储的第一训练数据集合的数据特性选取出来的;训练单元1402,用于利用第一数据集合对至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;发送单元1403,用于将至少一个训练后的第一机器学习模型包括的至少一个更新后的模块发送给服务器,更新后的模块用于供服务器更新存储的模块的权重参数。
本申请实施例中,能够对不同数据特性的训练数据分配不同的神经网络,也即实现了神经网络与数据特性之间的个性化匹配;此外,由于第一客户端为多个客户端中的任一客户端,对多个客户端中的每个客户端均根据客户端存储的训练数据集合的数据特性分配并训练神经网络,能够利用相同数据特性的训练数据训练相同的神经网络,不同数据特性的训练数据训练不同的神经网络,从而不仅实现了神经网络与数据特性之间的个性化匹配,而且有利于提高训练后神经网络的准确率。
在一种可能的设计中,多个模块用于构建至少两个第二机器学习模型,至少一个第一机器学习模型为从至少两个第二机器学习模型中选取出来的;或者,用于构建至少一个第一机器学习模型的模块为从多个模块中选取出来的。
在一种可能的设计中,请参阅图15,图15为本申请实施例提供的机器学习模型的训练装置的一种结构示意图。机器学习模型为神经网络,服务器中存储的多个模块为神经网络模块,机器学习模型的训练装置1400上存储有第一适配关系,第一适配关系包括多个适配值,适配值用于表示第一数据集合与第二神经网络之间的适配程度;装置1400还包括:接收单元1404,用于接收服务器发送的多个神经网络模块;获取单元1401,具体用于根据第一适配关系,从至少两个第二神经网络中选取至少一个第一神经网络,至少一个第一神经网络中包括与第一数据集合的适配值高的至少一个第一神经网络。
在一种可能的设计中,第一数据集合与一个第二神经网络之间的适配值与第一损失函数的函数值对应,第一损失函数的函数值越小,第一数据集合与一个第二神经网络之间的适配值越大;其中,第一损失函数指示第一训练数据的预测结果与第一数据的正确结果之间的相似度,第一数据的预测结果通过一个第二神经网络得到,第一数据和第一数据的正确结果基于第一数据集合得到。
在一种可能的设计中,第一数据集合与一个第二神经网络之间的适配值与第一相似度对应,第一相似度越大,第一数据集合与一个第二神经网络之间的适配值越大;其中,第一相似度指的是一个第二神经网络和第三神经网络之间的相似度,第三神经网络为上一轮迭代中输出预测结果的准确率最高的神经网络。
在一种可能的设计中,一个第二神经网络和第三神经网络之间的相似度通过以下任一种方式确定:将相同数据分别输入至一个第二神经网络和第三神经网络,并对比一个第二神经网络的输出数据与第三神经网络的输出数据之间的相似度;或者,计算一个第二神经网络的权重参数矩阵和第三神经网络的权重参数矩阵之间的相似度。
在一种可能的设计中,请参阅图15,机器学习模型为神经网络,装置1400还包括:接收单元1404和输入单元1405;接收单元1404,用于接收服务器发送的选择器,选择器为用于从多个神经网络模块中选取与第一数据集合的数据特征匹配的至少一个神经网络模块的神经网络;输入单元1405,用于根据第一数据集合,将训练数据输入至选择器,得到选择器输出的指示信息,指示信息包括多个神经网络模块中每个神经网络模块被选中的概率,用于指示构建至少一个第一神经网络的神经网络模块;接收单元1404,还用于从服务端接收用于构建至少一个第一神经网络的神经网络模块。
在一种可能的设计中,请参阅图15,机器学习模型为神经网络,服务器中存储的多个模块为神经网络模块,装置1400还包括:计算单元1406,用于计算第一数据集合与至少一个第一神经网络中每个第一神经网络之间的适配值;其中,第一数据集合包括多个第一训练数据,第一训练数据与第一神经网络之间的适配值越高,在利用第一训练数据对第一神经网络进行一次训练的过程中,对第一神经网络的权重参数的修改程度越大。
在一种可能的设计中,请参阅图15,计算单元1406,具体用于:对第一数据集合进行聚类,得到至少两个数据子集合,第一数据子集合为第一数据集合的子集,第一数据子集合为至少两个数据子集合中的任一个;根据第一数据子集合和第一损失函数,生成第一数据子集合与一个第一神经网络之间的适配值,第一损失函数的函数值越小,第一数据子集合与一个第一神经网络之间的适配值越大;其中,第一损失函数指示第一训练数据的预测 结果与第一数据的正确结果之间的相似度,第一数据的预测结果通过一个第一神经网络得到,第一数据和第一数据的正确结果基于第一数据子集合得到,第一数据子集合与一个第一神经网络之间的适配值被确定为第一数据子集合中每个数据与一个第一神经网络之间的适配值。
在一种可能的设计中,机器学习模型为神经网络,服务器中存储的多个模块为神经网络模块;训练单元1402,具体用于根据第二损失函数,利用第一数据集合对第一神经网络执行训练操作;其中,第一数据集合包括多个第一训练数据,第二损失函数指示第一预测结果与第一训练数据的正确结果之间的相似度,还指示第一预测结果与第二预测结果之间的相似度,第一预测结果为将第一训练数据输入第一神经网络后,由第一神经网络输出的第一训练数据的预测结果,第二预测结果为将第一训练数据输入第四神经网络后,由第四神经网络输出的第一训练数据的预测结果,第四神经网络为未执行过训练操作的第一神经网络。
在一种可能的设计中,请参阅图15,第一数据集合包括多个第一训练数据和每个第一训练数据的正确结果;接收单元1404,还用于接收服务器发送的选择器,选择器为用于从多个神经网络模块中选取与第一数据集合的数据特征匹配的至少一个第一神经网络模块的神经网络;训练单元1402,具体用于:将第一训练数据输入选择器,得到选择器输出的指示信息,指示信息包括多个神经网络模块中每个神经网络模块被选中的概率,用于指示构建第一神经网络的神经网络模块;根据多个神经网络模块,指示信息和第一训练数据,得到第一神经网络输出的第一训练数据的预测结果;根据第三损失函数,对第一神经网络和选择器执行训练操作,其中,第三损失函数指示第一训练数据的预测结果与正确结果之间的相似度,还指示该指示信息的离散程度;发送单元1403,还用于向服务器发送训练后的选择器。
需要说明的是,机器学习模型的训练装置1400中各模块/单元之间的信息交互、执行过程等内容,与本申请中图4至图13对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供一种机器学习模型的训练装置,具体参阅图16,图16为本申请实施例提供的机器学习模型的训练装置的一种结构示意图。机器学习模型的训练装置1600应用于服务器,服务器与多个客户端通信连接,服务器中存储有多个模块,多个模块用于构建机器学习模型,第一客户端为多个客户端中的任一客户端,机器学习模型的训练装置1600用于执行多轮迭代,机器学习模型的训练装置1600包括:获取单元1601、发送单元1602和更新单元1603,在多轮迭代中的一轮迭代中,获取单元1601,用于获取与第一客户端对应的至少一个第一机器学习模型,第一客户端为多个客户端中的一个客户端,至少一个第一机器学习模型与第一客户端存储的第一数据集合的数据特性对应;发送单元1602,用于将至少一个第一机器学习模型发送给第一客户端,至少一个第一机器学习模型指示第一客户端利用第一数据集合对至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;更新单元1603,用于从第一客户端接收至少一个训练后的第一机器学习模型包括的至少一个更新后的神经网络模块,并根据至少一个更新后的神经网络 模块更新存储的神经网络模块的权重参数。
本申请实施例中,通过上述方式,能够对不同数据特性的训练数据分配不同的神经网络,也即实现了神经网络与数据特性之间的个性化匹配;由于第一客户端为多个客户端中的任一客户端,对多个客户端中的每个客户端均根据客户端存储的训练数据集合的数据特性分配并训练神经网络,能够利用相同数据特性的训练数据训练相同的神经网络,不同数据特性的训练数据训练不同的神经网络,从而不仅实现了神经网络与数据特性之间的个性化匹配,而且有利于提高训练后神经网络的准确率;由服务器选择与各个客户端适配的神经网络,既避免了将所有神经外网络模块发送给客户端,以减少对客户端存储资源的浪费;且避免了对客户端计算机资源的占用,有利于提高用户体验。
在一种可能的设计中,多个模块用于构建至少两个第二机器学习模型,至少一个第一机器学习模型为从至少两个第二机器学习模型中选取出来的;或者,用于构建至少一个第一机器学习模型的模块为从多个模块中选取出来的。
在一种可能的设计中,请参阅图17,图17为本申请实施例提供的机器学习模型的训练装置的一种结构示意图。机器学习模型为神经网络,机器学习模型的训练装置1600中存储的多个模块为神经网络模块,机器学习模型的训练装置1600上存储有第二适配关系,第二适配关系中包括多个适配值,适配值用于表示客户端中存储的训练数据与第二神经网络之间的适配程度,装置1600还包括:接收单元1604,用于接收第一客户端发送的第一数据集合与至少一个第二神经网络之间的适配值,并更新第二适配关系;获取单元1601,具体用于根据第二适配关系,从多个第二神经网络中选取至少一个第一神经网络,至少一个第一神经网络包括与第一数据集合的适配值高的神经网络。
在一种可能的设计中,请参阅图17,机器学习模型为神经网络,机器学习模型的训练装置1600中存储的多个模块为神经网络模块,装置1600还包括:接收单元1604,用于接收第一客户端发送的第一标识信息,第一标识信息为第一神经网络的标识信息,或者,第一标识信息为构建第一神经网络的神经网络模块的标识信息;发送单元1602,具体用于向第一客户端发送第一标识信息指向的第一神经网络,或者,向第一客户端发送第一标识信息指向的构建第一神经网络的神经网络模块。
在一种可能的设计中,请参阅图17,机器学习模型为神经网络,机器学习模型的训练装置1600中存储的多个模块为神经网络模块,机器学习模型的训练装置1600还配置有选择器,装置还包括:接收单元1604,用于接收第一客户端发送的至少一个类中心,对第一数据集合执行聚类操作后,得到至少一个数据子集合,至少一个类中心中的一个类中心为至少一个数据子集合中一个数据子集合的类中心;获取单元1601,具体用于将类中心分别输入选择器,得到选择器输出的指示信息,并根据指示信息,确定构建至少一个第一神经网络的神经网络模块,指示信息包括多个神经网络模块中每个神经网络模块被选中的概率;发送单元1602,具体用于将构建至少一个第一神经网络的神经网络模块发送给第一客户端。
在一种可能的设计中,请参阅图17,机器学习模型为神经网络,机器学习模型的训练装置1600中存储的多个模块为神经网络模块,一个神经网络被分为至少两个子模块,机器学习模型的训练装置1600存储的神经网络模块被分为与至少两个子模块对应的至少两个 组,同一组中不同的神经网络模块的功能相同,根据至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数之后,装置1600还包括:计算单元1605,用于计算同一组包括的至少两个神经网络模块中不同的神经网络模块之间的相似度,并将相似度大于预设阈值的两个神经网络模块进行合并。
在一种可能的设计中,不同的神经网络模块包括第二神经网络模块和第一神经网络模块,第二神经网络模块和第一神经网络模块之间的相似度通过以下任一种方式确定:将相同数据分别输入至第二神经网络模块和第一神经网络模块,并对比第二神经网络模块的输出数据与第一神经网络模块的输出数据之间的相似度;或者,计算第二神经网络模块的权重参数矩阵和第一神经网络模块的权重参数矩阵之间的相似度。
需要说明的是,机器学习模型的训练装置1600中各模块/单元之间的信息交互、执行过程等内容,与本申请中图4至图13对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供一种服务器,请参阅图18,图18为本申请实施例提供的服务器的一种结构示意图。服务器1800上可以部署有图14和图15对应实施例中所描述的机器学习模型的训练装置1400,用于实现图4至图13对应实施例中第一客户端的功能。或者,在第一客户端配置于服务器形态的设备中时,服务器1800上可以部署有图16和图17对应实施例中所描述的机器学习模型的训练装置1600,用于实现图4至图13对应实施例中服务器的功能。具体的,服务器1800可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1822(例如,一个或一个以上处理器)和存储器1832,一个或一个以上存储应用程序1842或数据1844的存储介质1830(例如一个或一个以上海量存储设备)。其中,存储器1832和存储介质1830可以是短暂存储或持久存储。存储在存储介质1830的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器1822可以设置为与存储介质1830通信,在服务器1800上执行存储介质1830中的一系列指令操作。
服务器1800还可以包括一个或一个以上电源1826,一个或一个以上有线或无线网络接口1850,一个或一个以上输入输出接口1858,和/或,一个或一个以上操作系统1841,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
在一种情况下,本申请实施例中,在中央处理器1822用于执行图4至图13对应实施例中的第一客户端执行的机器学习模型的训练方法,具体的,机器学习模型的训练包括多轮迭代,在多轮迭代中的一轮迭代中,中央处理器1822具体用于:
获取至少一个第一机器学习模型,至少一个第一机器学习模型为根据第一客户端存储的第一数据集合的数据特性选取出来的;利用第一数据集合对至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;将至少一个训练后的第一机器学习模型包括的至少一个更新后的模块发送给服务器,更新后的模块用于供所述服务器更新存储的模块的权重参数。
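The central processing unit 1822 is further configured to perform the other steps performed by the first client in FIG. 4 to FIG. 13. For the specific implementation of the steps that the central processing unit 1822 performs as the first client in the embodiments corresponding to FIG. 4 to FIG. 13, and for the resulting beneficial effects, refer to the descriptions in the method embodiments corresponding to FIG. 4 to FIG. 13; details are not repeated here.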
在另一种情况下,本申请实施例中,在中央处理器1822用于执行图4至图13对应实施例中的服务器执行的机器学习模型的训练方法,具体的,机器学习模型的训练包括多轮迭代,在多轮迭代中的一轮迭代中,中央处理器1822具体用于:
获取与第一客户端对应的至少一个第一机器学习模型,第一客户端为多个客户端中的一个客户端,至少一个第一机器学习模型与第一客户端存储的第一数据集合的数据特性对应;将至少一个第一机器学习模型发送给第一客户端,至少一个第一机器学习模型指示第一客户端利用第一数据集合对至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;从第一客户端接收至少一个训练后的第一机器学习模型包括的至少一个更新后的神经网络模块,并根据至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数。
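The central processing unit 1822 is further configured to perform the other steps performed by the server in FIG. 4 to FIG. 13. For the specific implementation of the steps that the central processing unit 1822 performs as the server in the embodiments corresponding to FIG. 4 to FIG. 13, and for the resulting beneficial effects, refer to the descriptions in the method embodiments corresponding to FIG. 4 to FIG. 13; details are not repeated here.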
本申请实施例还提供一种终端设备,请参阅图19,图19为本申请实施例提供的终端设备的一种结构示意图。其中,在客户端配置于移动设备形态的设备上时,终端设备1900上可以部署有图14和图15对应实施例中所描述的机器学习模型的训练装置1400,用于实现图4至图13对应实施例中第一客户端的功能。具体的,终端设备1900包括:接收器1901、发射器1902、处理器1903和存储器1904(其中终端设备1900中的处理器1903的数量可以一个或多个,图19中以一个处理器为例),其中,处理器1903可以包括应用处理器19031和通信处理器19032。在本申请的一些实施例中,接收器1901、发射器1902、处理器1903和存储器1904可通过总线或其它方式连接。
存储器1904可以包括只读存储器和随机存取存储器,并向处理器1903提供指令和数据。存储器1904的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1904存储有处理器和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。
处理器1903控制终端设备的操作。具体的应用中,终端设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1903中,或者由处理器1903实现。处理器1903可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1903中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1903可以是通用处理器、数字信号处理器(digital signal processing,DSP)、微处理器或微控制器,还可进一步包括专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分 立门或者晶体管逻辑器件、分立硬件组件。该处理器1903可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1904,处理器1903读取存储器1904中的信息,结合其硬件完成上述方法的步骤。
接收器1901可用于接收输入的数字或字符信息,以及产生与终端设备的相关设置以及功能控制有关的信号输入。发射器1902可用于通过第一接口输出数字或字符信息;发射器1902还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1902还可以包括显示屏等显示设备。
本申请实施例中,应用处理器19031,用于执行图4至图13对应实施例中第一客户端的功能。需要说明的是,对于应用处理器19031执行图4至图13对应实施例中第一客户端的功能的具体实现方式以及带来的有益效果,均可以参考图4至图13对应的各个方法实施例中的叙述,此处不再一一赘述。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图4至图13所示实施例描述的方法中第一客户端所执行的步骤;或者,使得计算机执行如前述图4至图13所示实施例描述的方法中服务器所执行的步骤。
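An embodiment of this application further provides a computer program product which, when run on a computer, causes the computer to perform the steps performed by the first client in the methods described in the embodiments shown in FIG. 4 to FIG. 13, or causes the computer to perform the steps performed by the server in the methods described in the embodiments shown in FIG. 4 to FIG. 13.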
本申请实施例中还提供一种电路系统,所述电路系统包括处理电路,所述处理电路配置为执行如前述图4至图13所示实施例描述的方法中第一客户端所执行的步骤,或者,所述处理电路配置为执行如前述图4至图13所示实施例描述的方法中服务器所执行的步骤。
本申请实施例提供的机器学习模型的训练装置、客户端和服务器具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使芯片执行上述图4至图13所示实施例描述的神经网络的训练的方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体的,请参阅图20,图20为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 200,NPU 200作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路2003,通过控制器2004控制运算电路2003提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路2003内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路2003是二维脉动阵列。运算电路2003还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路2003是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器2002中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器2001中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)2008中。
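The unified memory 2006 is used to store input data and output data. Weight data is transferred directly to the weight memory 2002 through the direct memory access controller (DMAC) 2005. Input data is also transferred to the unified memory 2006 through the DMAC.
The bus interface unit (BIU) 2010 handles the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 2009. Through it, the instruction fetch buffer 2009 obtains instructions from the external memory, and the direct memory access controller 2005 obtains the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 2006, to transfer weight data to the weight memory 2002, or to transfer input data to the input memory 2001.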
向量计算单元2007包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如Batch Normalization(批归一化),像素级求和,对特征平面进行上采样等。
在一些实现中,向量计算单元2007能将经处理的输出的向量存储到统一存储器2006。例如,向量计算单元2007可以将线性函数和/或非线性函数应用到运算电路2003的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元2007生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路2003的激活输入,例如用于在神经网络中的后续层中的使用。
控制器2004连接的取指存储器(instruction fetch buffer)2009,用于存储控制器2004使用的指令;统一存储器2006,输入存储器2001,权重存储器2002以及取指存储器2009均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,循环神经网络中各层的运算可以由运算电路2003或向量计算单元2007执行。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际 的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
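From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can take many forms, such as analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.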
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。
Claims (40)
- 一种机器学习模型的训练方法,其特征在于,所述方法应用于第一客户端,多个客户端与服务器通信连接,所述服务器中存储有多个模块,所述多个模块用于构建机器学习模型,所述第一客户端为所述多个客户端中的任一客户端,所述机器学习模型的训练包括多轮迭代,所述多轮迭代中的一轮迭代包括:获取至少一个第一机器学习模型,所述至少一个第一机器学习模型为根据所述第一客户端存储的第一数据集合的数据特性选取出来的;利用所述第一数据集合对所述至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;将所述至少一个训练后的第一机器学习模型包括的至少一个更新后的模块发送给所述服务器,所述更新后的模块用于供所述服务器更新存储的模块的权重参数。
- 根据权利要求1所述的方法,其特征在于,所述多个模块用于构建至少两个第二机器学习模型,所述至少一个第一机器学习模型为从所述至少两个第二机器学习模型中选取出来的;或者,用于构建所述至少一个第一机器学习模型的模块为从所述多个模块中选取出来的。
- 根据权利要求2所述的方法,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,所述第一客户端上存储有第一适配关系,所述第一适配关系包括多个适配值,所述适配值用于表示所述第一数据集合与第二神经网络之间的适配程度;所述获取至少一个第二机器学习模块之前,所述方法还包括:接收所述服务器发送的所述多个神经网络模块;所述获取至少一个机器学习模型包括:根据所述第一适配关系,从所述至少两个第二神经网络中选取至少一个第一神经网络,所述至少一个第一神经网络中包括与所述第一数据集合的适配值高的神经网络。
- 根据权利要求3所述的方法,其特征在于,所述第一数据集合与一个第二神经网络之间的适配值与第一损失函数的函数值对应,所述第一损失函数的函数值越小,所述第一数据集合与所述一个第二神经网络之间的适配值越大;其中,所述第一损失函数指示第一数据的预测结果与所述第一数据的正确结果之间的相似度,所述第一数据的预测结果通过所述一个第二神经网络得到,所述第一数据和所述第一数据的正确结果基于所述第一数据集合得到。
- 根据权利要求3所述的方法,其特征在于,所述第一数据集合与一个第二神经网络之间的适配值与第一相似度对应,所述第一相似度越大,所述第一数据集合与所述一个第二神经网络之间的适配值越大;其中,所述第一相似度指的是所述一个第二神经网络和第三神经网络之间的相似度,所述第三神经网络为上一轮迭代中输出预测结果的准确率最高的神经网络。
- 根据权利要求5所述的方法,其特征在于,所述一个第二神经网络和第三神经网络之间的相似度通过以下任一种方式确定:将相同数据分别输入至所述一个第二神经网络和所述第三神经网络,并对比所述一个第二神经网络的输出数据与所述第三神经网络的输出数据之间的相似度;或者,计算所述一个第二神经网络的权重参数矩阵和所述第三神经网络的权重参数矩阵之间的相似度。
- 根据权利要求1所述的方法,其特征在于,机器学习模型为神经网络,所述方法还包括:接收所述服务器发送的选择器,所述选择器为用于从所述多个神经网络模块中选取与所述第一数据集合的数据特征匹配的至少一个神经网络模块的神经网络;根据所述第一数据集合,将训练数据输入至所述选择器,得到所述选择器输出的指示信息,所述指示信息包括所述多个神经网络模块中每个神经网络模块被选中的概率,用于指示构建所述至少一个第一神经网络的神经网络模块;从所述服务端接收用于构建所述至少一个第一神经网络的神经网络模块。
- 根据权利要求1至7任一项所述的方法,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,所述获取至少一个第一机器学习模型之后,所述方法还包括:计算所述第一数据集合与所述至少一个第一神经网络中每个第一神经网络之间的适配值;其中,所述第一数据集合包括多个第一训练数据,所述第一训练数据与所述第一神经网络之间的适配值越高,在利用所述第一训练数据对所述第一神经网络进行一次训练的过程中,对所述第一神经网络的权重参数的修改程度越大。
- 根据权利要求8所述的方法,其特征在于,所述计算所述第一数据集合与所述至少一个第一神经网络中每个第一神经网络之间的适配值,包括:对所述第一数据集合进行聚类,得到至少两个数据子集合,第一数据子集合为所述第一数据集合的子集,所述第一数据子集合为所述至少两个数据子集合中的任一个;根据所述第一数据子集合和第一损失函数,生成所述第一数据子集合与一个第一神经网络之间的适配值,所述第一损失函数的函数值越小,所述第一数据子集合与所述一个第一神经网络之间的适配值越大;其中,所述第一损失函数指示第一数据的预测结果与所述第一数据的正确结果之间的相似度,所述第一数据的预测结果通过所述一个第一神经网络得到,所述第一数据和所述第一数据的正确结果基于所述第一数据子集合得到,所述第一数据子集合与一个第一神经网络之间的适配值被确定为所述第一数据子集合中每个数据与所述一个第一神经网络之间的适配值。
- 根据权利要求1至7任一项所述的方法,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,所述利用所述第一数据集合对所述至少一个第一机器学习模型执行训练操作,包括:根据第二损失函数,利用所述第一数据集合对所述第一神经网络执行训练操作;其中,所述第一数据集合包括多个第一训练数据,所述第二损失函数指示第一预测结 果与所述第一训练数据的正确结果之间的相似度,还指示所述第一预测结果与第二预测结果之间的相似度,所述第一预测结果为将所述第一训练数据输入所述第一神经网络后,由所述第一神经网络输出的所述第一训练数据的预测结果,所述第二预测结果为将所述第一训练数据输入第四神经网络后,由所述第四神经网络输出的所述第一训练数据的预测结果,所述第四神经网络为未执行过训练操作的第一神经网络。
- 根据权利要求7所述的方法,其特征在于,所述第一数据集合包括多个第一训练数据和每个第一训练数据的正确结果,所述方法还包括:接收所述服务器发送的选择器,所述选择器为用于从所述多个神经网络模块中选取与所述第一数据集合的数据特征匹配的至少一个第一神经网络模块的神经网络;所述利用所述第一数据集合对所述至少一个第一机器学习模型执行训练操作,包括:将所述第一训练数据输入所述选择器,得到所述选择器输出的指示信息,所述指示信息包括所述多个神经网络模块中每个神经网络模块被选中的概率,用于指示构建所述第一神经网络的神经网络模块;根据所述多个神经网络模块,所述指示信息和所述第一训练数据,得到所述第一神经网络输出的所述第一训练数据的预测结果;根据第三损失函数,对所述第一神经网络和所述选择器执行训练操作,其中,所述第三损失函数指示所述第一训练数据的预测结果与正确结果之间的相似度,还指示所述指示信息的离散程度;所述方法还包括:向所述服务器发送训练后的选择器。
- 一种机器学习模型的训练方法,其特征在于,所述方法应用于服务器,所述服务器与多个客户端通信连接,所述服务器中存储有多个模块,所述多个模块用于构建机器学习模型,所述第一客户端为所述多个客户端中的任一客户端,所述机器学习模型的训练包括多轮迭代,所述多轮迭代中的一轮迭代包括:获取与第一客户端对应的至少一个第一机器学习模型,所述第一客户端为所述多个客户端中的一个客户端,所述至少一个第一机器学习模型与所述第一客户端存储的第一数据集合的数据特性对应;将所述至少一个第一机器学习模型发送给所述第一客户端,所述至少一个第一机器学习模型指示所述第一客户端利用所述第一数据集合对所述至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;从所述第一客户端接收所述至少一个训练后的第一机器学习模型包括的至少一个更新后的神经网络模块,并根据所述至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数。
- 根据权利要求12所述的方法,其特征在于,所述多个模块用于构建至少两个第二机器学习模型,所述至少一个第一机器学习模型为从所述至少两个第二机器学习模型中选取出来的;或者,用于构建所述至少一个第一机器学习模型的模块为从所述多个模块中选取出来的。
- 根据权利要求13所述的方法,其特征在于,机器学习模型为神经网络,所述服务 器中存储的多个模块为神经网络模块,所述服务器上存储有第二适配关系,所述第二适配关系中包括多个适配值,所述适配值用于表示客户端中存储的训练数据与第二神经网络之间的适配程度,所述方法还包括:接收所述第一客户端发送的所述第一数据集合与至少一个第二神经网络之间的适配值,并更新所述第二适配关系;所述获取至少一个第一神经网络包括:根据所述第二适配关系,从所述多个第二神经网络中选取所述至少一个第一神经网络,所述至少一个第一神经网络包括与所述第一数据集合的适配值高的神经网络。
- 根据权利要求12所述的方法,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,所述方法还包括:接收所述第一客户端发送的第一标识信息,所述第一标识信息为所述第一神经网络的标识信息,或者,所述第一标识信息为构建所述第一神经网络的神经网络模块的标识信息;所述将所述至少一个第一机器学习模型发送给所述第一客户端,包括:向所述第一客户端发送所述第一标识信息指向的所述第一神经网络,或者,向所述第一客户端发送所述第一标识信息指向的构建所述第一神经网络的神经网络模块。
- 根据权利要求12所述的方法,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,所述服务器还配置有选择器,所述方法还包括:接收所述第一客户端发送的至少一个类中心,对所述第一数据集合执行聚类操作后,得到至少一个数据子集合,所述至少一个类中心中的一个类中心为所述至少一个数据子集合中一个数据子集合的类中心;所述获取与第一客户端对应的至少一个第一机器学习模型,包括:将所述类中心分别输入所述选择器,得到所述选择器输出的指示信息,并根据所述指示信息,确定构建所述至少一个第一神经网络的神经网络模块,所述指示信息包括所述多个神经网络模块中每个神经网络模块被选中的概率;所述将所述至少一个第一机器学习模型发送给所述第一客户端,包括:将构建所述至少一个第一神经网络的神经网络模块发送给所述第一客户端。
- 根据权利要求12或14所述的方法,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,一个神经网络被分为至少两个子模块,所述服务器存储的神经网络模块被分为与所述至少两个子模块对应的至少两个组,同一组中不同的神经网络模块的功能相同,所述根据所述至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数之后,所述方法还包括:计算同一组包括的至少两个神经网络模块中不同的神经网络模块之间的相似度,并将相似度大于预设阈值的两个神经网络模块进行合并。
- 根据权利要求17所述的方法,其特征在于,所述不同的神经网络模块包括第二神经网络模块和第一神经网络模块,所述第二神经网络模块和所述第一神经网络模块之间的相似度通过以下任一种方式确定:将相同数据分别输入至所述第二神经网络模块和所述第一神经网络模块,并对比所述 第二神经网络模块的输出数据与所述第一神经网络模块的输出数据之间的相似度;或者,计算所述第二神经网络模块的权重参数矩阵和所述第一神经网络模块的权重参数矩阵之间的相似度。
- 一种机器学习模型的训练装置,其特征在于,所述装置应用于第一客户端,多个客户端与服务器通信连接,所述服务器中存储有多个模块,所述多个模块用于构建机器学习模型,所述第一客户端为所述多个客户端中的任一客户端,所述机器学习模型的训练装置用于执行多轮迭代,所述机器学习模型的训练装置包括:获取单元、训练单元和发送单元,在所述多轮迭代中的一轮迭代中,所述获取单元,用于获取至少一个第一机器学习模型,所述至少一个第一机器学习模型为根据所述机器学习模型的训练装置存储的第一训练数据集合的数据特性选取出来的;所述训练单元,用于利用所述第一数据集合对所述至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;所述发送单元,用于将所述至少一个训练后的第一机器学习模型包括的至少一个更新后的模块发送给所述服务器,所述更新后的模块用于供所述服务器更新存储的模块的权重参数。
- 根据权利要求19所述的装置,其特征在于,所述多个模块用于构建至少两个第二机器学习模型,所述至少一个第一机器学习模型为从所述至少两个第二机器学习模型中选取出来的;或者,用于构建所述至少一个第一机器学习模型的模块为从所述多个模块中选取出来的。
- 根据权利要求20所述的装置,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,所述机器学习模型的训练装置上存储有第一适配关系,所述第一适配关系包括多个适配值,所述适配值用于表示所述第一数据集合与第二神经网络之间的适配程度;所述装置还包括:接收单元,用于接收所述服务器发送的所述多个神经网络模块;所述获取单元,具体用于根据所述第一适配关系,从所述至少两个第二神经网络中选取至少一个第一神经网络,所述至少一个第一神经网络中包括与所述第一数据集合的适配值高的至少一个第一神经网络。
- 根据权利要求21所述的装置,其特征在于,所述第一数据集合与一个第二神经网络之间的适配值与第一损失函数的函数值对应,所述第一损失函数的函数值越小,所述第一数据集合与所述一个第二神经网络之间的适配值越大;其中,所述第一损失函数指示第一训练数据的预测结果与所述第一数据的正确结果之间的相似度,所述第一数据的预测结果通过所述一个第二神经网络得到,所述第一数据和所述第一数据的正确结果基于所述第一数据集合得到。
- 根据权利要求21所述的装置,其特征在于,所述第一数据集合与一个第二神经网络之间的适配值与第一相似度对应,所述第一相似度越大,所述第一数据集合与所述一个第二神经网络之间的适配值越大;其中,所述第一相似度指的是所述一个第二神经网络和第三神经网络之间的相似度, 所述第三神经网络为上一轮迭代中输出预测结果的准确率最高的神经网络。
- 根据权利要求23所述的装置,其特征在于,所述一个第二神经网络和第三神经网络之间的相似度通过以下任一种方式确定:将相同数据分别输入至所述一个第二神经网络和所述第三神经网络,并对比所述一个第二神经网络的输出数据与所述第三神经网络的输出数据之间的相似度;或者,计算所述一个第二神经网络的权重参数矩阵和所述第三神经网络的权重参数矩阵之间的相似度。
- 根据权利要求19所述的装置,其特征在于,机器学习模型为神经网络,所述装置还包括:接收单元和输入单元;所述接收单元,用于接收所述服务器发送的选择器,所述选择器为用于从所述多个神经网络模块中选取与所述第一数据集合的数据特征匹配的至少一个神经网络模块的神经网络;所述输入单元,用于根据所述第一数据集合,将训练数据输入至所述选择器,得到所述选择器输出的指示信息,所述指示信息包括所述多个神经网络模块中每个神经网络模块被选中的概率,用于指示构建所述至少一个第一神经网络的神经网络模块;所述接收单元,还用于从所述服务端接收用于构建所述至少一个第一神经网络的神经网络模块。
- 根据权利要求19至25任一项所述的装置,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块,所述装置还包括:计算单元,用于计算所述第一数据集合与所述至少一个第一神经网络中每个第一神经网络之间的适配值;其中,所述第一数据集合包括多个第一训练数据,所述第一训练数据与所述第一神经网络之间的适配值越高,在利用所述第一训练数据对所述第一神经网络进行一次训练的过程中,对所述第一神经网络的权重参数的修改程度越大。
- 根据权利要求26所述的装置,其特征在于,所述计算单元,具体用于:对所述第一数据集合进行聚类,得到至少两个数据子集合,第一数据子集合为所述第一数据集合的子集,所述第一数据子集合为所述至少两个数据子集合中的任一个;根据所述第一数据子集合和第一损失函数,生成所述第一数据子集合与一个第一神经网络之间的适配值,所述第一损失函数的函数值越小,所述第一数据子集合与所述一个第一神经网络之间的适配值越大;其中,所述第一损失函数指示第一训练数据的预测结果与所述第一数据的正确结果之间的相似度,所述第一数据的预测结果通过所述一个第一神经网络得到,所述第一数据和所述第一数据的正确结果基于所述第一数据子集合得到,所述第一数据子集合与一个第一神经网络之间的适配值被确定为所述第一数据子集合中每个数据与所述一个第一神经网络之间的适配值。
- 根据权利要求19至25任一项所述的装置,其特征在于,机器学习模型为神经网络,所述服务器中存储的多个模块为神经网络模块;所述训练单元,具体用于根据第二损失函数,利用所述第一数据集合对所述第一神经网络执行训练操作;其中,所述第一数据集合包括多个第一训练数据,所述第二损失函数指示第一预测结果与所述第一训练数据的正确结果之间的相似度,还指示所述第一预测结果与第二预测结果之间的相似度,所述第一预测结果为将所述第一训练数据输入所述第一神经网络后,由所述第一神经网络输出的所述第一训练数据的预测结果,所述第二预测结果为将所述第一训练数据输入第四神经网络后,由所述第四神经网络输出的所述第一训练数据的预测结果,所述第四神经网络为未执行过训练操作的第一神经网络。
- 根据权利要求25所述的装置,其特征在于,所述第一数据集合包括多个第一训练数据和每个第一训练数据的正确结果;所述接收单元,还用于接收所述服务器发送的选择器,所述选择器为用于从所述多个神经网络模块中选取与所述第一数据集合的数据特征匹配的至少一个第一神经网络模块的神经网络;所述训练单元,具体用于:将所述第一训练数据输入所述选择器,得到所述选择器输出的指示信息,所述指示信息包括所述多个神经网络模块中每个神经网络模块被选中的概率,用于指示构建所述第一神经网络的神经网络模块;根据所述多个神经网络模块,所述指示信息和所述第一训练数据,得到所述第一神经网络输出的所述第一训练数据的预测结果;根据第三损失函数,对所述第一神经网络和所述选择器执行训练操作,其中,所述第三损失函数指示所述第一训练数据的预测结果与正确结果之间的相似度,还指示所述指示信息的离散程度;所述发送单元,还用于向所述服务器发送训练后的选择器。
- 一种机器学习模型的训练装置,其特征在于,所述装置应用于服务器,所述服务器与多个客户端通信连接,所述服务器中存储有多个模块,所述多个模块用于构建机器学习模型,所述第一客户端为所述多个客户端中的任一客户端,所述机器学习模型的训练装置用于执行多轮迭代,所述机器学习模型的训练装置包括:获取单元、发送单元和更新单元,在所述多轮迭代中的一轮迭代中,所述获取单元,用于获取与第一客户端对应的至少一个第一机器学习模型,所述第一客户端为所述多个客户端中的一个客户端,所述至少一个第一机器学习模型与所述第一客户端存储的第一数据集合的数据特性对应;所述发送单元,用于将所述至少一个第一机器学习模型发送给所述第一客户端,所述至少一个第一机器学习模型指示所述第一客户端利用所述第一数据集合对所述至少一个第一机器学习模型执行训练操作,得到至少一个训练后的第一机器学习模型;所述更新单元,用于从所述第一客户端接收所述至少一个训练后的第一机器学习模型包括的至少一个更新后的神经网络模块,并根据所述至少一个更新后的神经网络模块更新存储的神经网络模块的权重参数。
- 根据权利要求30所述的装置,其特征在于,所述多个模块用于构建至少两个第二机器学习模型,所述至少一个第一机器学习模型为从所述至少两个第二机器学习模型中选取出来的;或者,用于构建所述至少一个第一机器学习模型的模块为从所述多个模块中选取出来的。
- 根据权利要求31所述的装置,其特征在于,机器学习模型为神经网络,所述机器学习模型的训练装置中存储的多个模块为神经网络模块,所述机器学习模型的训练装置上存储有第二适配关系,所述第二适配关系中包括多个适配值,所述适配值用于表示客户端中存储的训练数据与第二神经网络之间的适配程度,所述装置还包括:接收单元,用于接收所述第一客户端发送的所述第一数据集合与至少一个第二神经网络之间的适配值,并更新所述第二适配关系;所述获取单元,具体用于根据所述第二适配关系,从所述多个第二神经网络中选取所述至少一个第一神经网络,所述至少一个第一神经网络包括与所述第一数据集合的适配值高的神经网络。
- 根据权利要求30所述的装置,其特征在于,机器学习模型为神经网络,所述机器学习模型的训练装置中存储的多个模块为神经网络模块,所述装置还包括:接收单元,用于接收所述第一客户端发送的第一标识信息,所述第一标识信息为所述第一神经网络的标识信息,或者,所述第一标识信息为构建所述第一神经网络的神经网络模块的标识信息;所述发送单元,具体用于向所述第一客户端发送所述第一标识信息指向的所述第一神经网络,或者,向所述第一客户端发送所述第一标识信息指向的构建所述第一神经网络的神经网络模块。
- 根据权利要求30所述的装置,其特征在于,机器学习模型为神经网络,所述机器学习模型的训练装置中存储的多个模块为神经网络模块,所述机器学习模型的训练装置还配置有选择器,所述装置还包括:接收单元,用于接收所述第一客户端发送的至少一个类中心,对所述第一数据集合执行聚类操作后,得到至少一个数据子集合,所述至少一个类中心中的一个类中心为所述至少一个数据子集合中一个数据子集合的类中心;所述获取单元,具体用于将所述类中心分别输入所述选择器,得到所述选择器输出的指示信息,并根据所述指示信息,确定构建所述至少一个第一神经网络的神经网络模块,所述指示信息包括所述多个神经网络模块中每个神经网络模块被选中的概率;所述发送单元,具体用于将构建所述至少一个第一神经网络的神经网络模块发送给所述第一客户端。
- 根据权利要求30或32所述的装置,其特征在于,机器学习模型为神经网络,所述机器学习模型的训练装置中存储的多个模块为神经网络模块,一个神经网络被分为至少两个子模块,所述机器学习模型的训练装置存储的神经网络模块被分为与所述至少两个子模块对应的至少两个组,同一组中不同的神经网络模块的功能相同,所述装置还包括:计算单元,用于计算同一组包括的至少两个神经网络模块中不同的神经网络模块之间 的相似度,并将相似度大于预设阈值的两个神经网络模块进行合并。
- 根据权利要求35所述的装置,其特征在于,所述不同的神经网络模块包括第二神经网络模块和第一神经网络模块,所述第二神经网络模块和所述第一神经网络模块之间的相似度通过以下任一种方式确定:将相同数据分别输入至所述第二神经网络模块和所述第一神经网络模块,并对比所述第二神经网络模块的输出数据与所述第一神经网络模块的输出数据之间的相似度;或者,计算所述第二神经网络模块的权重参数矩阵和所述第一神经网络模块的权重参数矩阵之间的相似度。
- 一种训练设备,其特征在于,包括处理器,所述处理器和存储器耦合,所述存储器存储有程序指令,当所述存储器存储的程序指令被所述处理器执行时实现权利要求1至11中任一项所述的方法,或者,当所述存储器存储的程序指令被所述处理器执行时实现权利要求12至18中任一项所述的方法。
- 一种计算机可读存储介质,其特征在于,包括程序,当其在计算机上运行时,使得计算机执行如权利要求1至11中任一项所述的方法,或者,使得计算机执行如权利要求12至18中任一项所述的方法。
- 一种电路系统,其特征在于,所述电路系统包括处理电路,所述处理电路配置为执行如权利要求1至11中任一项所述的方法,或者,所述处理电路配置为执行如权利要求12至18中任一项所述的方法。
- 一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如权利要求1至18中任一项所述的方法。
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP21868262.3A EP4202768A4 (en) | 2020-09-18 | 2021-07-20 | MACHINE LEARNING MODEL TRAINING METHOD AND APPARATUS |
| US18/185,550 US20230237333A1 (en) | 2020-09-18 | 2023-03-17 | Machine learning model training method and related device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010989062.5 | 2020-09-18 | ||
| CN202010989062.5A CN114282678A (zh) | 2020-09-18 | 2020-09-18 | 一种机器学习模型的训练的方法以及相关设备 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/185,550 Continuation US20230237333A1 (en) | 2020-09-18 | 2023-03-17 | Machine learning model training method and related device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022057433A1 true WO2022057433A1 (zh) | 2022-03-24 |
Family
ID=80775885
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/107391 Ceased WO2022057433A1 (zh) | 2020-09-18 | 2021-07-20 | 一种机器学习模型的训练的方法以及相关设备 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20230237333A1 (zh) |
| EP (1) | EP4202768A4 (zh) |
| CN (1) | CN114282678A (zh) |
| WO (1) | WO2022057433A1 (zh) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114743069A (zh) * | 2022-04-21 | 2022-07-12 | 复旦大学 | 一种对两帧图像进行自适应密集匹配计算的方法 |
| CN114912630A (zh) * | 2022-06-02 | 2022-08-16 | 上海富数科技有限公司广州分公司 | 基于联邦学习的模型训练方法、装置及电子设备 |
| CN116187473A (zh) * | 2023-01-19 | 2023-05-30 | 北京百度网讯科技有限公司 | 联邦学习方法、装置、电子设备和计算机可读存储介质 |
| WO2024031986A1 (zh) * | 2022-08-12 | 2024-02-15 | 华为云计算技术有限公司 | 一种模型管理方法及相关设备 |
| CN120470338A (zh) * | 2025-07-14 | 2025-08-12 | 福建省星云大数据应用服务有限公司 | 一种结合神经网络的药理学相似性检测方法及系统 |
Families Citing this family (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11900244B1 (en) * | 2019-09-30 | 2024-02-13 | Amazon Technologies, Inc. | Attention-based deep reinforcement learning for autonomous agents |
| US20220138570A1 (en) * | 2020-11-05 | 2022-05-05 | Mediatek Inc. | Trust-Region Method with Deep Reinforcement Learning in Analog Design Space Exploration |
| US12229280B2 (en) * | 2021-03-16 | 2025-02-18 | Accenture Global Solutions Limited | Privacy preserving cooperative learning in untrusted environments |
| US12554796B2 (en) * | 2021-05-28 | 2026-02-17 | Nvidia Corporation | Optimizing parameter estimation for training neural networks |
| US12367661B1 (en) * | 2021-12-29 | 2025-07-22 | Amazon Technologies, Inc. | Weighted selection of inputs for training machine-trained network |
| US11775516B2 (en) | 2022-02-09 | 2023-10-03 | International Business Machines Corporation | Machine-learning-based, adaptive updating of quantitative data in database system |
| US12394186B2 (en) * | 2022-03-25 | 2025-08-19 | Arizona Board Of Regents On Behalf Of Arizona State University | Systems, methods, and apparatuses for implementing self-supervised domain-adaptive pre-training via a transformer for use with medical image classification |
| US12400434B2 (en) * | 2022-07-08 | 2025-08-26 | Tata Consultancy Services Limited | Method and system for identifying and mitigating bias while training deep learning models |
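| CN116092683B (zh) * | 2023-04-12 | 2023-06-23 | 深圳达实旗云健康科技有限公司 | Cross-medical-institution disease prediction method in which raw data does not leave its domain |
| CN119067230A (zh) * | 2023-06-02 | 2024-12-03 | 华为技术有限公司 | Data processing method and related device |
| CN119886280A (zh) * | 2023-10-25 | 2025-04-25 | 华为云计算技术有限公司 | Neural network training method and related device |
| CN117633536B (zh) * | 2023-12-14 | 2025-04-11 | 中山大学 | Model training optimization method, electronic device, and computer-readable storage medium |
| CN117809118B (zh) * | 2024-01-02 | 2025-09-30 | 浪潮卓数大数据产业发展有限公司 | Deep learning-based visual perception recognition method, device, and medium |
| CN121626184A (zh) * | 2024-09-06 | 2026-03-10 | 深圳引望智能技术有限公司 | Method for obtaining planning and control information and related device |
| CN121235021A (zh) * | 2025-12-03 | 2025-12-30 | 山东大学 | Adaptive training method and system based on a multi-exit neural network |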
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109117953A (zh) * | 2018-09-11 | 2019-01-01 | 北京迈格威科技有限公司 | Network parameter training method and system, server, client, and storage medium |
| CN110442457A (zh) * | 2019-08-12 | 2019-11-12 | 北京大学深圳研究生院 | Federated learning-based model training method, apparatus, and server |
| US20190385043A1 (en) * | 2018-06-19 | 2019-12-19 | Adobe Inc. | Asynchronously training machine learning models across client devices for adaptive intelligence |
| CN110717589A (zh) * | 2019-09-03 | 2020-01-21 | 北京旷视科技有限公司 | Data processing method, device, and readable storage medium |
| CN111477290A (zh) * | 2020-03-05 | 2020-07-31 | 上海交通大学 | Federated learning and image classification method, system, and terminal for protecting user privacy |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170364799A1 (en) * | 2016-06-15 | 2017-12-21 | Kneron Inc. | Simplifying apparatus and simplifying method for neural network |
| CN109716346A (zh) * | 2016-07-18 | 2019-05-03 | 河谷生物组学有限责任公司 | Distributed machine learning system, apparatus, and method |
| US20190377984A1 (en) * | 2018-06-06 | 2019-12-12 | DataRobot, Inc. | Detecting suitability of machine learning models for datasets |
| US11741355B2 (en) * | 2018-07-27 | 2023-08-29 | International Business Machines Corporation | Training of student neural network with teacher neural networks |
| EP3629246B1 (en) * | 2018-09-27 | 2022-05-18 | Swisscom AG | Systems and methods for neural architecture search |
| US12023192B2 (en) * | 2018-11-29 | 2024-07-02 | The Board Of Trustees Of The Leland Stanford Junior University | Single or a few views computed tomography imaging with deep neural network |
| CN110874484A (zh) * | 2019-10-16 | 2020-03-10 | 众安信息技术服务有限公司 | Data processing method and system based on neural networks and federated learning |
- 2020-09-18: CN application CN202010989062.5A, published as CN114282678A (zh), status: Pending
- 2021-07-20: EP application EP21868262.3A, published as EP4202768A4 (en), status: Pending
- 2021-07-20: WO application PCT/CN2021/107391, published as WO2022057433A1 (zh), status: Ceased
- 2023-03-17: US application US18/185,550, published as US20230237333A1 (en), status: Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190385043A1 (en) * | 2018-06-19 | 2019-12-19 | Adobe Inc. | Asynchronously training machine learning models across client devices for adaptive intelligence |
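| CN109117953A (zh) * | 2018-09-11 | 2019-01-01 | 北京迈格威科技有限公司 | Network parameter training method and system, server, client, and storage medium |
| CN110442457A (zh) * | 2019-08-12 | 2019-11-12 | 北京大学深圳研究生院 | Federated learning-based model training method, apparatus, and server |
| CN110717589A (zh) * | 2019-09-03 | 2020-01-21 | 北京旷视科技有限公司 | Data processing method, device, and readable storage medium |
| CN111477290A (zh) * | 2020-03-05 | 2020-07-31 | 上海交通大学 | Federated learning and image classification method, system, and terminal for protecting user privacy |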
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4202768A4 |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
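| CN114743069A (zh) * | 2022-04-21 | 2022-07-12 | 复旦大学 | Method for adaptive dense matching computation on two image frames |
| CN114912630A (zh) * | 2022-06-02 | 2022-08-16 | 上海富数科技有限公司广州分公司 | Federated learning-based model training method and apparatus, and electronic device |
| WO2024031986A1 (zh) * | 2022-08-12 | 2024-02-15 | 华为云计算技术有限公司 | Model management method and related device |
| CN116187473A (zh) * | 2023-01-19 | 2023-05-30 | 北京百度网讯科技有限公司 | Federated learning method and apparatus, electronic device, and computer-readable storage medium |
| CN116187473B (zh) * | 2023-01-19 | 2024-02-06 | 北京百度网讯科技有限公司 | Federated learning method and apparatus, electronic device, and computer-readable storage medium |
| CN120470338A (zh) * | 2025-07-14 | 2025-08-12 | 福建省星云大数据应用服务有限公司 | Pharmacological similarity detection method and system incorporating a neural network |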
Also Published As
| Publication number | Publication date |
|---|---|
| EP4202768A1 (en) | 2023-06-28 |
| CN114282678A (zh) | 2022-04-05 |
| US20230237333A1 (en) | 2023-07-27 |
| EP4202768A4 (en) | 2024-03-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
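| WO2022057433A1 (zh) | | Method for training a machine learning model and related device |
| CN113469340B (zh) | | Model processing method, federated learning method, and related device |
| CN111723910B (zh) | | Method and apparatus for constructing a multi-task learning model, electronic device, and storage medium |
| CN114514519B (zh) | | Federated learning using heterogeneous model types and architectures |
| US11295208B2 (en) | | Robust gradient weight compression schemes for deep learning applications |
| CN112862011A (zh) | | Federated learning-based model training method and apparatus, and federated learning system |
| CN109117953B (zh) | | Network parameter training method and system, server, client, and storage medium |
| WO2022012407A1 (zh) | | Training method for a neural network and related device |
| Le et al. | | BrainyEdge: An AI-enabled framework for IoT edge computing |
| WO2023065859A1 (zh) | | Item recommendation method, apparatus, and storage medium |
| CN112085615B (zh) | | Training method and apparatus for a graph neural network |
| CN115563650A (zh) | | Federated learning-based privacy protection system for medical data |
| WO2021120677A1 (zh) | | Warehousing model training method and apparatus, computer device, and storage medium |
| CN112528108B (zh) | | Model training system, and method and apparatus for gradient aggregation in model training |
| CN113240127B (zh) | | Federated learning-based training method and apparatus, electronic device, and storage medium |
| CN114048328B (zh) | | Knowledge graph link prediction method and system based on the translation assumption and message passing |
| WO2021042857A1 (zh) | | Processing method and processing apparatus for an image segmentation model |
| CN113536970A (zh) | | Training method for a video classification model and related apparatus |
| WO2022052647A1 (zh) | | Data processing method, neural network training method, and related device |
| WO2023185541A1 (zh) | | Model training method and related device |
| CN116645130A (zh) | | Vehicle order demand forecasting method combining federated learning and GRU |
| US12608619B2 (en) | | Superseded federated learning |
| WO2023143080A1 (zh) | | Data processing method and related device |
| CN113240128B (zh) | | Collaborative training method and apparatus for imbalanced data, electronic device, and storage medium |
| CN114004265B (zh) | | Model training method and node device |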
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21868262; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021868262; Country of ref document: EP; Effective date: 20230321 |
| | NENP | Non-entry into the national phase | Ref country code: DE |


