WO2022228204A1 - A federated learning method and apparatus - Google Patents
A federated learning method and apparatus
- Publication number
- WO2022228204A1 (application PCT/CN2022/087647)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- server
- client
- federated learning
- servers
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- The present application relates to the field of federated learning, and in particular to a federated learning method and apparatus.
- Federated learning is a way of organizing joint artificial intelligence (AI) modeling. Without directly obtaining the private data of devices, it completes algorithm modeling tasks by coordinating model training and parameter aggregation across devices, while reducing traditional centralized [...]. It has important value for both theoretical innovation and practical deployment.
- Cross-device federated learning also brings new challenges. For example, the client devices are heterogeneous: devices differ greatly in hardware specifications such as storage, computing, communication, and battery capacity, which affects the performance of federated learning. How to improve the learning efficiency of cross-device federated learning has therefore become an urgent problem.
- This application provides a federated learning method that synchronizes aggregated information among the servers during cross-device federated learning, so that the servers' data stays synchronized in each round of iterative learning. A client can then obtain the full data whenever it accesses a server, which improves overall learning efficiency.
- The present application provides a federated learning method applied to a federated learning system. The federated learning system includes multiple servers that are connected to each other and that perform iterative learning to implement federated learning. Any round of the iterative learning includes the following steps:
- The first server receives a request message sent by at least one first client, where the request message requests the global model stored on the first server, and the multiple clients include the at least one first client.
- The first server sends information of the global model and training configuration parameters to the at least one first client, where the information of the global model and the training configuration parameters instruct the at least one first client to train the global model using the training configuration parameters.
- The first server receives the first model update parameters fed back by each of the at least one first client, where the first model update parameters are the parameters of the global model obtained after training by the at least one first client.
- The first server aggregates the first model update parameters fed back by the at least one first client to obtain first aggregated information for the current iteration.
- The first server obtains second aggregated information sent by a second server, where the second aggregated information is obtained by the second server in the current iteration by aggregating the second model update parameters it received.
- The first server updates the global model stored on the first server based on the first aggregated information and the second aggregated information to obtain an updated global model.
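The aggregation and update steps above can be sketched in Python. The text does not say how the model update parameters are aggregated, so this sketch assumes a sample-weighted average (in the style of federated averaging). `AggregatedInfo`, `aggregate_updates`, and `update_global_model` are hypothetical names introduced only for illustration, and at least one client update is assumed to be present.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class AggregatedInfo:
    """Hypothetical format: sample-weighted parameter sum plus total sample count."""
    weighted_sum: List[float]
    num_samples: int


def aggregate_updates(updates: Sequence[Tuple[List[float], int]]) -> AggregatedInfo:
    """Aggregate (parameters, sample_count) pairs fed back by this server's clients."""
    dim = len(updates[0][0])
    weighted_sum = [0.0] * dim
    total = 0
    for params, n in updates:
        for i, p in enumerate(params):
            weighted_sum[i] += p * n
        total += n
    return AggregatedInfo(weighted_sum, total)


def update_global_model(local: AggregatedInfo,
                        peers: Sequence[AggregatedInfo]) -> List[float]:
    """Combine the first aggregated information with the peers' aggregated information."""
    infos = [local, *peers]
    total = sum(info.num_samples for info in infos)
    dim = len(local.weighted_sum)
    # Assumed rule: the new global model is the weighted mean over all servers' clients.
    return [sum(info.weighted_sum[i] for info in infos) / total for i in range(dim)]
```

Keeping the weighted sum and the sample count separate is a design choice of this sketch: it lets a server merge any number of peer aggregates without needing the individual client updates.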
- The client actively sends a request message to the server to request participation in federated learning, and sends the model update parameters obtained by training the global model back to the server. Because the client accesses the server by sending request messages, the client does not need to maintain a long-lived, stable connection with the server, so a cross-device federated learning system can be implemented without one.
- In addition to aggregating the model update parameters that it receives itself, each of the multiple servers also receives and aggregates the information sent by the other servers, so each server holds more data than its own aggregated information.
- When a client accesses a server, it can therefore obtain more data, so each round of iterative training is based on more data and the overall efficiency of federated learning improves.
- The client accesses the server through requests and responses, so no long-lived, stable connection needs to be maintained between the client and the server. The client can access any server at any time and obtain more data, which improves the efficiency of federated learning on the client side.
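The client-initiated request and response pattern described above can be illustrated with a minimal in-process sketch. The `Server` class, its method names, the `learning_rate` configuration key, and the single gradient step used as a stand-in for local training are all assumptions made for illustration; the text does not define a message format or a training procedure.

```python
from typing import Dict, List, Tuple


class Server:
    """Hypothetical in-process stand-in for one federated learning server."""

    def __init__(self, global_model: List[float], training_config: Dict[str, float]):
        self.global_model = list(global_model)
        self.training_config = dict(training_config)
        self.received_updates: List[Tuple[str, List[float]]] = []

    def handle_request(self, client_id: str) -> Tuple[List[float], Dict[str, float]]:
        """Answer a request message with the global model and training configuration."""
        return list(self.global_model), dict(self.training_config)

    def handle_update(self, client_id: str, params: List[float]) -> None:
        """Store the model update parameters a client feeds back after training."""
        self.received_updates.append((client_id, list(params)))


def client_round(server: Server, client_id: str,
                 local_gradient: List[float]) -> List[float]:
    """One client-initiated exchange: request, train locally, upload the result."""
    model, config = server.handle_request(client_id)
    lr = config["learning_rate"]
    # Placeholder for local training: a single gradient step on the received model.
    trained = [w - lr * g for w, g in zip(model, local_gradient)]
    server.handle_update(client_id, trained)
    return trained
```

The server never calls into the client here; every exchange starts on the client side, which is the property the passage relies on to avoid long-lived connections.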
- The first server may receive request messages sent by multiple clients.
- The first server can deliver the locally stored global model information and training configuration parameters to the multiple clients. In the embodiments of the present application, the client therefore accesses the server through requests and responses, and the server does not need to actively address the client. In other words, the client accesses the server only when it needs to, which enables cross-device federated learning.
- The first server may screen the multiple clients and select at least one client to participate in federated learning. For example, whether a client is allowed to participate is determined according to the client's connection status or device status, so that stable clients are selected and the overall efficiency of federated learning improves.
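As a minimal sketch of the screening step, the following Python filters candidate clients by connection status and device status. The fields `connected`, `is_idle`, and `battery_level`, and the 0.3 battery cutoff, are hypothetical; the text only says that connection status or device status may be used.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ClientStatus:
    """Hypothetical status report attached to a client's request message."""
    client_id: str
    connected: bool
    is_idle: bool
    battery_level: float  # fraction between 0.0 and 1.0


def screen_clients(candidates: Iterable[ClientStatus],
                   min_battery: float = 0.3) -> List[str]:
    """Return the IDs of clients considered stable enough to join this round."""
    return [
        c.client_id
        for c in candidates
        if c.connected and c.is_idle and c.battery_level >= min_battery
    ]
```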
- The multiple servers may include multiple second servers. In this case, the first server obtaining the second aggregated information sent by the second server may include: the first server receives the second aggregated information sent by each of the multiple second servers. The first server updating the global model stored on the first server based on the first aggregated information and the second aggregated information may include: the first server updates the global model stored on the first server based on the first aggregated information and the second aggregated information sent by each of the multiple second servers, to obtain an updated global model.
- The servers in the federated learning system can transmit data to each other so that every server holds the full data, and a client can obtain the full data whenever it accesses any server.
- This improves the accuracy of the data obtained by the client. Because the client obtains the full data during federated learning, the overall efficiency of federated learning also improves.
- The method may further include: the first server sends the first aggregated information to each second server, so that each second server can aggregate the first aggregated information and the second aggregated information to obtain third aggregated information.
- The servers in the federated learning system can thus transmit aggregated information to each other, so that each server holds more data and data consistency is maintained across the federated learning system.
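The consistency property can be illustrated with a small simulation in which every server merges its own aggregate with the aggregates received from all peers. The `(weighted_sum, sample_count)` aggregate format and the weighted-mean merge rule are assumptions of this sketch, not something the text specifies.

```python
from typing import Dict, List, Tuple

# Hypothetical aggregate format: (sample-weighted parameter sum, sample count).
Aggregate = Tuple[List[float], int]


def merge(aggregates: List[Aggregate]) -> Aggregate:
    """Merge several aggregates into one by summing sums and counts."""
    dim = len(aggregates[0][0])
    total_sum = [sum(a[0][i] for a in aggregates) for i in range(dim)]
    total_n = sum(a[1] for a in aggregates)
    return total_sum, total_n


def exchange_all(local: Dict[str, Aggregate]) -> Dict[str, List[float]]:
    """Each server merges its own aggregate with every peer's and derives its model."""
    models: Dict[str, List[float]] = {}
    for name, own in local.items():
        received = [agg for peer, agg in local.items() if peer != name]
        merged_sum, merged_n = merge([own, *received])
        models[name] = [s / merged_n for s in merged_sum]
    return models
```

With an all-to-all exchange like this, a client that queries any server after the round sees a model built from every server's clients.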
- One of the multiple servers serves as a master node, and the master node manages the multiple servers.
- When the first server serves as the master node, the first server sends a first trigger indication to the second server, where the first trigger indication instructs the second server to send the second aggregated information to the first server, so that the first server receives the second aggregated information sent by the second server.
- The federated learning process in the federated learning system can thus be managed by the master node, and the servers synchronize data when triggered by the master node, which achieves data consistency in the federated learning system.
- The master node includes a counter that counts the clients sending request messages. The first server obtaining the second aggregated information from the second server under the trigger of the master node may include: when the value of the counter meets a first threshold, the first server sends a second trigger indication to the second server, where the second trigger indication instructs the second server to perform the next round of iteration.
- The received request messages can be counted by the counter, and when the count reaches a certain value, data synchronization among the servers is triggered. This is equivalent to limiting the number of clients that participate in each round of federated learning. It avoids the long-tail effect caused by too many participating clients and lets the servers synchronize data in a timely manner.
- The first threshold is a preset value, or a value determined by the number of clients that accessed all servers in the federated learning system during the previous iteration. Usually, the first threshold is not greater than the number of clients accessing the federated learning system, which avoids long waits for client access and improves the overall efficiency of federated learning.
- The master node is further provided with a timer. The method may further include: when the timer exceeds a second threshold, that is, when the timer times out, the first server sends a second trigger indication to each server, where the second trigger indication triggers each server to enter the next round of iteration. In the embodiments of the present application, the time window of each iteration can therefore be set by the timer. When the timer times out, the next round of iterative learning begins, which avoids the long-tail effect caused by long waits. Even if a client quits during federated learning, the overall training efficiency is not affected.
- The second threshold is a preset value, a value determined according to the number of clients accessing the federated learning system, or a value determined according to the amount of data communicated between all servers and clients in the federated learning system.
- The time window can thus be determined according to the number of clients or the communication volume, so that the time window of the timer matches the actual scenario and is set more reasonably for each iteration.
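One way the two thresholds could be derived is sketched below. The text only states what the thresholds may depend on, so the concrete formulas, the function names, and the parameters `fraction`, `seconds_per_client`, and `seconds_per_megabyte` are assumptions. In particular, combining client count and data volume in a single formula is a choice of this sketch; the text presents them as alternatives.

```python
from typing import Optional


def first_threshold(prev_round_clients: int, preset: Optional[int] = None,
                    fraction: float = 0.5) -> int:
    """Counter threshold: a preset value, or a share of last round's client count."""
    if preset is not None:
        return preset
    # At least 1, and otherwise no larger than last round's client count.
    return max(1, min(prev_round_clients, int(prev_round_clients * fraction)))


def second_threshold(prev_round_clients: int, prev_round_bytes: int,
                     preset: Optional[float] = None,
                     seconds_per_client: float = 0.5,
                     seconds_per_megabyte: float = 0.25) -> float:
    """Timer threshold in seconds: a preset value, or scaled by last round's load."""
    if preset is not None:
        return preset
    megabytes = prev_round_bytes / 1_000_000
    return prev_round_clients * seconds_per_client + megabytes * seconds_per_megabyte
```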
- The method may further include: the first server receives a query message sent by a third client, where the third client is any one of the multiple clients; in response to the query message, the first server sends the information of the updated global model to the third client.
- The client can send a query message to the server to query the latest global model. Because the servers synchronize data in each round of iterative learning, each server holds more data than its own aggregated information, and the client can obtain more data no matter which server it accesses, so the client obtains a more accurate global model.
- The present application provides a federated learning method applied to a federated learning system. The federated learning system includes multiple servers, one of which serves as a master node, and the multiple servers perform iterative learning to implement federated learning. Any round of the iterative learning includes the following steps:
- After any one of the multiple servers receives the first request message, the master node starts a counter and a timer. The counter counts the request messages received by the multiple servers in one round of iteration, where a request message requests the global model stored on the corresponding server among the multiple servers.
- If the value of the counter reaches a first threshold, the master node sends a first trigger indication to each of the multiple servers, where the first trigger indication instructs the multiple servers to transmit their locally stored information to each other.
- If the value of the counter does not reach the first threshold and the value of the timer reaches a second threshold, the master node sends a second trigger indication to each server, where the second trigger indication instructs each server to perform the next round of iteration.
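The counter and timer logic above can be sketched as a small state machine. The `MasterNode` class, the string constants used for the two trigger indications, the injectable `clock`, and the choice to reset both counter and timer after any trigger are assumptions of this sketch rather than details given in the text.

```python
import time
from typing import Callable, Optional

FIRST_TRIGGER = "first_trigger"    # servers transmit locally stored information to each other
SECOND_TRIGGER = "second_trigger"  # servers proceed to the next round of iteration


class MasterNode:
    """Hypothetical master node that tracks one round with a counter and a timer."""

    def __init__(self, first_threshold: int, second_threshold: float,
                 clock: Callable[[], float] = time.monotonic):
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self._clock = clock
        self._count = 0
        self._started_at: Optional[float] = None

    def on_request(self) -> Optional[str]:
        """Record one request message received by any server in the system."""
        if self._started_at is None:
            # The first request of a round starts the timer; counting starts with it.
            self._started_at = self._clock()
        self._count += 1
        return self.poll()

    def poll(self) -> Optional[str]:
        """Return the trigger indication to broadcast, or None if neither condition holds."""
        if self._started_at is None:
            return None
        if self._count >= self.first_threshold:
            self._reset()
            return FIRST_TRIGGER
        if self._clock() - self._started_at >= self.second_threshold:
            self._reset()
            return SECOND_TRIGGER
        return None

    def _reset(self) -> None:
        # Assumption: both counter and timer restart after any trigger.
        self._count = 0
        self._started_at = None
```

Passing the clock in as a parameter keeps the sketch testable without real waiting; a deployment would more likely call `poll` periodically with the default monotonic clock.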
- One of the servers in the federated learning system serves as the master node, and the master node maintains the timer and the counter.
- The counter counts the received request messages while the timer keeps time, which limits both the number of clients participating in each round and the duration of each round. This avoids pointless waiting caused by clients that fall behind, improves the overall efficiency of federated learning, and avoids the long-tail effect.
- The first threshold is a preset value, or is related to the number of clients that accessed the federated learning system during the previous iteration. Usually, the first threshold is not greater than the number of clients accessing the federated learning system, which avoids long waits for client access and improves the overall efficiency of federated learning.
- The second threshold is a preset value, or is related to the number of clients that accessed the federated learning system in the previous iteration, or is related to the amount of data communicated between each server in the federated learning system and its corresponding clients in the previous iteration. In the embodiments of the present application, the time window of each iteration can therefore be set by the timer. When the timer times out, the next round of iterative learning begins, which avoids the long-tail effect caused by long waits. Even if a client quits during federated learning, the overall training efficiency is not affected.
- The present application provides a federated learning system including multiple servers and multiple clients. The multiple servers include a first server and a second server, both of which store information of the global model, and the multiple servers perform iterative learning to implement federated learning. During any round of the iterative learning:
- the first server is configured to receive the request messages sent by each of at least one first client;
- the first server is configured to send the information of the global model and the training configuration parameters to the at least one first client in response to those request messages, where the information of the global model and the training configuration parameters instruct the at least one first client to train the global model using the training configuration parameters;
- the first server is configured to receive the first model update parameters fed back by each of the at least one first client, where the first model update parameters are the parameters of the global model obtained after training by the at least one first client;
- the first server is configured to aggregate the first model update parameters fed back by the at least one first client to obtain first aggregated information;
- the second server is configured to receive the second model update parameters sent by its corresponding at least one second client, and to aggregate those second model update parameters to obtain second aggregated information;
- the first server is configured to receive the second aggregated information sent by each second server;
- the first server is configured to update the global model stored on the first server based on the first aggregated information and the second aggregated information sent by each second server, to obtain an updated global model.
- The client can actively send a request to the server to request participation in federated learning and feed the parameters of the trained global model back to the server. The server aggregates the received model update parameters to obtain aggregated information, and also receives the aggregated information sent by other servers, so the server holds more information than its own aggregated information. The client and the server therefore do not need to maintain a long-lived connection, and the client can access the server at any time to obtain more data. This improves the accuracy of the data obtained by the client, so each round of iterative training is based on more data and the overall efficiency of federated learning improves.
- The multiple servers may include multiple second servers. Each of the multiple second servers is configured to receive the second model update parameters sent by its corresponding at least one second client, and to aggregate those second model update parameters to obtain second aggregated information. The first server is specifically configured to receive the second aggregated information sent by each of the multiple second servers, and to update the global model stored on the first server based on the first aggregated information and the second aggregated information sent by each of the multiple second servers, to obtain an updated global model.
- The servers in the federated learning system can transmit data to each other so that every server holds the full data, and a client can obtain the full data whenever it accesses any server.
- This improves the accuracy of the data obtained by the client. Because the client obtains the full data during federated learning, the overall efficiency of federated learning also improves.
- The multiple servers further include a third server serving as a master node, and the master node manages the multiple servers. The master node is configured to send a first trigger indication to each of the multiple servers, and the second server is specifically configured to send the second aggregated information to the first server based on the first trigger indication.
- The master node can trigger the servers to transmit aggregated information to each other, so that each server holds more comprehensive, full data. A client can then obtain the full data whenever it accesses any server, which improves the accuracy of the data obtained by the client. Because the client obtains the full data during federated learning, the overall efficiency of federated learning also improves.
- The master node includes a counter that counts the number of request messages received by the multiple servers, where a request message requests the global model stored on the corresponding server among the multiple servers. The master node is specifically configured to send a first trigger indication to each of the multiple servers when the value of the counter meets a first threshold, where the first trigger indication triggers each second server to send the second aggregated information to the first server.
- The received request messages can be counted by the counter, and when the count reaches a certain value, data synchronization among the servers is triggered. This is equivalent to limiting the number of clients that participate in each round of federated learning. It avoids the long-tail effect caused by too many participating clients and lets the servers synchronize data in a timely manner.
- The master node further includes a timer, which starts timing from the first request message received in each round of iteration. The master node is further configured to send a second trigger indication to each of the multiple servers when the timer exceeds a second threshold, where the second trigger indication instructs the multiple servers to perform the next round of iteration.
- The time window of each iteration can thus be set by the timer. When the timer times out, the next round of iterative learning begins, which avoids the long-tail effect caused by long waits. Even if a client quits during federated learning, the overall training efficiency is not affected.
- The second threshold is a preset value, or is related to the number of clients that accessed each server in the federated learning system during the previous iteration, or is related to the amount of data communicated between the multiple servers and the multiple clients in the previous iteration. In the embodiments of the present application, the time window can therefore be determined according to the number of clients or the communication volume, so that the time window of the timer matches the actual scenario and is set more reasonably for each iteration.
- The first server receives a query message sent by a third client, where the third client is any client accessing the federated learning system, and the first server sends the information of the updated global model to the third client in response to the query message. In the embodiments of the present application, the client can therefore send a query message to the server to query the latest global model. Because the servers synchronize data in each round of iterative learning, each server holds more data than its own aggregated information, and the client can obtain more data no matter which server it accesses, so the client obtains a more accurate global model.
- The first server is further configured to send the first aggregated information to the second server, and the second server is specifically configured to update its locally stored global model based on the first aggregated information and the second aggregated information, to obtain an updated global model. In the embodiments of the present application, the servers in the federated learning system can therefore transmit aggregated information to each other, so that each server holds more data and data consistency is maintained across the federated learning system.
- the present application provides a server, which is applied to a federated learning system.
- the federated learning system includes multiple servers and multiple clients.
- the multiple servers are used for iterative federated learning.
- Any server includes:
- a transceiver module configured to receive a request message sent by at least one first client, where the request message is used to request the global model stored in the first server, and the multiple clients include at least one first client;
- the transceiver module is further configured to send the information of the global model and the training configuration parameters to the at least one first client, where the information of the global model and the training configuration parameters are used to instruct the at least one first client to use the training configuration parameters to train the global model;
- the transceiver module is further configured to receive first model update parameters fed back by at least one first client respectively, where the first model update parameters are parameters of the global model obtained after the at least one first client is trained;
- an aggregation module configured to aggregate the first model update parameters fed back by at least one first client to obtain the first aggregation information in this round of iteration;
- the transceiver module is further configured to obtain second aggregation information sent by the second server, where the second aggregation information is information obtained by the second server in the current iteration by aggregating the received second model update parameters;
- the updating module is configured to update the global model stored on the first server based on the first aggregated information and the second aggregated information to obtain an updated global model.
- the transceiver module is also used to receive the second aggregation information sent by a plurality of second servers respectively;
- the updating module is specifically configured to update the global model stored on the first server based on the first aggregation information and the second aggregation information respectively sent by the plurality of second servers, so as to obtain an updated global model.
- the first server is a master node in the federated learning system, and the master node is used to manage multiple servers,
- the transceiver module is further configured to send a first trigger instruction to the second server, where the first trigger instruction is used to instruct the second server to send the second aggregation information to the first server;
- the transceiver module is further configured to receive the second aggregation information from the second server.
- a counter is set in the first server, and the counter is used to count the number of request messages received by multiple servers,
- the transceiver module is specifically configured to send the first trigger indication to the second server when the value of the counter meets the first threshold.
- the first threshold is a preset value, or the first threshold is related to the number of clients accessing the federated learning system during the previous iteration.
- the master node includes a timer, and the timer starts timing after receiving the first request message in each round of iteration,
- the transceiver module is specifically configured to receive a second trigger instruction sent by the master node when the timer exceeds the second threshold, where the second trigger instruction is used to instruct the next round of iterative learning.
- the second threshold is a preset value, or the second threshold is related to the number of clients accessing the federated learning system in the previous iteration, or the second threshold is related to the amount of data communicated between each server in the federated learning system and its corresponding clients in the previous iteration.
- the transceiver module is further configured to receive a query message sent by a third client, where the third client includes any one of the clients accessing the federated learning system;
- the transceiver module is further configured to send the updated global model information to the third client in response to the query message.
- the transceiver module is further configured to send the first aggregated information to the second server, so that the second server updates its locally stored global model based on the first aggregated information and the second aggregated information to obtain an updated global model.
- the present application provides a server, which is applied to a federated learning system.
- the federated learning system includes multiple servers, one of the multiple servers serves as a master node, and the multiple servers are used for iterative learning.
- the master node includes:
- a startup module, configured to start a counter and a timer after any one of the multiple servers receives the first request message, where the counter counts the request messages received by the multiple servers in one round of iteration, and a request message requests the global model stored on the corresponding server among the multiple servers;
- a transceiver module, configured to send a first trigger indication to each of the multiple servers if the value of the counter reaches a first threshold, where the first trigger indication instructs the multiple servers to transmit their locally stored information to each other;
- the transceiver module is further configured to send a second trigger indication to each server if the value of the counter does not reach the first threshold and the value of the timer reaches a second threshold, where the second trigger indication instructs each server to proceed to the next round of iteration.
- the first threshold is a preset value, or the first threshold is related to the number of clients accessing the federated learning system during the previous iteration.
- the second threshold is a preset value, or the second threshold is related to the number of clients accessing the federated learning system in the previous iteration, or the second threshold is related to the amount of data communicated between each server in the federated learning system and its corresponding clients in the previous iteration.
- an embodiment of the present application provides a federated learning device, including a processor and a memory, wherein the processor and the memory are interconnected through a line, and the processor invokes program code in the memory to execute the processing-related functions of the federated learning method shown in the first aspect or the second aspect.
- the federated learning device may be a chip.
- an embodiment of the present application provides a federated learning device.
- the federated learning device may also be referred to as a digital processing chip or a chip.
- the chip includes a processing unit and a communication interface.
- the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the processing-related functions in any optional implementation manner of the first aspect or the second aspect.
- an embodiment of the present application provides a computer-readable storage medium, including instructions, which, when executed on a computer, cause the computer to execute the method in any optional implementation manner of the first aspect or the second aspect.
- an embodiment of the present application provides a computer program product including instructions, which, when run on a computer, enables the computer to execute the method in any optional implementation manner of the first aspect or the second aspect.
- FIG. 1 is a schematic diagram of a main framework of artificial intelligence to which the present application is applied.
- FIG. 2 is a schematic diagram of the architecture of a federated learning system provided by an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of a server according to an embodiment of the present application.
- FIG. 4 is a schematic flowchart of a federated learning method provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of the architecture of another federated learning system provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of another federated learning method provided by an embodiment of the present application.
- FIG. 7 is a schematic flowchart of another federated learning method provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of an aggregation mode provided in an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a server according to an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of another server provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of another server provided by an embodiment of the present application.
- Figure 1 shows a schematic structural diagram of the main framework of artificial intelligence.
- the above artificial intelligence framework is explained below along two dimensions: the "intelligent information chain" and the "IT value chain" (vertical axis).
- the "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data goes through the refinement process of "data-information-knowledge-wisdom".
- the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecosystem of the system.
- the infrastructure provides computing power support for the artificial intelligence system, communicates with the outside world, and is supported by the basic platform. Communication with the outside world is performed through sensors. Computing power is provided by intelligent chips, that is, hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA). The basic platform includes platform guarantees and support related to the distributed computing framework and the network, and can include cloud storage and computing, interconnection networks, etc. For example, sensors communicate with the outside to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for calculation.
- the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
- the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
- machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
- Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
- Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
- some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
- Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution and turn intelligent information decision-making into products that are deployed in practice. Their application areas mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, safe cities, etc.
- the embodiments of the present application can be applied to the field of federated learning, and the neural network can be trained collaboratively through the client and the server, thus involving a large number of related applications of neural networks, such as the neural network trained by the client during federated learning.
- relevant terms and concepts of neural networks that may be involved in the embodiments of the present application are first introduced below.
- a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit can be shown in formula (1-1):
- $h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$ (1-1)
- where s = 1, 2, ..., n, and n is a natural number greater than 1;
- $W_s$ is the weight of $x_s$;
- b is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
- the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
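- as an illustrative sketch only, the neural unit described above (a weighted sum of the inputs plus a bias, passed through a sigmoid activation) could be written as follows. The function names `sigmoid` and `neural_unit` and the sample values are assumptions made for this example and do not appear in the application.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(xs: list[float], ws: list[float], b: float) -> float:
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return sigmoid(z)


# Hypothetical inputs: the weighted sum is 0.5 - 0.5 + 0.0 = 0.0.
output = neural_unit(xs=[1.0, -2.0], ws=[0.5, 0.25], b=0.0)
```

With these sample values the weighted sum is zero, so the unit outputs the sigmoid midpoint 0.5.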
- a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
- a deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple intermediate layers.
- the DNN is divided according to the position of different layers.
- the neural network inside the DNN can be divided into three categories: input layer, intermediate layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all intermediate layers, or hidden layers.
- the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
- each layer can be expressed as a linear relational expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector (also called the bias parameter), $W$ is the weight matrix (also called the coefficient), and $\alpha()$ is the activation function.
- each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large.
- these parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscript corresponds to the output third-layer index 2 and the input second-layer index 4.
- in summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W^{L}_{jk}$.
- the input layer does not have a W parameter.
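- as a minimal sketch of the per-layer relation described above (output vector = activation of weight matrix times input vector plus offset vector), the following example stores each weight matrix so that `W[j][k]` is the coefficient from neuron k of the previous layer to neuron j of the current layer (0-based indices). The sigmoid activation, the function names and the sample weights are assumptions chosen only for illustration.

```python
import math


def layer_forward(W, x, b):
    """One fully connected layer: y_j = alpha(sum_k W[j][k] * x[k] + b[j])."""
    y = []
    for j, row in enumerate(W):
        z = sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b[j]
        y.append(1.0 / (1.0 + math.exp(-z)))  # alpha is a sigmoid here
    return y


def dnn_forward(layers, x):
    """Apply the layers in order; the input layer itself has no W parameter."""
    for W, b in layers:
        x = layer_forward(W, x, b)
    return x


# Hypothetical network: 2 inputs -> 3 intermediate neurons -> 1 output neuron.
layers = [
    ([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
    ([[1.0, -1.0, 0.0]], [0.0]),
]
result = dnn_forward(layers, [1.0, 2.0])
```

In this layout, the coefficient written $W^{3}_{24}$ in the text would be element `[1][3]` of the third layer's matrix.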
- more intermediate layers allow the network to better capture the complexities of the real world.
- a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
- Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
- Convolutional neural network is a deep neural network with a convolutional structure.
- the convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, which can be regarded as a filter.
- the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
- in a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons in the adjacent layer.
- a convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract image information is independent of location.
- the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
- the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
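- the weight sharing described above can be illustrated with a small sketch in which one convolution kernel is slid over every position of the input, so the same weights are reused regardless of location. The function name `conv2d_valid`, the absence of padding and stride, and the sample data are assumptions made for this example; like most CNN implementations it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_plane = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are used at every (i, j) position.
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        feature_plane.append(row)
    return feature_plane


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_plane = conv2d_valid(image, kernel)
```

Here a 2x2 kernel with only 4 weights produces all 4 outputs, whereas a fully connected mapping from 9 inputs to 4 outputs would need 36 weights.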
- the network for extracting features mentioned below in this application may include one or more layers of convolutional layers.
- the network for extracting features may be implemented by using CNN.
- the loss function can generally include loss functions such as mean square error, cross entropy, logarithm, and exponential.
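- mean squared error can be used as a loss function, defined as $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$, where $y_i$ is the target value, $\hat{y}_i$ is the predicted value, and $N$ is the number of samples. Specifically, a specific loss function can be selected according to the actual application scenario.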
- the neural network can use the error back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
- specifically, passing the input signal forward to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
- the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
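- as a simplified sketch of training driven by an error loss, the following example fits a single linear unit by gradient descent on a mean squared error loss. It is a deliberately reduced stand-in for back propagation through a full network; the function names, the learning rate, the number of epochs and the sample data are assumptions made for illustration.

```python
def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def train_linear_unit(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x + b by repeatedly propagating the error back to w and b."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        preds = [w * x + b for x in xs]  # forward pass
        losses.append(mse(preds, ys))
        # Gradients of the MSE loss with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= lr * grad_w  # update the parameters against the gradient
        b -= lr * grad_b
    return w, b, losses


# Hypothetical training samples generated from y = 2x + 1.
w, b, losses = train_linear_unit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The recorded losses shrink over the epochs, which is the convergence of the error loss referred to above.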
- when the client performs model training, the global model can be trained by using the loss function or the BP algorithm to obtain the trained global model.
- federated learning is a distributed machine learning algorithm in which multiple clients (FL-Clients), such as mobile devices or edge servers, cooperate with a server to complete model training and algorithm updates on the premise that the data does not leave its domain, so as to obtain a trained global model.
- for example, the client uses the locally stored training samples to train the global model and feeds back the model parameters obtained after training to the server.
- the server aggregates the received model parameters and updates the locally stored global model based on the aggregated information to obtain the latest global model.
- the dataset is divided along the user dimension, and the part of the data that has the same user features but different users is taken out for training.
- the client trains the model, then uploads it to the server, and the server aggregates it to obtain the final model.
- the commonly used distributed training is the Parameter Server (PS) mode, which includes the roles of Worker (the computing node for training in the parameter server training mode) and PS (parameter server).
- the training data is distributed to each Worker by the computing center.
- the Workers communicate with the PS through the network to exchange information such as models or gradients, and the PS performs aggregation and model optimization, so as to efficiently train the final model.
- each server is actually only responsible for part of the assigned parameters (multiple servers jointly maintain a global shared parameter), and each computing node is only assigned part of the data and processing tasks.
- each server can only handle the calculation of some parameters, and requires the participants to maintain a stable and long connection with the central server.
- Most FL-Clients are stateless, unreliable, and unaddressable, and the central server needs to aggregate all parameters and complete the aggregation calculation.
- the PS mode training mode obviously cannot handle the challenges of cross-device terminals.
- TFF TensorFlow Federated
- PaddleFL provides a data-parallel distributed federated learning framework that includes three roles: FL-Server, FL-Trainer, and FL-Scheduler. However, FL-Client and FL-Server use remote procedure calls (RPC) for cross-process communication, so a stable long-lived connection exists between the two, and the FL-Scheduler can schedule Workers, deciding before each update cycle which Workers may participate in training. This is not feasible in cross-device deployment scenarios.
- this application provides a federated learning architecture that enables clients to access servers through a looser access method and maintains data consistency between servers, so that a client can obtain the full data whenever it accesses a server, which improves the model training efficiency of the client.
- the federated learning system provided by this application may be as shown in FIG. 2. The system (also referred to as a cluster) may include multiple servers, the multiple servers establish connections with each other, and each server may communicate with one or more clients.
- the server side can usually be deployed on a server; in some scenarios it can also be deployed on a terminal device, which can be adjusted according to the actual application scenario and is not limited in this application.
- the client can be deployed in terminals, such as mobile phones, wristbands, smart TVs, etc., or can be deployed in servers, which can be adjusted according to actual application scenarios, which is not limited in this application.
- One of the multiple servers can be used as a master node, and the master node can maintain transaction consistency in the cluster, so that each server can work together.
- the framework may also include a scheduler (not shown in the figure) or an elastic load balance (ELB) (not shown in the figure), etc., which will not be repeated here.
- the scheduler can be deployed on any server, and can be used to schedule communication resources in the cluster, such as time windows and channels used for transmission efficiency.
- the ELB can be deployed on any server and can be used to route the communication between the servers and the clients. The ELB determines the server that a client accesses according to the load of each server. For example, the ELB can connect a client that sends a request message to a server with a lower current load, so that each server maintains a suitable load and the situation in which one or more servers are overloaded is avoided.
- each server can deliver the model to be trained to the clients that have established a connection with it, and each client can use its locally stored training samples to train the model and feed back the data of the trained model to the server.
- the server can aggregate the received data of the one or more models to obtain aggregated data, which is equivalent to the aggregated model.
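- as a hedged sketch of the load-based routing described above, the following example sends each incoming request to the server with the lowest current load. Measuring load as a count of connected clients, the function name `route_request` and the server identifiers are assumptions made for illustration; the application does not specify how the ELB measures load.

```python
def route_request(server_loads: dict[str, int]) -> str:
    """Pick the server with the lowest current load and record the new client."""
    if not server_loads:
        raise ValueError("no servers available")
    target = min(server_loads, key=server_loads.get)
    server_loads[target] += 1
    return target


# Hypothetical cluster state: number of clients currently attached to each server.
loads = {"server-0": 3, "server-1": 1, "server-2": 2}
assignments = [route_request(loads) for _ in range(3)]
```

After three requests the loads in this example are balanced at 3 clients per server.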
- the final model can be output to complete federated learning.
- the structure of the server in the federated learning system provided by this application can refer to FIG. 3, and the server may include a training engine, a storage module, a time-limited communication module, a device management module, an encryption module, a protocol parsing module, a communication module, etc.
- the communication module is used for data communication between FL-Server and FL-Client, as well as dynamic networking of FL-Server cluster to achieve elastic scaling of central server resources.
- the hypertext transfer protocol (HTTP) can be used for transmission to achieve stable short-lived connections.
- cross-process communication can be performed through the remote procedure call (RPC) protocol to achieve stable long-lived connections.
- Device management module: responsible for device-related services such as device selection for federated learning, for example, screening whether a client participates in federated learning after receiving the client's request message.
- Encryption and decryption module: encrypts the uploaded and downloaded model weights to reduce the possibility of recovering original samples through a model inversion attack.
- Protocol parsing module: serializes the communication messages to be sent, deserializes the received messages, etc.
- Training engine: a module that performs federated learning calculations on data uploaded by FL-Clients, including an aggregator and an optimizer.
- the aggregator can be used to aggregate data, for example by summing or weighted summing of the data uploaded by FL-Clients.
- the optimizer can be used to calculate the gradient of the model and optimize the model based on the calculated gradient to obtain a model with a more accurate output.
- Storage module: stores the metadata required for federated learning and data such as the globally optimal model obtained from training.
- Time-limited communication module: includes a timer and a counter, and performs subsequent processing when the timer times out or the counter value reaches a threshold.
- referring to FIG. 4, the following describes the flow of the federated learning method provided by this application in combination with the aforementioned architecture.
- the first server receives at least one request message sent by at least one client.
- the first server can be any one of the multiple servers shown in FIG. 2, and the at least one client can be a client that has established a connection with the first server; for example, the request message sent by the at least one client can be routed to the first server by the ELB.
- the client can request the first server to participate in federated learning through the request message, so as to request to use local data for model training, and to influence the global model through the local data, so as to obtain a model that carries the personalized characteristics of the client.
- a request message can be sent to the federated learning system, and the ELB in the system routes the request message to a suitable first server, so that the first server and the client cooperate to implement federated learning.
- the client can collect personalized training data that differs from that of other devices. After collecting the training data, the client can actively send a request message to the federated learning system to request the use of locally stored data for federated learning. A model adapted to the training data is thereby obtained and synchronized to the system to complete federated learning, so that the global model in the cluster can adaptively learn the various data collected by each client, which improves the output accuracy of the global model.
- the first server sends the global model and training configuration parameters to at least one client respectively according to the request message.
- after the first server receives the request message sent by the at least one client, if it confirms that the at least one client is allowed to participate in federated learning, it can send the information of the locally stored global model and the training configuration parameters to the at least one client respectively.
- the global model is the model to be trained.
- the information of the global model may include structural parameters of the global model (such as its depth or width), weight parameters, and the like.
- the training configuration parameters are the parameters used when training the global model, such as the learning rate, the number of epochs, or the type of security algorithm.
- the first server may screen the multiple clients and select at least one of them to participate in federated learning.
- the request message can carry information of the terminal, such as network connection status, power, load and other information.
- the first server can filter out the clients with stable connections according to the terminal information to perform federated learning, and deliver the model and the training configuration parameters to the selected at least one client, so that the client can train the model based on the training configuration parameters.
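- as an illustrative sketch of the screening step described above, the following example selects clients from the terminal information carried in their request messages. The `TerminalInfo` fields, the thresholds `min_battery` and `max_load`, and the optional cap `max_clients` are all hypothetical; the application only states that information such as network connection status, power and load may be carried.

```python
from dataclasses import dataclass


@dataclass
class TerminalInfo:
    client_id: str
    network_stable: bool
    battery_percent: int
    load_percent: int


def select_clients(requests, min_battery=30, max_load=80, max_clients=None):
    """Keep clients with a stable connection, enough power and a light enough load."""
    eligible = [
        r.client_id
        for r in requests
        if r.network_stable
        and r.battery_percent >= min_battery
        and r.load_percent <= max_load
    ]
    return eligible if max_clients is None else eligible[:max_clients]


# Hypothetical request messages received by the first server in one iteration.
requests = [
    TerminalInfo("phone-a", True, 85, 20),
    TerminalInfo("watch-b", False, 90, 10),  # unstable connection
    TerminalInfo("tv-c", True, 15, 30),      # low power
    TerminalInfo("phone-d", True, 60, 50),
]
selected = select_clients(requests)
```

The server would then deliver the global model and the training configuration parameters only to the selected clients.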
- the client uses the training configuration parameters to train the global model.
- after the client (or the first client) receives the global model and the training configuration parameters delivered by the first server, it can use the training set to train the global model to obtain the trained global model.
- the training set may include multiple training samples, and the training samples may be data collected by the client or data received by the client, which is not limited here.
- the client may be an application program deployed on the user's terminal.
- the client may collect data generated while the user uses the client, use the data as training samples, and request to participate in federated learning from the server, so that the collected data is applied to the model update and the global model in the federated learning system can be adapted to each client.
- the client feeds back model update parameters to the first server.
- the information of the trained model, that is, the model update parameters, can be fed back to the first server.
- the model update parameters may include weight parameters of the model.
- the model update parameters received by server j can be expressed in terms of $n_k$, the dataset size of the k-th client, and the weight parameters of the model obtained after training by that client.
- the first server aggregates the model update parameters fed back by at least one client to obtain first aggregated information.
- the first server can aggregate the model update parameters fed back by the at least one client to obtain first aggregated information. It is equivalent to aggregating the model fed back by the at least one client to obtain the information of the aggregated model.
- the specific aggregation manner may take various forms, such as averaging or weighted fusion.
- the present application does not limit the aggregation manner, which may be adjusted according to actual application scenarios.
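- as one possible aggregation manner, offered only as a sketch, the following example computes an average of the weight parameters fed back by the clients, weighted by each client's dataset size. Weighting by dataset size is an assumption in the style of federated averaging and is not stated to be the aggregation used by this application; the function name `aggregate` and the sample values are hypothetical.

```python
def aggregate(updates):
    """updates: list of (n_k, weights_k) pairs fed back by the clients.

    Returns (total sample count, weighted average of the weights).
    """
    total = sum(n for n, _ in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][1])
    averaged = [sum(n * w[i] for n, w in updates) / total for i in range(dim)]
    return total, averaged


# Hypothetical feedback from two clients with 10 and 30 training samples.
total, first_aggregated = aggregate([(10, [1.0, 2.0]), (30, [3.0, 6.0])])
```

Returning the total sample count together with the averaged weights lets the result be combined later with the aggregated information of other servers.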
- the first server obtains the second aggregation information in the second server.
- the second server includes servers other than the first server in the federated learning system, and the number of the second server may be one or more.
- the first server can also receive second aggregation information obtained by other servers after aggregating the received model update parameters.
- the aggregated information in each server can maintain consistency.
- each server can obtain the aggregated information of other servers, so that each server can maintain more data, and each client can obtain more comprehensive data when accessing the server. There is no need to maintain a long connection state with the server. Therefore, the federated learning system provided in this application can adapt to the scenario where the client is deployed on the mobile terminal.
- the first server can receive aggregated information sent by all servers in the federated learning system other than itself; that is, the servers in the federated learning system can transmit aggregated information to each other, so that each server has the full data.
- when a client accesses a server at any time, it can therefore obtain the full data, which improves the accuracy of the data obtained by the client.
- the case in which all servers in the federated learning system transmit aggregated information to each other is used as an example for illustrative description.
- data may also be transmitted between only some of the servers, or some servers may only receive aggregated information without sending it, etc., which can be adjusted according to actual application scenarios and is not limited in this application.
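- as a hedged sketch of the full-exchange case, the following example lets every server merge its own aggregated information with the aggregated information received from all other servers, so that all servers arrive at the same updated global model. Representing aggregated information as a (sample count, averaged weights) pair and merging by sample-weighted averaging are assumptions made for illustration, as are the names `merge_aggregates` and `synchronize`.

```python
def merge_aggregates(aggregates):
    """Combine (n, weights) pairs into one sample-weighted average."""
    total = sum(n for n, _ in aggregates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(aggregates[0][1])
    return [sum(n * w[i] for n, w in aggregates) / total for i in range(dim)]


def synchronize(local_aggregates):
    """local_aggregates: {server_id: (n, weights)} held before the exchange.

    Each server merges its own entry with the entries received from the others.
    """
    updated_models = {}
    for server_id, own in local_aggregates.items():
        received = {s: agg for s, agg in local_aggregates.items() if s != server_id}
        combined = dict(received)
        combined[server_id] = own
        # Merge in a fixed order so every server computes an identical result.
        ordered = [combined[s] for s in sorted(combined)]
        updated_models[server_id] = merge_aggregates(ordered)
    return updated_models


# Hypothetical first and second aggregated information.
models = synchronize({"first": (40, [2.5, 5.0]), "second": (10, [0.0, 0.0])})
```

Merging in a fixed server order is a small design choice that keeps floating-point results bit-identical across servers.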
- the manner in which the second server obtains the second aggregated information is similar to the manner in which the first server obtains the first aggregated information.
- the second server delivers the global model and the training configuration parameters to the second client, where the global model and training configuration parameters are generally the same as those delivered by the first server to the first client.
- the second client uses the locally stored training data to train the global model based on the training configuration parameters, and feeds back the parameters of the trained global model (such as weight parameters, model width or depth, etc.) to the second server.
- the second server aggregates the received model update parameters to obtain the second aggregated information.
- the specific process of acquiring the second aggregation information by the second server will not be repeated.
- the master node can trigger the mutual transmission of aggregated information between the servers in the cluster to achieve data synchronization in the cluster, so that all servers in the cluster can have more comprehensive data.
- the master node may be one of the servers in the federated learning system, such as the aforementioned first server or second server.
- at least one of a counter or a timer may be set in the master node to monitor the number of clients participating in each round of iteration or the time window of each round, so as to effectively control the number of clients participating in federated learning and the duration of each iteration, avoid ineffective waiting, and reduce the "long tail" effect caused by slow clients.
- the counter is used to count the request messages sent by the client received in the federated learning system, and after each round of iteration is completed, the master node can initialize the counter.
- the master node can start the timer and increment the counter each time a request message is received.
- the first server can be triggered to send the model and training configuration parameters to the client.
- the master node sends a first trigger instruction to each server to instruct each server to transmit its own aggregated information to the other servers, for example instructing the second server to send the second aggregation information to the first server.
- the master node can trigger itself to send the first aggregation information to the other servers (such as all second servers), and also send a first trigger indication to the second servers to instruct each second server to send its own second aggregation information to the other servers.
- the ELB can send a notification message to the master node to notify the master node of information about the received request messages, such as the number of request messages and their reception timestamps, so that the master node can count the request messages.
- the master node sends a first trigger instruction to each server, which triggers each server in the federated learning system (including the master node itself) to transmit data to the others, thereby completing data synchronization between the servers, so that each server has more comprehensive data.
- when a client accesses a server, it can obtain more comprehensive data, so that federated learning or applications can be performed based on the more comprehensive data, which improves the user experience of the client.
- the first threshold includes a preset value, or is related to the number of clients accessing multiple servers in the current iteration or the previous iteration, and the like. Therefore, in the embodiment of the present application, the first threshold corresponding to the counter may be a preset value, or a value determined according to the number of clients accessing the server, and usually the first threshold is not greater than the number of clients accessing the server , so as to avoid the long waiting time for federated learning due to waiting for client access, and improve the overall efficiency of federated learning.
- the timer can be used to set a time window in the process of federated learning, that is, only one round of federated learning is performed within the time window.
- the master node can trigger the servers to perform the next round of learning. If model update parameters are obtained during the current round, they can be discarded or, alternatively, retained and used to improve data utilization. Therefore, in the embodiments of the present application, a timer monitors the time window of each round of iterative learning, and each round is performed within that time window, so that when some clients go offline or train too slowly, their influence on the federated learning process is reduced and the efficiency of federated learning is improved.
- the master node can start the timer and complete the current iteration within the time window of the timer.
- the master node can send a second trigger instruction to each server in the federated learning system (including the master node itself), triggering each server to enter the next round of iteration.
- when the master node determines that the timer has expired, it can directly determine to enter the next round of iteration without sending a second trigger instruction to itself. Of course, it can also generate a second trigger instruction for itself, which is not limited in this application.
- the second threshold may be a preset value, or it may be related to the number of clients accessing the federated learning system in the current or previous round, or it may be related to the amount of data communicated between the clients and the servers in the current or previous round.
- the second threshold may be positively correlated with a value determined by the number of clients accessing the federated learning system or by the amount of data communicated between the clients and the servers. For example, the greater the number of clients accessing the federated learning system, the greater the second threshold; and the greater the amount of data communicated between the clients and the servers in the previous iteration, the greater the second threshold. This can be adjusted according to the actual application scenario. Therefore, in the embodiments of the present application, a time window that matches the actual application scenario can be set for the timer, which prevents the time window from being too long or too short and improves the overall efficiency of federated learning.
- the master node starts the timer and the counter. If the value of the counter reaches the first threshold and the timer has not timed out, the timer can be turned off, and the next round of iteration can be entered after the aggregation of all the model update parameters fed back by the client in the current iteration process is completed. If the value of the counter does not reach the first threshold and the timer times out, it can be determined to end the current iteration and enter the next iteration, and the data in the current iteration can be discarded or retained. If the value of the counter does not reach the first threshold and the timer has not timed out, the iteration can continue.
- the timer and the counter can work together to limit the number of clients participating in federated learning in this round of iteration and to limit the duration of this round, so as to avoid the long tail effect caused by waiting for clients to access or by clients quitting halfway.
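The three cases described above for the counter and the timer can be sketched as a small decision function. This is only an illustrative sketch: the names `Action` and `decide` are hypothetical, and giving the counter priority when both conditions hold at the same moment is an assumption, since the text does not say which check comes first.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"                        # keep waiting for more requests
    AGGREGATE_THEN_NEXT = "aggregate_then_next"  # counter reached the first threshold in time
    TIMEOUT_NEXT = "timeout_next"                # timer expired before the counter filled up


def decide(counter: int, first_threshold: int,
           elapsed_s: float, second_threshold_s: float) -> Action:
    """Map the counter/timer state of one iteration to the master node's next action."""
    if counter >= first_threshold:
        # Enough clients joined: stop the timer, finish aggregating the
        # model update parameters of this round, then enter the next round.
        # (Assumption: this check wins if the timer has also just expired.)
        return Action.AGGREGATE_THEN_NEXT
    if elapsed_s >= second_threshold_s:
        # Time window used up: end this round; its data may be discarded or retained.
        return Action.TIMEOUT_NEXT
    return Action.CONTINUE
```

For instance, `decide(4, 10, 3.0, 30.0)` keeps the iteration running, while the same counter value with 30 seconds elapsed ends the round.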
- the first server aggregates the first aggregation information and the second aggregation information to obtain third aggregation information.
- after the first server obtains the first aggregation information and the second aggregation information sent by other servers, it can update the locally stored global model based on the first aggregation information and the received second aggregation information, that is, steps 407 and 408.
- the first server can not only aggregate the model update parameters fed back by the one or more clients it serves, but also receive aggregated information obtained by other servers, and aggregate the first aggregated information it calculated with the received second aggregated information again to obtain third aggregated information with more complete data.
- each server not only aggregates the model update parameters it receives, but also re-aggregates the information aggregated by the other servers in the federated learning system, so that each server can hold the data of every server in the federated learning system. This keeps the data of all servers consistent, and the client can obtain the full amount of data when accessing any server at any time.
- $n_k$ is the dataset size of the k-th client.
- each server first calculates its own aggregation result and $n_j$. Then the master node triggers data aggregation between the servers to obtain the aggregated data. With the aggregation results and $n_j$ of all servers, each server can obtain the calculation result of the cluster dimension, that is, the full amount of data. In the next iteration, a client that sends a request to any server gets the correct full amount of data.
- the first server can also send the first aggregation information to the second server, so that the second server can re-aggregate the second aggregation information obtained by itself and the received first aggregation information to obtain the third aggregation information.
- to keep the data in each server consistent, the second server can use the same aggregation method as the first server when performing aggregation, so that the same third aggregation information is finally obtained and each server has the full data.
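The two-level aggregation described above can be sketched as follows. This is a minimal sketch that assumes a FedAvg-style weighted average, in which each server keeps a sample-weighted sum of its clients' weights together with its total sample count $n_j$; the text names weighted summation only as one possible aggregation method. The function names and the flat list-of-floats model representation are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = List[float]
Partial = Tuple[Weights, int]  # (sum of n_k * w_k over this server's clients, n_j)


def local_aggregate(updates: List[Tuple[Weights, int]]) -> Partial:
    """First-level aggregation on one server; expects at least one (weights, n_k) pair."""
    dim = len(updates[0][0])
    weighted_sum = [0.0] * dim
    n_j = 0
    for weights, n_k in updates:
        for i, w in enumerate(weights):
            weighted_sum[i] += n_k * w
        n_j += n_k
    return weighted_sum, n_j


def cluster_aggregate(partials: Dict[str, Partial]) -> Weights:
    """Second-level aggregation over every server's partial result.

    Iterating in sorted server order makes each server perform the same
    floating-point operations, so all servers end with identical weights.
    """
    names = sorted(partials)
    dim = len(partials[names[0]][0])
    total = [0.0] * dim
    n = 0
    for name in names:
        weighted_sum, n_j = partials[name]
        for i, s in enumerate(weighted_sum):
            total[i] += s
        n += n_j
    return [s / n for s in total]
```

Because every server runs `cluster_aggregate` on the same set of partial results, each one arrives at the same third aggregation information without a central coordinator holding the model.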
- the first server updates the stored global model.
- the third aggregation information may include the weight parameters of the aggregated model, and the first server may use the weight parameters obtained after the final aggregation to update the locally stored global model, thereby obtaining the updated global model.
- the client can access the server by sending a request to the server, and can participate in federated learning after the server allows it. This is equivalent to the client accessing the server in a loosely coupled way, with no need to maintain a long connection with the server, which suits scenarios where clients are deployed on mobile terminals.
- in addition to aggregating the model update parameters fed back by its connected clients, each server can also aggregate the information obtained by other servers after aggregation, so that each server has the full amount of data in the federated learning system. A client accessing any server at any time can obtain the full amount of data, so that the output accuracy of the final trained model is higher.
- the present application can also count the clients accessing the servers with a counter and time the federated learning process with a timer, and perform each round of federated learning within the time window of the timer. If a round of iterative learning is not completed within the time window, the next round can be started directly, which prevents clients with unstable connections or low training efficiency from slowing down the overall training process and improves the overall learning efficiency of federated learning.
- the client can exit at any time during the federated training process without affecting the efficiency of federated learning and avoiding binding to the client.
- steps 401-408 may be one iteration in the federated learning process, and the federated learning process may include one or more iterations. This application takes only one iteration as an example for illustration, which is not a limitation. Specifically, the number of iterations can be determined according to the actual application scenario.
- the client sends a query message to the first server.
- the client when the client (or the second client) needs to acquire the latest model, it may send a query message to the server to request to query the latest global model.
- the client can access the federated learning system through the address of the federated learning system, and the ELB can route the query message sent by the client, so as to route the request message to the adapted server.
- the client sending a query message to the first server is only an exemplary illustration, and the first server here can be replaced with another server, such as a second server or a third server, which is not limited here.
- the first server delivers the updated global model to the client.
- after receiving the query message from the client, the first server can deliver the latest locally stored model to the client.
- the first server updates the stored global model using the latest aggregated information, and after receiving the query message, can deliver the latest global model to the client.
- the structure and parameters of the model can be delivered to the client, such as the number of layers of the neural network, the size of the convolution kernel, the resolution of input and output, the weight parameters in each network layer, etc., so that the client can obtain the latest model structure.
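One way to picture the delivered model is as a serializable payload that carries both the structure and the weight parameters. The sketch below is hypothetical throughout: the field names, the `round_id` field, and the choice of JSON as the wire format are assumptions made for illustration and are not specified by the application.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ModelPayload:
    round_id: int                    # hypothetical: iteration that produced this model
    num_layers: int                  # number of layers of the neural network
    kernel_size: int                 # size of the convolution kernel
    input_resolution: List[int]
    output_resolution: List[int]
    weights: Dict[str, List[float]]  # layer name -> flattened weight parameters


def encode(payload: ModelPayload) -> str:
    """Serialize the payload for delivery in the response to a query message."""
    return json.dumps(asdict(payload), sort_keys=True)


def decode(text: str) -> ModelPayload:
    """Rebuild the payload on the client side."""
    return ModelPayload(**json.loads(text))
```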
- step 409 and step 410 in this application are optional steps.
- the servers can transmit the latest aggregated information to each other, so that each server in the federated learning system has the full amount of data, and the client can obtain the full amount of data when it accesses any server at any time. The full amount of data can thus be obtained without maintaining a long connection with the server, which improves the accuracy of the model obtained by the client.
- the federated learning system provided by this application may be as shown in Figure 5, where three servers (i.e., FL-Server0, FL-Server1, and FL-Server2) are used for exemplary illustration; more servers can also be used.
- the scheduler (i.e., FL-Scheduler) can be used to schedule communication resources or storage resources in the server cluster, so that data can be exchanged between servers.
- a Transmission Control Protocol (TCP) connection can be established between the servers or between a server and the scheduler and kept as a stable long connection, over which they communicate through a private TCP-based protocol.
- other protocols can also be used, such as Internet Packet Exchange (IPX) or Sequenced Packet Exchange (SPX), which is not limited in this application.
- the client can access the federated learning system through an IP address, and the server exposes the hypertext transfer protocol (HTTP) port to the outside to provide distributed services to the outside.
- the client can establish an HTTP connection with the server, that is, a short connection.
- Server 0 can be used as the master node and the other servers as slave nodes. The master node is responsible for data synchronization within the cluster, such as triggering all nodes to aggregate and performing global timing or counting during the federated learning process, so as to keep the data in the cluster consistent.
- the server may not be able to obtain the status of the client at any time.
- the connection between the mobile terminal and the server may be unstable, and the server may be unable to address the mobile terminal or obtain its connection status, which may interrupt communication between the server and the mobile terminal and affect the efficiency of federated learning.
- FIG. 6 is a schematic flowchart of another federated learning method provided by this application.
- the timer and the counter can be initialized, e.g. set to 0, before each iteration.
- the counter on the master node can count.
- the timer can be started synchronously, and a round of federated learning can be performed within the time window of the timer.
- the first threshold may be determined according to the number of clients that participated in federated learning in earlier rounds. For example, if fewer clients participated in the previous round than in the round before it, the first threshold may be decreased from its originally set value; if more clients participated in the previous round than in the round before it, the first threshold may be increased from its originally set value. In this way the threshold corresponding to the counter adapts to the clients participating in training in each iteration, maximizing training efficiency.
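The adjustment rule for the first threshold could look like the sketch below. The function name, the fixed `step` size, and the lower bound of 1 are assumptions; the upper bound reflects the statement elsewhere in the text that the first threshold is usually not greater than the number of clients accessing the servers.

```python
def next_first_threshold(current: int, last_round: int, round_before: int,
                         connected: int, step: int = 1) -> int:
    """Raise or lower the counter threshold based on the participation trend.

    last_round / round_before: clients that took part in the two most recent rounds.
    connected: clients that currently have a connection to the cluster.
    """
    if last_round < round_before:
        candidate = current - step   # participation is falling: wait for fewer clients
    elif last_round > round_before:
        candidate = current + step   # participation is rising: wait for more clients
    else:
        candidate = current
    # Keep the threshold between 1 and the number of connected clients.
    return max(1, min(candidate, connected))
```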
- FL-server 1 can deliver the global model to be trained and training configuration parameters to FL-client.
- the global model may include various neural networks, such as DNN, CNN, Residual Network (ResNet) or a constructed network, etc., which can be adjusted according to actual application scenarios.
- the training configuration parameters can include parameters involved in training the model, such as the learning rate, the number of epochs, or the category of the security algorithm. When the client performs model training, these determine the learning rate, the number of training iterations, and the type of encryption algorithm used when transmitting data, respectively.
- the model update parameters can include parameters related to the trained model, such as the structure of the updated global model (for example, the depth or width of the network layers), the weight parameters, or the weight parameters reduced through sparsification.
- each FL-client can use locally stored training samples to train the received global model.
- each training sample includes the sample and the corresponding ground truth label.
- the sample is used as the input of the model, a loss function then measures the difference between the output of the model and the ground truth label, the update gradient of the global model is calculated from this difference, and the gradient is back-propagated to update the weight parameters of the global model, yielding the updated model.
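The local training loop on a client can be illustrated with a deliberately tiny model. The sketch below assumes a linear model with a mean squared error loss and full-batch gradient descent, chosen only so that the gradient can be written out by hand; the application itself covers networks such as DNN, CNN, or ResNet. The name `train_locally` is hypothetical, and `learning_rate` and `epochs` stand in for the training configuration parameters sent by the server.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (input features, ground truth label)


def train_locally(weights: List[float], samples: List[Sample],
                  learning_rate: float, epochs: int) -> List[float]:
    """Full-batch gradient descent on a linear model with mean squared error loss."""
    w = list(weights)  # copy so the received global model is not mutated
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in samples:
            prediction = sum(wi * xi for wi, xi in zip(w, x))
            error = prediction - y  # difference between model output and label
            for i, xi in enumerate(x):
                grad[i] += 2.0 * error * xi / len(samples)
        # Step against the gradient to reduce the loss.
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
    return w
```

The returned weights, together with `len(samples)` as the data volume, correspond to what the client would feed back to its server.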
- FL-server 1 can aggregate the model update parameters fed back by the multiple FL-clients, which is equivalent to data reduction.
- the specific aggregation method may include summation, average value, weighted summation, etc., which may be adjusted according to actual application scenarios.
- the data obtained by FL-server 1 by aggregating the model update data can be expressed in terms of the following quantities: S, the client set; the model update parameters fed back by each client; and $n_k$, the size of the model update parameters fed back by each client.
- the master node (i.e., FL-server 0) triggers itself and the slave nodes (such as FL-server 1 and FL-server 2) to transmit aggregation information to each other.
- FL-server 0 triggering the slave nodes to transmit aggregation information is used as an example for illustration.
- FL-server 2 sends the aggregation information obtained by aggregating its received model update parameters to FL-server 1, and FL-server 1 aggregates the received aggregation information with its locally stored aggregation information to obtain the final aggregated information.
- the current iteration can be terminated and the next iteration is entered, that is, the counter is initialized.
- FL-server 0 sends a second trigger instruction to FL-server 1 and FL-server 2 to indicate entering the next round of iteration.
- the data from the current iteration can be discarded or used. For example, if the data is complete, it can continue to be used and only the data of the unresponsive clients is discarded, which can be adjusted according to the actual application scenario.
- the master node can maintain the counter and the timer and count the clients that have sent requests. When the timer reaches the threshold, the aggregation operation of the cluster dimension can be triggered, that is, each server receives the aggregated information of the other servers and aggregates again, so that each server has the aggregated information of the other servers and therefore the full amount of data.
- the master node also maintains a timer.
- the next round of iteration can be performed, which is equivalent to performing only one round of iterative learning within the time window of the timer. This prevents an excessively long wait for client responses from affecting the efficiency of federated learning and addresses the long tail effect caused by clients falling behind.
- the master node as an important node for monitoring the duration of each iteration and the number of clients participating in the iteration, can mainly be used to perform the following steps:
- after any one of the multiple servers receives the first request message, the master node starts the counter and the timer.
- the counter is used to count the request messages received by the multiple servers in one iteration, and a request message is used to request the global model stored in the corresponding server among the multiple servers.
- if the value of the counter reaches the first threshold, the master node sends a first trigger indication to each of the multiple servers, where the first trigger indication instructs the multiple servers to transmit locally stored information to each other.
- if the value of the counter does not reach the first threshold and the value of the timer reaches the second threshold, the master node sends a second trigger indication to each server, where the second trigger indication instructs each server to perform the next round of iteration.
- one of the servers in the federated learning system is used as the master node, and the master node maintains the timer and the counter.
- the counter counts the received request messages while the timer keeps time, which limits the number of clients participating in each iteration and the training duration of each iteration, avoids meaningless waiting caused by clients falling behind, improves the overall efficiency of federated learning, and avoids the long tail effect.
- the first threshold is a preset value, or the first threshold is related to the number of clients accessing the federated learning system during the previous iteration. Therefore, in the embodiments of the present application, the first threshold corresponding to the counter may be a preset value or a value determined according to the number of clients accessing all servers in the federated learning system. Usually, the first threshold is not greater than the number of clients accessing the federated learning system, which avoids long waits for client access and improves the overall efficiency of federated learning.
- the second threshold is a preset value, or the second threshold is related to the number of clients accessing the federated learning system in the previous iteration, or the second threshold is related to the amount of data communicated between each server in the federated learning system and its corresponding clients in the previous iteration. Therefore, in the embodiments of the present application, the time window of each iteration can be set by a timer, and when the timer times out, the next round of iterative learning can be entered. This avoids the long tail effect caused by long waiting times, and even if a client quits during federated learning, the overall training efficiency of federated learning is not affected.
- Counters and timers may be maintained by the master node, such as the FL-Server 0 shown in Figure 5 above.
- before each iteration, the counter is initialized and the number of request messages from clients participating in this iteration is set, for example to C. Usually this value covers only a subset of the clients that have communication connections with the servers, that is, it usually does not exceed the number of clients that have established connections with the servers, so as to avoid excessively long waits for clients in each round of iteration. For example, if 100 clients have established connections with the servers in the federated learning system but only 10 clients need to participate in federated learning, setting C too large may lead to an excessively long wait for clients to participate, reducing the efficiency of federated learning.
- the FL-Server receives a request message from the FL-Client.
- the request message sent by the FL-Client can be routed by the ELB to the FL-Server, where the FL-Server can be any server in the federated learning system, such as FL-Server0, FL-Server1, or FL-Server2 in the aforementioned Figure 5.
- the FL-Server can confirm whether the FL-Client is allowed to participate in federated learning. For example, random screening, first-come-first-served or other screening methods can be used to determine whether FL-Client is allowed to participate in federated learning.
- whether the FL-Client is allowed to participate can also be determined from its connection status (such as delay or communication duration) or from its device status. For example, if the FL-Client has a low battery or a high load, it may not be allowed to participate in federated learning, so as to avoid the FL-Client affecting the efficiency of federated learning.
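The screening step can be sketched as a simple admission check. The sketch combines a first-come-first-served capacity limit with checks on battery, load, and delay; all field names and threshold values are hypothetical, since the application lists these factors without fixing any concrete policy.

```python
from dataclasses import dataclass


@dataclass
class ClientStatus:
    battery_pct: float  # remaining battery, 0-100
    cpu_load: float     # current load, 0.0-1.0
    latency_ms: float   # measured delay between client and server


def admit(status: ClientStatus, admitted_so_far: int, capacity: int,
          min_battery_pct: float = 20.0, max_cpu_load: float = 0.8,
          max_latency_ms: float = 500.0) -> bool:
    """First-come-first-served admission with device and connection checks."""
    if admitted_so_far >= capacity:
        return False  # this round is already full
    if status.battery_pct < min_battery_pct:
        return False  # low battery: training could drain the device
    if status.cpu_load > max_cpu_load:
        return False  # high load: training would be slow
    return status.latency_ms <= max_latency_ms
```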
- the counter is accumulated.
- the master node may also record the number of FL-Clients that have sent the request message, and accumulate the counter.
- step 704. Determine whether the counter is 1, if yes, go to step 708, if not, go to step 705.
- the value of the counter can be monitored. If the value of the counter is 1, the request message sent by the first client in the current iteration has been received, and timing can be started at this point, that is, step 708 is executed. Because only one client's request message has been received at this point, no data synchronization is needed. After request messages sent by multiple clients have been received, the data can be synchronized in the cluster dimension so that the servers in the cluster maintain data consistency, that is, step 705 is executed.
- the FL-Server processes the request and synchronizes data in the cluster dimension.
- the model and training configuration parameters can be delivered to all or part of the multiple FL-Clients.
- FL-Client uses locally stored training samples to train the model, and feeds back the relevant data of the trained model to FL-Server.
- the FL-Server can receive the model update parameters fed back by multiple FL-Clients and aggregate them to obtain aggregated information (i.e., the first aggregated information).
- the aggregated information calculated by each FL-Server can be synchronized to realize the data synchronization of the cluster dimension.
- after the FL-Servers have synchronized their aggregation information, each FL-Server can aggregate the aggregation information it obtained itself (i.e., the first aggregation information) with the aggregation information sent by the other FL-Servers (i.e., the second aggregation information) to obtain the final global aggregated information (i.e., the third aggregated information), so that each FL-Server has the full data.
- the manner of aggregation between FL-Servers can be as shown in Figure 8, where the federated learning system includes multiple servers connected to each other in a ring, such as Server 0, Server 1, Server 2, and Server 3 shown in Figure 8. Four servers are used here only as an exemplary description; more or fewer servers can be used in actual application scenarios, which is not limited here.
- the K clients connected to the servers are shown as client 1 to client K in Figure 8. Each client receives the initial weight $w_t$ sent by its connected server and then uses locally stored training samples to train the model corresponding to the initial weight, obtaining the weights of the trained model and the corresponding data volumes $n_1$ to $n_K$, where $n_k$ represents the dataset size of the k-th client. Each client then feeds back the weights of the trained model and the data volume to its connected server; for example, client 1 feeds back the weights of its trained model to Server 0, and client K feeds back the weights of its trained model to Server 2.
- server j aggregates the weights received from its clients to obtain its own aggregation result. Then the master node triggers the servers to aggregate the data transmitted between them to obtain the aggregated information.
- the aggregated data of each server is expressed as its aggregation result and $n_j$.
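For servers connected in a ring, one possible exchange pattern is a ring all-gather, in which every server repeatedly forwards the most recently received partial result to its successor. The application states only that the servers are connected in a ring, so the forwarding schedule below is an assumption, and `ring_all_gather` is a hypothetical name.

```python
from typing import List, TypeVar

T = TypeVar("T")


def ring_all_gather(partials: List[T]) -> List[List[T]]:
    """Simulate N servers in a ring exchanging their partial aggregation results.

    In each step, server j receives one partial result from its predecessor
    (j - 1) and forwards it in the next step. After N - 1 steps every server
    holds all N partial results, listed in server order.
    """
    n = len(partials)
    held = [{j: partials[j]} for j in range(n)]  # server j starts with its own result
    in_flight = list(range(n))                   # index of the result server j sends next
    for _ in range(n - 1):
        received = [in_flight[(j - 1) % n] for j in range(n)]
        for j, idx in enumerate(received):
            held[j][idx] = partials[idx]
        in_flight = received                     # forward what was just received
    return [[held[j][i] for i in range(n)] for j in range(n)]
```

With four servers, three forwarding steps are enough for each server to hold the results of all four, after which every server can apply the same aggregation locally.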
- step 707. Determine whether the count reaches the first threshold; if yes, go to step 710; if not, go to step 702.
- the value of the counter can be monitored in real time. If the count reaches the first threshold, enough clients are participating in this round of the federated learning iteration, and no more clients need to join the current round. Timing is therefore stopped and the next round of federated learning is entered, that is, step 710 is executed.
- the timer can be triggered to start timing.
- the ELB also notifies the master node that the first request message has been received, and the master node can start the timer and the counter to count the request messages received by the federated learning system and to time this round of iteration.
- step 709. Check whether the timer has timed out; if so, perform step 710.
- the state of the timer can be monitored in real time to determine whether the timer has timed out.
- the request messages or data sent by clients and received within the time window T of the timer can be processed normally by the server, for example by performing encryption and decryption or aggregating the received data. If the timer times out before the counter reaches C, both the timer and the counter are reset to the initial state and the next round of iteration is entered; if the counter reaches C within the time window T, the timer is stopped, and the next round of iteration is entered after the data sent by the clients in the current iteration has been processed.
- the longest time spent in each iteration is the duration of the time window T. During the iteration process, the communication duration of each iteration can be recorded and used to dynamically adjust the duration of the time window, thereby eliminating the long tail effect of the clients and improving training efficiency.
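A simple way to adjust the time window from recorded durations is sketched below. The rule used here (a safety margin applied to the mean of the most recent iteration durations, clamped to fixed bounds) and all parameter values are assumptions chosen for illustration; the application does not prescribe a specific adjustment formula.

```python
from typing import List


def next_time_window(durations_s: List[float], current_window_s: float,
                     margin: float = 1.5, min_s: float = 5.0,
                     max_s: float = 300.0, history: int = 5) -> float:
    """Derive the next time window T from recently recorded iteration durations."""
    if not durations_s:
        return current_window_s  # nothing recorded yet: keep the current window
    recent = durations_s[-history:]
    target = margin * sum(recent) / len(recent)
    # Clamp so the window is neither too short nor too long.
    return max(min_s, min(target, max_s))
```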
- the function of the timer may also be to trigger data synchronization between servers in the cluster, that is, the aforementioned steps 705-706; after steps 705-706 are completed, the next round of iteration can be entered. This can be adjusted according to actual application scenarios.
- step 710. After the timer stops timing, this round of iterative learning ends, and after the timer and the counter are initialized, the next round of iteration can be entered, that is, step 701 is executed.
- step 702, step 705 and step 706 can be executed by any FL-Server in the cluster, and step 701, step 703, step 704, step 707-step 710 are executed by the master node.
- the client can access the server through a combination of requests and responses, and the clients participating in each iteration and the iteration duration are monitored by means of the counter and the timer. Even if a client exits federated learning midway, the overall federated learning process of the cluster is not affected, which avoids the long tail problem caused by clients that cannot be addressed and improves the overall efficiency of federated learning.
- data synchronization can be performed between the servers, so that each server has the full amount of data in the cluster, and the client can obtain the full amount of data at any time by accessing any server, thereby improving the accuracy of the model obtained by the client.
- the federated learning system can be composed of multiple servers, and the client is deployed on the user's mobile terminals, such as mobile phones, cameras or tablets.
- the mobile terminal can capture images with its camera, and the user can manually label the type of the object in each captured image in the album, such as "cat" or "dog". The mobile terminal can use the manually labeled images as training samples. When the number of training samples reaches a certain amount, the mobile terminal can request to participate in federated learning through the federated learning system, so as to apply the data it has collected to the classification network.
- the mobile terminal can obtain the HTTP port exposed by the federated learning system, and send a request message to the cluster through the HTTP port.
- the counter set on the master node counts the request message, and the master node also maintains a timer.
- the upper limit of the counter can be preset, or determined according to the number of clients participating in the previous iteration, or determined according to the number of clients that have established connections with the cluster.
- the upper limit of the timer may be preset, or may be determined according to the duration of the previous iteration.
- the timer is enabled. If the request message sent by the mobile terminal is the last request message in this iteration, that is, the value of the counter reaches the preset first threshold, the mobile terminal is the last client determined to participate in federated learning in this iteration, and after the mobile terminal completes model training, the next round of iterative learning can be entered.
- the ELB within the cluster routes the request message of the mobile terminal to the adapted server, which determines whether the mobile terminal is allowed to participate in federated learning. If the server allows the mobile terminal to participate in federated learning, the server sends the locally stored classification network to the mobile terminal, and also sends training configuration parameters, such as the learning rate used during training and the data encryption and decryption methods. The mobile terminal uses the locally collected training samples to train the classification network according to the training configuration parameters and obtains the trained classification network. The mobile terminal feeds back the weight parameters of the trained classification network to the server. If the server receives the weight parameters within the time window of the timer, the server aggregates the received weight parameters to obtain the first aggregation information.
- the aggregated information can be transmitted between servers, so that each server has full data.
- the mobile terminal can send a query message to the server at any time to query the latest classification network, and the server can send the stored latest classification network to the mobile terminal, so that the mobile terminal can obtain the latest full data from any server at any time.
- when the client is deployed on a mobile terminal, the client can access the federated learning system and participate in federated learning in a loose, request-and-response manner, without maintaining a stable long-lived connection, which suits scenarios where clients are deployed on mobile terminals.
- timers and counters are also deployed in the federated learning system. Through the cooperation of timers and counters, the low efficiency of federated learning caused by waiting for clients is avoided, the long tail effect is reduced, and the overall efficiency of federated learning is improved.
- the upper limit of the timer or counter can be adjusted dynamically according to information from the previous iteration or from several previous rounds of iterative training, to suit the actual scenario, further improving the efficiency of federated learning.
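The adjustment of the counter and timer upper limits described above can be sketched as follows. This is a minimal illustration and not the application's implementation: the names `RoundLimits`, `next_round_limits`, and `accept_update`, and the `slack` factor applied to the previous round's duration, are assumptions introduced here. The source only states that the limits may be preset or derived from the previous iteration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoundLimits:
    counter_limit: int    # upper limit of the counter (request messages per round)
    timer_limit_s: float  # upper limit of the timer (round duration, in seconds)


def next_round_limits(preset: RoundLimits,
                      prev_participants: Optional[int] = None,
                      prev_duration_s: Optional[float] = None,
                      slack: float = 1.2) -> RoundLimits:
    """Derive the limits for the next round (hypothetical policy).

    Falls back to the preset values when no information about the
    previous iteration is available.
    """
    counter_limit = preset.counter_limit
    if prev_participants is not None and prev_participants > 0:
        # Assumption: reuse the number of clients seen in the previous round.
        counter_limit = prev_participants

    timer_limit_s = preset.timer_limit_s
    if prev_duration_s is not None and prev_duration_s > 0:
        # Assumption: allow the previous duration plus some slack.
        timer_limit_s = prev_duration_s * slack

    return RoundLimits(counter_limit, timer_limit_s)


def accept_update(round_start_s: float, received_at_s: float,
                  limits: RoundLimits) -> bool:
    """Return True if an update arrives inside the timer's time window."""
    elapsed = received_at_s - round_start_s
    return 0 <= elapsed <= limits.timer_limit_s
```

A server would call `accept_update` before aggregating a weight parameter, so that late updates do not hold up the round.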
- the present application provides a schematic structural diagram of a server.
- the server is applied to a federated learning system.
- the federated learning system includes multiple servers and multiple clients. Multiple servers are used for iterative federated learning. Any one of the servers includes:
- a transceiver module 901 configured to receive a request message sent by at least one first client, where the request message is used to request the global model stored in the first server, and the multiple clients include at least one first client;
- the transceiver module 901 is further configured to send global model information and training configuration parameters to at least one first client, where the global model information and training configuration parameters are used to instruct the at least one first client to train the global model using the training configuration parameters;
- the transceiver module 901 is further configured to receive first model update parameters fed back by at least one first client respectively, where the first model update parameters are parameters of the global model obtained after the at least one first client is trained;
- an aggregation module 902 configured to aggregate the first model update parameters fed back by at least one first client to obtain the first aggregation information in this round of iteration;
- the transceiver module 901 is further configured to acquire second aggregation information sent by the second server, where the second aggregation information is information obtained by the second server in the current iteration by aggregating the received second model update parameters;
- the updating module 903 is configured to update the global model stored on the first server based on the first aggregation information and the second aggregation information to obtain an updated global model.
- the transceiver module is further configured to receive second aggregated information respectively sent by multiple second servers;
- the updating module 903 is specifically configured to update the global model stored on the first server based on the first aggregation information and the second aggregation information respectively sent by the plurality of second servers to obtain an updated global model.
- the first server is a master node in the federated learning system, and the master node is used to manage the multiple servers;
- the transceiver module is further configured to send a first trigger instruction to the second server, where the first trigger instruction is used to instruct the second server to send the second aggregation information to the first server;
- the transceiver module 901 is further configured to receive the second aggregation information from the second server.
- a counter is set in the first server, and the counter is used to count the number of request messages received by the multiple servers;
- the transceiver module 901 is specifically configured to send a first trigger instruction to the second server when the value of the counter meets the first threshold.
- the first threshold is a preset value, or the first threshold is related to the number of clients accessing the federated learning system during the previous iteration.
- the master node includes a timer, and the timer starts timing after the first request message is received in each round of iteration;
- the transceiver module 901 is specifically configured to receive a second trigger instruction sent by the master node when the timer exceeds the second threshold, where the second trigger instruction is used to instruct the server to proceed to the next round of iterative learning.
- the second threshold is a preset value, or the second threshold is related to the number of clients accessing the federated learning system in the previous iteration, or the second threshold is related to the amount of data communicated between each server and its corresponding clients in the federated learning system in the previous iteration.
- the transceiver module is further configured to receive a query message sent by a third client, where the third client includes any one of the clients accessing the federated learning system;
- the transceiver module 901 is further configured to send the updated global model information to the third client for the query message.
- the transceiver module is further configured to send the first aggregation information to the second server, so that the second server updates the locally stored global model based on the first aggregation information and the second aggregation information to obtain the updated global model.
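The aggregation module and the updating module described above can be illustrated with a short sketch. The source does not fix the aggregation rule, so this example assumes a sample-count-weighted average (in the style of federated averaging). The function names, the `Weights` layout (parameter name mapped to a flat list of floats), and the choice to exchange weighted sums together with sample counts are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]
# Aggregation information (assumed form): weighted parameter sums + sample count.
AggInfo = Tuple[Weights, int]


def _accumulate(acc: Weights, values: Weights, scale: float) -> None:
    """Add scale * values into acc, parameter by parameter."""
    for name, vec in values.items():
        target = acc.setdefault(name, [0.0] * len(vec))
        for i, v in enumerate(vec):
            target[i] += v * scale


def aggregate_client_updates(updates: List[Tuple[Weights, int]]) -> AggInfo:
    """Aggregate (model update parameters, num_samples) pairs from clients."""
    if not updates:
        raise ValueError("no client updates to aggregate")
    summed: Weights = {}
    total = 0
    for weights, num_samples in updates:
        _accumulate(summed, weights, float(num_samples))
        total += num_samples
    return summed, total


def update_global_model(first: AggInfo, others: List[AggInfo]) -> Weights:
    """Combine this server's aggregation information with that of other servers."""
    merged: Weights = {}
    total = 0
    for summed, count in [first] + list(others):
        _accumulate(merged, summed, 1.0)
        total += count
    if total == 0:
        raise ValueError("aggregation information covers zero samples")
    return {name: [v / total for v in vec] for name, vec in merged.items()}
```

Because every server applies the same `update_global_model` to the same set of aggregation information, each server ends up with an identical global model, which matches the goal that any server can answer a client query with the full data.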
- the present application provides a schematic structural diagram of another server, that is, the aforementioned master node.
- the server is applied to the federated learning system.
- the federated learning system includes multiple servers. One of the multiple servers serves as the master node.
- the multiple servers are used for iterative learning to realize the federated learning.
- the master node includes:
- the starting module 1001 is configured to start a counter and a timer after any one of the multiple servers receives the first request message, where the counter is used to count the request messages received by the multiple servers in one iteration, and each request message is used to request the global model stored in the corresponding server among the multiple servers;
- the transceiver module 1002 is configured to send a first trigger instruction to each of the multiple servers if the value of the counter reaches the first threshold, where the first trigger instruction is used to instruct the multiple servers to transmit locally stored information to each other;
- the transceiver module 1002 is further configured to send a second trigger instruction to each server if the value of the counter does not reach the first threshold and the value of the timer reaches the second threshold, where the second trigger instruction is used to instruct each server to proceed to the next iteration.
- the first threshold is a preset value, or the first threshold is related to the number of clients accessing the federated learning system during the previous iteration.
- the second threshold is a preset value, or the second threshold is related to the number of clients accessing the federated learning system in the previous iteration, or the second threshold is related to the amount of data communicated between each server and its corresponding clients in the federated learning system in the previous iteration.
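The master node's use of the counter and timer can be sketched as a small state machine. This is an illustrative reading of the description, not the application's code: the class `MasterNode`, the `Trigger` enum, and the decision to reset the counter and timer after either trigger are assumptions. Timestamps are passed in explicitly so that the behaviour is deterministic.

```python
import enum
from typing import Optional


class Trigger(enum.Enum):
    FIRST = "servers_exchange_locally_stored_information"
    SECOND = "servers_enter_next_iteration"


class MasterNode:
    """Hypothetical master node tracking one round of iteration."""

    def __init__(self, first_threshold: int, second_threshold_s: float):
        self.first_threshold = first_threshold        # counter threshold
        self.second_threshold_s = second_threshold_s  # timer threshold (seconds)
        self.counter = 0
        self.started_at: Optional[float] = None

    def on_request(self, now_s: float) -> None:
        """Record a request message received by any server in the cluster."""
        if self.started_at is None:
            # The first request message of the round starts the timer.
            self.started_at = now_s
        self.counter += 1

    def poll(self, now_s: float) -> Optional[Trigger]:
        """Return the trigger to broadcast to every server, if any."""
        if self.started_at is None:
            return None
        if self.counter >= self.first_threshold:
            trigger = Trigger.FIRST
        elif now_s - self.started_at >= self.second_threshold_s:
            trigger = Trigger.SECOND
        else:
            return None
        # Assumption: both triggers close the current round.
        self.counter = 0
        self.started_at = None
        return trigger
```

Checking the counter before the timer mirrors the order in the description: the second trigger applies only when the counter has not reached the first threshold.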
- FIG. 11 is a schematic structural diagram of a server provided by an embodiment of the present application.
- the server described in the embodiment corresponding to FIG. 9 may be deployed on the server 1100 to implement the functions of the server in the embodiment corresponding to FIG. 4 to FIG. 8 .
- the server 1100 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) that store application programs 1142 or data 1144.
- the memory 1132 and the storage medium 1130 may be short-term storage or persistent storage.
- the memory 1132 is a random access memory (RAM) that can exchange data directly with the central processing unit 1122; it loads the data 1144, the application programs 1142, and/or the operating system 1141 so that the central processing unit 1122 can run and use them directly, and it typically serves as a temporary data storage medium for the operating system or other running programs.
- the program stored in the storage medium 1130 may include one or more modules (not shown in FIG. 11 ), and each module may include a series of instruction operations on the server.
- the central processing unit 1122 may be configured to communicate with the storage medium 1130 to execute a series of instruction operations in the storage medium 1130 on the server 1100 .
- the storage medium 1130 stores program instructions and data corresponding to the method steps shown in any of the foregoing embodiments in FIG. 4 to FIG. 8 .
- the server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
- the central processing unit 1122 is configured to execute the method steps performed by the server shown in any of the foregoing embodiments in FIG. 4 to FIG. 8 .
- the embodiments of the present application also provide a federated learning device, which may also be referred to as a digital processing chip or a chip.
- the chip includes a processing unit and a communication interface.
- the processing unit can obtain program instructions through the communication interface, and the program instructions are processed by the processing unit.
- the processing unit is configured to execute the method steps executed by the server shown in any of the foregoing embodiments in FIG. 4 to FIG. 8 .
- the embodiments of the present application also provide a digital processing chip.
- the digital processing chip integrates circuits and one or more interfaces for realizing the above-mentioned central processing unit 1122 or the functions of the central processing unit 1122 .
- the digital processing chip can perform the method steps of any one or more of the foregoing embodiments.
- when the digital processing chip does not integrate a memory, it can be connected to an external memory through the communication interface.
- the digital processing chip implements the actions performed by the server in the above embodiment according to the program codes stored in the external memory.
- when the federated learning device provided by the embodiment of the present application is a chip, the chip specifically includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuits.
- the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the server can execute the federated learning method described in the embodiments shown in FIG. 4 to FIG. 8 .
- the aforementioned storage unit may be a storage unit within the chip, such as a register or a cache, or a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
- the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- a general purpose processor may be a microprocessor or it may be any conventional processor or the like.
- the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the above-mentioned methods in FIGS. 4-8 .
- Embodiments of the present application further provide a computer-readable storage medium that stores a program; when the program runs on a computer, the computer executes the steps of the methods described in the foregoing embodiments shown in FIGS. 4 to 8.
- Embodiments of the present application also provide a computer program product, which, when running on a computer, causes the computer to execute the steps executed by the server in the methods described in the foregoing embodiments shown in FIGS. 4-8 .
- the device embodiments described above are only schematic: the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they can be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- the storage medium may be, for example, a USB flash drive (U disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
- the computer device can be a personal computer, a server, a network device, etc.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
- the computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more usable media.
- the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state disks (SSDs)), and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Claims (23)
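- A federated learning method, applied to a federated learning system, wherein the federated learning system comprises a plurality of servers and a plurality of clients, the plurality of servers comprise a first server and a second server, and the plurality of servers are configured to perform iterative learning to implement federated learning, wherein a process of any round of iteration in the iterative learning comprises: receiving, by the first server, a request message sent by at least one first client, wherein the request message is used to request a global model stored in the first server; sending, by the first server, information about the global model and a training configuration parameter to the at least one first client; receiving, by the first server, first model update parameters separately fed back by the at least one first client, wherein the first model update parameters are parameters of the global model trained by the at least one first client; aggregating, by the first server, the first model update parameters fed back by the at least one first client, to obtain first aggregation information in the current round of iteration; obtaining, by the first server, second aggregation information sent by the second server, wherein the second aggregation information is information obtained by the second server by aggregating received second model update parameters in the current round of iteration; and updating, by the first server based on the first aggregation information and the second aggregation information, the global model stored on the first server, to obtain an updated global model.
- The method according to claim 1, wherein there are a plurality of second servers among the plurality of servers, and the obtaining, by the first server, second aggregation information sent by the second server comprises: receiving, by the first server, the second aggregation information separately sent by the plurality of second servers; and the updating, by the first server based on the first aggregation information and the second aggregation information, the global model stored on the first server comprises: updating, by the first server based on the first aggregation information and the second aggregation information separately sent by the plurality of second servers, the global model stored on the first server, to obtain the updated global model.
- The method according to claim 1 or 2, wherein the first server is a master node in the federated learning system, the master node is configured to manage the plurality of servers, and the obtaining, by the first server, second aggregation information sent by the second server further comprises: sending, by the first server, a first trigger instruction to the second server, wherein the first trigger instruction is used to instruct the second server to send the second aggregation information to the first server; and receiving, by the first server, the second aggregation information from the second server.
- The method according to claim 2, wherein a counter is set in the first server, the counter is used to count the number of request messages received by the plurality of servers, and the sending, by the first server, a first trigger instruction to the second server comprises: when a value of the counter meets a first threshold, sending, by the first server, the first trigger instruction to the second server.
- The method according to claim 4, wherein the first threshold is a preset value, or the first threshold is related to the number of clients that accessed the federated learning system in the previous round of iteration.
- The method according to any one of claims 3 to 5, wherein the master node comprises a timer, the timer starts timing after the first request message is received in each round of iteration, and the method further comprises: when the timer exceeds a second threshold, sending, by the first server, a second trigger instruction to the second server, wherein the second trigger instruction is used to instruct the second server to perform the next round of iteration.
- The method according to claim 6, wherein the second threshold is a preset value, or the second threshold is related to the number of clients that accessed the federated learning system in the previous round of iteration, or the second threshold is related to the amount of data communicated between each server and its corresponding clients in the federated learning system in the previous round of iteration.
- The method according to any one of claims 1 to 7, wherein the method further comprises: receiving, by the first server, a query message sent by a third client, wherein the third client comprises any one of the clients accessing the federated learning system; and sending, by the first server, information about the updated global model to the third client in response to the query message.
- The method according to any one of claims 1 to 8, wherein the method further comprises: sending, by the first server, the first aggregation information to the second server, so that the second server updates the locally stored global model based on the first aggregation information and the second aggregation information, to obtain the updated global model.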
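- A federated learning method, applied to a federated learning system and a plurality of clients, wherein the federated learning system comprises a plurality of servers, one of the plurality of servers serves as a master node, and the plurality of servers are configured to perform iterative learning to implement federated learning, wherein a process of any round of iteration in the iterative learning comprises: after any one of the plurality of servers receives the first request message, starting, by the master node, a counter and a timer, wherein the counter is used to count request messages received by the plurality of servers in one round of iteration, and the request message is used by a client to request the global model stored in the corresponding server among the plurality of servers; if a value of the counter reaches a first threshold, sending, by the master node, a first trigger instruction to each of the plurality of servers, wherein the first trigger instruction is used to instruct the plurality of servers to transmit locally stored information to each other; and if the value of the counter does not reach the first threshold and a value of the timer reaches a second threshold, sending, by the master node, a second trigger instruction to each server, wherein the second trigger instruction is used to instruct each server to perform the next round of iteration.
- The method according to claim 10, wherein the first threshold is a preset value, or the first threshold is related to the number of clients that accessed the federated learning system in the previous round of iteration.
- The method according to claim 10 or 11, wherein the second threshold is a preset value, or the second threshold is related to the number of clients that accessed the federated learning system in the previous round of iteration, or the second threshold is related to the amount of data communicated between each server and its corresponding clients in the federated learning system in the previous round of iteration.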
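- A federated learning system, comprising a plurality of servers and a plurality of clients, wherein the plurality of servers comprise a first server and a second server, both the first server and the second server store information about a global model, and the plurality of servers are configured to perform iterative learning to implement federated learning, wherein in any round of iteration in the iterative learning: the first server is configured to receive request messages separately sent by at least one first client; the first server is configured to send, in response to the request messages separately sent by the at least one first client, the information about the global model and a training configuration parameter to the at least one first client; the first server is configured to receive first model update parameters separately fed back by the at least one first client, wherein the first model update parameters are parameters of the global model obtained after training by the at least one first client; the first server is configured to aggregate the first model update parameters separately fed back by the at least one first client, to obtain first aggregation information; the second server is configured to receive second model update parameters sent by at least one corresponding second client, and aggregate the second model update parameters sent by the at least one second client, to obtain second aggregation information; the first server is configured to receive the second aggregation information sent by each second server; and the first server is configured to update, based on the first aggregation information and the second aggregation information sent by each second server, the global model stored on the first server, to obtain an updated global model.
- The system according to claim 13, wherein there are a plurality of second servers among the plurality of servers; each of the plurality of second servers is configured to receive second model update parameters sent by at least one corresponding second client, and aggregate the second model update parameters sent by the at least one second client, to obtain the second aggregation information; the first server is specifically configured to receive the second aggregation information separately sent by the plurality of second servers; and the first server is specifically configured to update, based on the first aggregation information and the second aggregation information separately sent by the plurality of second servers, the global model stored on the first server, to obtain the updated global model.
- The system according to claim 13 or 14, wherein the plurality of servers further comprise a third server serving as a master node, and the master node is configured to manage the plurality of servers; the master node is configured to separately send a first trigger instruction to the plurality of servers; and the second server is specifically configured to send the second aggregation information to the first server based on the first trigger instruction.
- The system according to claim 15, wherein the master node comprises a counter, the counter is used to count the number of request messages received by the plurality of servers, and the request message is used to request the global model stored in the corresponding server among the plurality of servers; and the master node is specifically configured to: when a value of the counter meets a first threshold, send the first trigger instruction to each of the plurality of servers, wherein the first trigger instruction is used to trigger each second server to send the second aggregation information to the first server.
- The system according to claim 15 or 16, wherein the master node further comprises a timer, and the timer starts timing from the first request message received in each round of iteration; and the master node is further configured to: when the timer exceeds a second threshold, separately send a second trigger instruction to each of the plurality of servers, wherein the second trigger instruction is used to instruct the plurality of servers to perform the next round of iteration.
- The system according to claim 17, wherein the second threshold is a preset value, or the second threshold is related to the number of clients that accessed each server in the federated learning system in the previous round of iteration, or the second threshold is related to the amount of data communicated between the plurality of servers and the plurality of clients in the previous round of iteration.
- The system according to any one of claims 13 to 18, wherein the first server receives a query message sent by a third client, and the third client comprises any client accessing the federated learning system; and the first server sends information about the updated global model to the third client in response to the query message.
- The system according to any one of claims 13 to 19, wherein the first server is further configured to send the first aggregation information to the second server; and the second server is specifically configured to update the locally stored global model by combining the first aggregation information and the second aggregation information, to obtain the updated global model.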
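- A federated learning apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 1 to 9 or 10 to 12 are implemented.
- A computer-readable storage medium, comprising a program, wherein when the program is executed by a processing unit, the steps of the method according to any one of claims 1 to 9 or 10 to 12 are performed.
- A computer program product, wherein the computer program product comprises software code, and the software code is used to perform the steps of the method according to any one of claims 1 to 9 or 10 to 12.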
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22794675.3A EP4318338A4 (en) | 2021-04-25 | 2022-04-19 | FEDERATE LEARNING METHOD AND APPARATUS |
| US18/493,136 US20240054354A1 (en) | 2021-04-25 | 2023-10-24 | Federated learning method and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110450585.7A CN115249073B (zh) | 2021-04-25 | 2021-04-25 | Federated learning method and apparatus |
| CN202110450585.7 | 2021-04-25 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/493,136 Continuation US20240054354A1 (en) | 2021-04-25 | 2023-10-24 | Federated learning method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022228204A1 (zh) | 2022-11-03 |
Family
ID=83696603
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/087647 Ceased WO2022228204A1 (zh) | Federated learning method and apparatus | 2021-04-25 | 2022-04-19 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20240054354A1 (zh) |
| EP (1) | EP4318338A4 (zh) |
| CN (1) | CN115249073B (zh) |
| WO (1) | WO2022228204A1 (zh) |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
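| CN115775010A (zh) * | 2022-11-23 | 2023-03-10 | 国网江苏省电力有限公司信息通信分公司 | Electric power data sharing method based on horizontal federated learning |
| CN115905978A (zh) * | 2022-11-18 | 2023-04-04 | 安徽工业大学 | Fault diagnosis method and system based on hierarchical federated learning |
| CN115983409A (zh) * | 2022-11-11 | 2023-04-18 | 北京大学 | Federated learning training method, apparatus, system, and device based on differential privacy |
| CN116384506A (zh) * | 2023-03-28 | 2023-07-04 | 支付宝(杭州)信息技术有限公司 | Model training method and apparatus, storage medium, and electronic device |
| CN117035009A (zh) * | 2023-08-24 | 2023-11-10 | 重庆大学 | Personalized communication method based on dataset compression |
| CN117196010A (zh) * | 2023-08-03 | 2023-12-08 | 西安电子科技大学杭州研究院 | Blockchain-based distributed federated learning method, system, and terminal for connected vehicles |
| CN117390448A (zh) * | 2023-10-25 | 2024-01-12 | 西安交通大学 | Client model aggregation method for inter-cloud federated learning and related system |
| CN117436515A (zh) * | 2023-12-07 | 2024-01-23 | 四川警察学院 | Federated learning method, system, apparatus, and storage medium |
| CN117472866A (zh) * | 2023-12-27 | 2024-01-30 | 齐鲁工业大学(山东省科学院) | Federated learning data sharing method under blockchain supervision and incentives |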
| WO2024263820A3 (en) * | 2023-06-20 | 2025-02-06 | Xeba Technologies, LLC | Object messaging and intelligent objects (omio) |
| CN119558423A (zh) * | 2023-09-01 | 2025-03-04 | 安徽工业大学 | Personalized federated learning method based on client target enhancement |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240028870A1 (en) * | 2022-07-22 | 2024-01-25 | Capital One Services, Llc | Selective reporting of machine learning parameters for federated learning |
| US20240104426A1 (en) * | 2022-09-26 | 2024-03-28 | Qualcomm Incorporated | MACHINE LEARNING SIGNALING AND OPERATIONS FOR WIRELESS LOCAL AREA NETWORKS (WLANs) |
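| CN116383755B (zh) * | 2022-12-30 | 2026-04-17 | 科大讯飞股份有限公司 | Node collaboration method, apparatus, network, node device, and storage medium |
| CN116011991B (zh) * | 2022-12-30 | 2023-12-19 | 中国电子科技集团公司第三十八研究所 | Multi-person collaborative task assurance method based on proxy and backup techniques |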
| WO2025054971A1 (en) * | 2023-09-15 | 2025-03-20 | Huawei Technologies Co., Ltd. | Apparatus, method and readable storage medium for model training |
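| CN118446335A (zh) * | 2024-05-07 | 2024-08-06 | 中国电信股份有限公司技术创新中心 | Federated learning method, apparatus, system, client, and server |
| CN121262286B (zh) * | 2025-12-04 | 2026-04-24 | 厦门理工学院 | Conflict-aware client scheduling method and apparatus for multi-server federated learning |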
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180331897A1 (en) * | 2016-01-30 | 2018-11-15 | Huawei Technologies Co.,Ltd. | Method and device for training model in distributed system |
| US10372504B2 (en) * | 2017-08-03 | 2019-08-06 | Akamai Technologies, Inc. | Global usage tracking and quota enforcement in a distributed computing system |
| CN110175283A (zh) * | 2019-05-10 | 2019-08-27 | 深圳前海微众银行股份有限公司 | Recommendation model generation method and apparatus |
| CN110572253A (zh) * | 2019-09-16 | 2019-12-13 | 济南大学 | Method and system for enhancing privacy of federated learning training data |
| US20200027033A1 (en) * | 2018-07-19 | 2020-01-23 | Adobe Inc. | Updating Machine Learning Models On Edge Servers |
| CN112257874A (zh) * | 2020-11-13 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Machine learning method, apparatus, and system for a distributed machine learning system |
| CN112686393A (zh) * | 2020-12-31 | 2021-04-20 | 华南理工大学 | Federated learning system |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
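| CN111444021B (zh) * | 2020-04-02 | 2023-03-24 | 电子科技大学 | Synchronous training method, server, and system based on distributed machine learning |
| CN112181666B (zh) * | 2020-10-26 | 2023-09-01 | 华侨大学 | Device evaluation and federated learning importance aggregation method based on edge intelligence |
| CN112532451B (zh) * | 2020-11-30 | 2022-04-26 | 安徽工业大学 | Hierarchical federated learning method and apparatus based on asynchronous communication, terminal device, and storage medium |
| CN112508205B (zh) * | 2020-12-04 | 2024-07-16 | 中国科学院深圳先进技术研究院 | Federated learning scheduling method, apparatus, and system |
| CN112668726B (zh) * | 2020-12-25 | 2023-07-11 | 中山大学 | Communication-efficient and privacy-preserving personalized federated learning method |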
- 2021-04-25: CN, application CN202110450585.7A, patent CN115249073B (zh), status: Active
- 2022-04-19: EP, application EP22794675.3A, patent EP4318338A4 (en), status: Pending
- 2022-04-19: WO, application PCT/CN2022/087647, patent WO2022228204A1 (zh), status: not active, Ceased
- 2023-10-24: US, application US18/493,136, patent US20240054354A1 (en), status: Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4318338A4 |
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
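| CN115983409A (zh) * | 2022-11-11 | 2023-04-18 | 北京大学 | Federated learning training method, apparatus, system, and device based on differential privacy |
| CN115905978A (zh) * | 2022-11-18 | 2023-04-04 | 安徽工业大学 | Fault diagnosis method and system based on hierarchical federated learning |
| CN115775010B (zh) * | 2022-11-23 | 2024-03-19 | 国网江苏省电力有限公司信息通信分公司 | Electric power data sharing method based on horizontal federated learning |
| CN115775010A (zh) * | 2022-11-23 | 2023-03-10 | 国网江苏省电力有限公司信息通信分公司 | Electric power data sharing method based on horizontal federated learning |
| CN116384506A (zh) * | 2023-03-28 | 2023-07-04 | 支付宝(杭州)信息技术有限公司 | Model training method and apparatus, storage medium, and electronic device |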
| WO2024263820A3 (en) * | 2023-06-20 | 2025-02-06 | Xeba Technologies, LLC | Object messaging and intelligent objects (omio) |
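| CN117196010A (zh) * | 2023-08-03 | 2023-12-08 | 西安电子科技大学杭州研究院 | Blockchain-based distributed federated learning method, system, and terminal for connected vehicles |
| CN117035009A (zh) * | 2023-08-24 | 2023-11-10 | 重庆大学 | Personalized communication method based on dataset compression |
| CN119558423A (zh) * | 2023-09-01 | 2025-03-04 | 安徽工业大学 | Personalized federated learning method based on client target enhancement |
| CN117390448A (zh) * | 2023-10-25 | 2024-01-12 | 西安交通大学 | Client model aggregation method for inter-cloud federated learning and related system |
| CN117390448B (zh) * | 2023-10-25 | 2024-04-26 | 西安交通大学 | Client model aggregation method for inter-cloud federated learning and related system |
| CN117436515B (zh) * | 2023-12-07 | 2024-03-12 | 四川警察学院 | Federated learning method, system, apparatus, and storage medium |
| CN117436515A (zh) * | 2023-12-07 | 2024-01-23 | 四川警察学院 | Federated learning method, system, apparatus, and storage medium |
| CN117472866B (zh) * | 2023-12-27 | 2024-03-19 | 齐鲁工业大学(山东省科学院) | Federated learning data sharing method under blockchain supervision and incentives |
| CN117472866A (zh) * | 2023-12-27 | 2024-01-30 | 齐鲁工业大学(山东省科学院) | Federated learning data sharing method under blockchain supervision and incentives |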
Also Published As
| Publication number | Publication date |
|---|---|
| EP4318338A1 (en) | 2024-02-07 |
| US20240054354A1 (en) | 2024-02-15 |
| CN115249073B (zh) | 2026-03-24 |
| CN115249073A (zh) | 2022-10-28 |
| EP4318338A4 (en) | 2024-09-25 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2022228204A1 (zh) | Federated learning method and apparatus | |
| JP7383803B2 (ja) | Federated learning using heterogeneous model types and architectures | |
| US10163420B2 (en) | System, apparatus and methods for adaptive data transport and optimization of application execution | |
| US11716271B2 (en) | Automated data flows using flow-based data processor blocks | |
| CN110390246A (zh) | Video analysis method in an edge-cloud environment | |
| CN116685985A (zh) | Federated learning system and method with diversified feedback | |
| Rani et al. | Blockchain-based IoT enabled health monitoring system: P. Rani et al. | |
| CN112286666B (zh) | Reliable offloading method for fine-grained data flows based on a callback mechanism | |
| WO2020053887A1 (en) | Distributed training of systems for medical image analysis | |
| US20220156235A1 (en) | Automatic generation of labeled data in iot systems | |
| CN113537495A (zh) | Model training system, method, and apparatus based on federated learning, and computer device | |
| US20220172054A1 (en) | Intermediate network node and method performed therein for handling data of communication networks | |
| US12045695B2 (en) | System and methods for privacy preserving cross-site federated learning | |
| Zhang et al. | Toy-IoT-oriented data-driven CDN performance evaluation model with deep learning | |
| Dunne et al. | A comparison of data streaming frameworks for anomaly detection in embedded systems | |
| Kaur et al. | Federated learning in IoT: A survey from a resource-constrained perspective | |
| WO2024041119A1 (zh) | Data backup method and apparatus | |
| Rashid et al. | Edgestore: Towards an edge-based distributed storage system for emergency response | |
| KR20240114208A (ko) | Combined clustering and federated learning method and system for improving performance of context-aware learning models in multi-story buildings | |
| Agarwal et al. | ANN-based scalable video encoding method for crime surveillance-intelligence of things applications | |
| CN116389175B (zh) | Traffic data detection method, training method, system, device, and medium | |
| Latif et al. | Cloudlet federation based context-aware federated learning approach | |
| Challoob et al. | Enhancing the performance assessment of network-based and machine learning for module availability estimation | |
| CN116405516A (zh) | Industrial database management and control system and method based on cloud-edge collaboration | |
| Miao et al. | Data Exchange Mechanism for Real‐Time Object Detection in Cloud‐Edge IoT System |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22794675; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022794675; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022794675; Country of ref document: EP; Effective date: 20231026 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 11202307990V; Country of ref document: SG |