WO2024120470A1 - Model training method, terminal and network side device - Google Patents
Model training method, terminal and network side device
- Publication number
- WO2024120470A1 (PCT/CN2023/136968)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- federated learning
- model
- message
- information
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/042—Network management architectures or arrangements comprising distributed management centres cooperatively managing the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/2869—Terminals specially adapted for communication
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
Definitions
- the present application belongs to the field of communication technology, and specifically relates to a model training method, a terminal and a network side device.
- Federated learning aims to build a federated learning model based on distributed data sets.
- information related to the federated learning model can be exchanged between the parties (possibly in encrypted form), but the raw data cannot be exchanged, so that the private part of the data at each site is not exposed.
- horizontal federated learning is the union of samples. It is suitable for scenarios where the participants have the same business model but reach different customers, that is, scenarios with more feature overlap and less user overlap, such as the same service (such as session management business) in the core domain and access domain of the communication network serving different users (such as each terminal, that is, different samples).
- horizontal federation increases the number of training samples, thereby obtaining a better federated learning model.
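The horizontal setting described above (same features, different users) can be illustrated with a minimal sketch. The feature names, terminal identifiers and values below are hypothetical and are not taken from the application; the raw records are never pooled, only the sample identifiers are compared and counted.

```python
# Hypothetical local datasets held by two participants. Both use the same
# feature schema (full feature overlap) but serve different terminals
# (no user overlap), which is the horizontal federated learning setting.
FEATURES = ("session_count", "avg_session_seconds")

core_domain = {"ue-001": (12, 30.5), "ue-002": (7, 12.0)}
access_domain = {"ue-103": (3, 45.2), "ue-104": (9, 8.7), "ue-105": (15, 22.1)}


def user_overlap(a: dict, b: dict) -> set:
    """Return the sample (user) identifiers present at both participants."""
    return set(a) & set(b)


def effective_sample_count(*participants: dict) -> int:
    """Count the distinct samples that horizontal federation trains on.

    The raw records stay with each participant; only identifiers are combined.
    """
    users = set()
    for participant in participants:
        users |= set(participant)
    return len(users)


# Every record follows the shared feature schema.
same_schema = all(
    len(row) == len(FEATURES)
    for participant in (core_domain, access_domain)
    for row in participant.values()
)
shared_users = user_overlap(core_domain, access_domain)
total_samples = effective_sample_count(core_domain, access_domain)
```

With no shared users, the federation trains on more samples than either participant holds alone, which is the benefit the description attributes to horizontal federation.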
- After federated learning training is completed, the client usually remains in a state of waiting for the next round of training, which takes up a lot of storage space and computing power.
- the embodiments of the present application provide a model training method, a terminal, and a network-side device, which can solve the problem that the client cannot know the end of federated learning training and thus cannot perform the next operation, thereby occupying space and computing power.
- a model training method comprising: a first device receives a first message from a second device, the first message being used to indicate that federated learning training is terminated or suspended; the first device performs a first operation based on the first message; wherein the first device comprises a federated learning client, and the second device comprises a federated learning server.
- a model training method comprising: a second device sends a first message to a first device, wherein the first message is used to indicate that federated learning training is terminated or suspended; wherein the first device includes a federated learning client, and the second device includes a federated learning server.
- a model training device which is applied to a first device, including: a receiving module for receiving a first message from a second device, wherein the first message is used to indicate the termination or suspension of federated learning training; a processing module for performing a first operation based on the first message; wherein the first device includes a federated learning client, and the second device includes a federated learning server.
- a model training device which is applied to a second device, including: a sending module, used to send a first message to a first device, wherein the first message is used to indicate the termination or suspension of federated learning training; wherein the first device includes a federated learning client, and the second device includes a federated learning server.
- a terminal comprising a processor and a communication interface, wherein the communication interface is used to receive a first message from a second device, the first message is used to indicate that federated learning training is terminated or suspended; the processor is used to perform a first operation based on the first message; wherein the terminal includes a client of federated learning, and the second device includes a server of federated learning.
- the communication interface is used to send a first message to a first device, the first message is used to indicate that federated learning training is terminated or suspended; wherein the first device includes a client of federated learning, and the terminal includes a server of federated learning.
- a network side device which includes a processor and a memory, wherein the memory stores programs or instructions that can be run on the processor, and when the program or instructions are executed by the processor, the steps of the method described in the first aspect or the second aspect are implemented.
- a network-side device comprising a processor and a communication interface, wherein the communication interface is used to receive a first message from a second device, the first message being used to indicate that federated learning training is terminated or suspended; the processor is used to perform a first operation based on the first message; wherein the network-side device comprises a client of federated learning, and the second device comprises a server of federated learning.
- the communication interface is used to send a first message to a first device, the first message being used to indicate that federated learning training is terminated or suspended; wherein the first device comprises a client of federated learning, and the network-side device comprises a server of federated learning.
- a model training system comprising: a terminal and a network side device, wherein the terminal can be used to execute the steps of the method described in the first aspect, and the network side device can be used to execute the steps of the method described in the second aspect; or, the terminal can be used to execute the steps of the method described in the second aspect, and the network side device can be used to execute the steps of the method described in the first aspect.
- a readable storage medium on which a program or instruction is stored; when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented, or the steps of the method described in the second aspect are implemented.
- a chip comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or instruction to implement the steps of the method described in the first aspect, or to implement the steps of the method described in the second aspect.
- a computer program/program product is provided, wherein the computer program/program product is stored in a storage medium, and the computer program/program product is executed by at least one processor to implement the steps of the method described in the first aspect, or to implement the steps of the method described in the second aspect.
- the server may send a first message to the client, and the first message is used to indicate that the federated learning training is terminated or suspended.
- the first device can be informed that the federated learning training has ended, and can perform a first operation based on the first message, for example, stopping the local federated learning training, deleting the local federated learning model, etc., which avoids occupying the client's space and computing power and improves the client's performance.
- FIG1 is a schematic diagram of a wireless communication system according to an embodiment of the present application.
- FIG2 is a schematic flow chart of a model training method according to an embodiment of the present application.
- FIG3 is a schematic flow chart of a model training method according to an embodiment of the present application.
- FIG4 is a schematic flow chart of a model training method according to an embodiment of the present application.
- FIG5 is a schematic diagram of the structure of a model training device according to an embodiment of the present application.
- FIG6 is a schematic diagram of the structure of a model training device according to an embodiment of the present application.
- FIG7 is a schematic diagram of the structure of a communication device according to an embodiment of the present application.
- FIG8 is a schematic diagram of the structure of a terminal according to an embodiment of the present application.
- FIG9 is a schematic diagram of the structure of a network side device according to an embodiment of the present application.
- FIG. 10 is a schematic diagram of the structure of a network side device according to an embodiment of the present application.
- first, second, etc. in the specification and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here, and the objects distinguished by “first” and “second” are generally of the same type, and the number of objects is not limited.
- the first object can be one or more.
- “and/or” in the specification and claims represents at least one of the connected objects, and the character “/” generally represents that the objects associated with each other are in an “or” relationship.
- LTE Long Term Evolution
- LTE-A Long Term Evolution-Advanced
- CDMA Code Division Multiple Access
- TDMA Time Division Multiple Access
- FDMA Frequency Division Multiple Access
- OFDMA Orthogonal Frequency Division Multiple Access
- SC-FDMA Single-carrier Frequency Division Multiple Access
- NR New Radio
- 6G 6th Generation
- FIG1 shows a block diagram of a wireless communication system applicable to an embodiment of the present application.
- the wireless communication system includes a terminal 11 and a network side device 12.
- the terminal 11 can be a mobile phone, a tablet computer (Tablet Personal Computer), a laptop computer (Laptop Computer) or a notebook computer, a personal digital assistant (Personal Digital Assistant, PDA), a handheld computer, a netbook, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a mobile Internet device (Mobile Internet Device, MID), augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) equipment, a robot, a wearable device (Wearable Device), a vehicle-mounted device (Vehicle User Equipment, VUE), a pedestrian terminal (Pedestrian User Equipment, PUE), a smart home (home equipment with wireless communication functions, such as refrigerators, televisions, washing machines or furniture, etc.), a game console, a personal computer (personal computer, PC), a teller
- the network side device 12 may include an access network device or a core network device, wherein the access network device may also be referred to as a radio access network device, a radio access network (RAN), a radio access network function or a radio access network unit.
- the access network device may include a base station, a WLAN access point or a WiFi node, etc.
- the base station may be referred to as a Node B, an evolved Node B (eNB), an access point, a base transceiver station (Base Transceiver Station, BTS), a radio base station, a radio transceiver, a basic service set (Basic Service Set, BSS), an extended service set (Extended Service Set, ESS), a home Node B, a home evolved Node B, a transmitting and receiving point (Transmitting Receiving Point, TRP) or other appropriate terms in the field. As long as the same technical effect is achieved, the base station is not limited to a specific technical term. It should be noted that in the embodiments of the present application, only the base station in the NR system is used as an example, and the specific type of the base station is not limited.
- the core network equipment may include but is not limited to at least one of the following: core network node, core network function, mobility management entity (Mobility Management Entity, MME), access mobility management function (Access and Mobility Management Function, AMF), session management function (Session Management Function, SMF), user plane function (User Plane Function, UPF), policy control function (Policy Control Function, PCF), policy and charging rules function unit (Policy and Charging Rules Function, PCRF), edge application service discovery function (Edge Application Server Discovery Function, EASDF), unified data management (Unified Data Management, UDM), unified data storage (Unified Data Repository, UDR), home user server (Home Subscriber Server, HSS), Centralized network configuration (CNC), network repository function (NRF), network exposure function (NEF), local NEF (L-NEF), binding support function (BSF), application function (AF), etc.
- MME Mobility Management Entity
- AMF Access and Mobility Management Function
- SMF Session Management Function
- UPF User Plane Function
- an embodiment of the present application provides a model training method 200, which can be executed by a first device.
- the method can be executed by software or hardware installed on the first device.
- the method includes the following steps.
- S202: The first device receives a first message from the second device, where the first message is used to indicate that federated learning training is terminated or suspended.
- the first device in each embodiment of the present application may be a client of federated learning, which may be a terminal, an access network device or a core network device, etc.
- the core network device may include, for example, a model training logical function network element (Model Training Logical Function, MTLF), an analysis logical function network element (Analytics Logical Function, AnLF), etc.
- the second device may be a server of federated learning, which may be a terminal, an access network device or a core network device, and the core network device may include, for example, MTLF, AnLF, etc.
- a first device may receive a first message from a second device, the first message being used to indicate that federated learning training is terminated or suspended.
- Termination of federated learning training means that the entire federated learning process ends for the first device and the second device.
- Suspension of federated learning training means that the federated learning process is interrupted or ended for the first device.
- S204: The first device performs a first operation based on the first message.
- the first device may perform the first operation according to internal logic, and may also perform the first operation according to suggestion information in the first message, etc.
- the suggestion information will be described in detail later.
- 1) The server, i.e., the second device, performs a member selection process: the second device sends a request to an information storage device such as a network repository function (NF Repository Function, NRF) to obtain the capability information of each intelligent network element device such as an MTLF, determines from that capability information whether the intelligent network element device can participate in federated learning, and determines the members of the federated learning;
- 2) The second device sends information such as the initialization model of the federated learning to each client, i.e., each first device;
- 3) Each first device feeds back intermediate results, such as gradients, to the second device after local training;
- 4) The second device aggregates the intermediate results and updates the federated learning model.
- After the steps of member selection, intermediate model distribution, local training, intermediate result feedback, and aggregation and updating of the global model have been repeated, the training can be stopped once the federated learning model converges.
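The repeated loop of member selection, model distribution, local training, feedback and aggregation can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the one-parameter linear model, the averaging rule, the learning rate, the tolerance, the `supports_fl` capability flag and the cause strings are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    name: str
    supports_fl: bool  # hypothetical capability flag, e.g. as reported via the NRF


def select_members(candidates: List[Candidate]) -> List[str]:
    """Step 1): keep only candidates whose capability allows federated learning."""
    return [c.name for c in candidates if c.supports_fl]


def local_gradient(weights: List[float], data: List[Tuple[float, float]]) -> List[float]:
    """Step 3): gradient of the mean squared error of y = w * x on local data."""
    w = weights[0]
    return [sum(2 * (w * x - y) * x for x, y in data) / len(data)]


def aggregate(gradients: List[List[float]]) -> List[float]:
    """Step 4): element-wise average of the gradients fed back by the clients."""
    return [sum(col) / len(gradients) for col in zip(*gradients)]


def run_training(candidates, local_data, lr=0.05, tol=1e-6, max_rounds=500):
    weights, members, cause = [0.0], [], "MAX_ROUNDS_REACHED"
    for _ in range(max_rounds):
        members = select_members(candidates)
        if not members:
            cause = "NO_MEMBERS"
            break
        # Step 2) is implicit: every member trains on the current global weights.
        grads = [local_gradient(weights, local_data[m]) for m in members]
        update = aggregate(grads)
        weights = [w - lr * g for w, g in zip(weights, update)]
        if max(abs(g) for g in update) < tol:
            cause = "MODEL_CONVERGED"
            break
    # Once training stops, the server notifies every member with a first message.
    first_message = {"indication": "TERMINATED", "cause": cause}
    return weights, {m: first_message for m in members}


candidates = [
    Candidate("mtlf-a", True),
    Candidate("mtlf-b", True),
    Candidate("mtlf-c", False),  # lacks the capability, so it is never selected
]
local_data: Dict[str, List[Tuple[float, float]]] = {
    "mtlf-a": [(1.0, 2.0), (2.0, 4.0)],
    "mtlf-b": [(3.0, 6.0)],
}
weights, notifications = run_training(candidates, local_data)
```

Only gradients leave the clients in this sketch; the local `(x, y)` pairs are read only inside `local_gradient`, which stands in for training performed on the client.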
- the server can send a first message to the client, and the first message is used to indicate the termination or suspension of the federated learning training.
- the first device can be informed that the federated learning training is ended, and can perform a first operation based on the first message, for example, stopping the local federated learning training, deleting the local federated learning model, etc., to avoid occupying the client's space and computing power and improve the client's performance.
- the embodiment of the present application defines a corresponding processing mechanism after the federated learning training is completed, so as to make the execution process of the entire federated learning more complete.
- the first message may include at least one of the following:
- Indication information of the termination of federated learning training, that is, the second device explicitly indicates the termination of federated learning training, so that the first device can perform the first operation according to its internal logic or the suggestion information in 7) below; the information in 2) to 7) below can implicitly indicate the termination of federated learning training; alternatively, the termination of federated learning training can be implicitly indicated through a signaling name, etc.
- the termination of federated learning training mentioned in the various embodiments of the present application may refer to the completion of federated learning training, for example, the parameters of the federated learning model converge, the loss function of the federated learning model converges, the number of federated learning training rounds reaches a count threshold, the duration of the federated learning training reaches a duration threshold, etc.
- Indication information of the suspension of federated learning training, that is, the second device explicitly indicates the suspension of federated learning training, so that the first device can perform the first operation according to its internal logic or the suggestion information in 7) below; the information in 2) to 7) below can implicitly indicate the suspension of federated learning training; alternatively, the suspension of federated learning training can be implicitly indicated through a signaling name, etc.
- Model ID or identification information of the federated learning model, which can be used to uniquely identify the federated learning model.
- the federated learning model can be a trained federated learning model or a model in which training is terminated during training.
- Model information of the federated learning model, which includes, for example, the network structure, weight parameters, and input and output data information of the federated learning model; the model information may also include download address information or storage address information of the federated learning model file.
- the information of input and output data may be the category information of the data, which is used to indicate what kind of data should be input and what kind of data will be output.
- the federated learning model may be a trained federated learning model or a model in which training is terminated during training.
- Gradient information of the federated learning model, which can be transmitted in the form of a gradient file (for example, as the download address information or storage address information of the gradient file), or can be carried directly in the message, etc.
- the gradient information can be the gradient information used by the final global model, which can be the sum of the gradients fed back by multiple clients in this round. Because a round of global model update can be based on multiple gradients fed back by multiple clients in this round, these gradients can be aggregated before the update, or all of them can be used for the update; accordingly, the gradient information that is fed back can be the sum of these gradients, or the multiple gradients themselves, etc.
- the federated learning model can be a federated learning model that has completed training, or it can be a model when training is terminated during training.
- Task identification information, which is used to indicate the task category for which the federated learning model is used, for example, indicating which type of task the federated learning model is used to perform.
- Task identification information and the analysis identification described below have similar meanings and can be used interchangeably; task identification information can also be called data analysis task identification (which can be an analytics ID) information.
- Task association identification information (which can be a correlation ID or a subscription correlation ID), which is used to indicate the target federated learning task, for example, uniquely indicating the federated learning task (or federated learning model training task). This information can be generated when the task is generated, or it can be generated by the server when it issues the global task, etc.
- the reason information is used to indicate the reason why the second device sends the first message.
- the reason information can be used to indicate at least one of the following: the federated learning process has ended, the federated learning process is interrupted.
- the reason information can further indicate the reason why the federated learning is interrupted, for example, the accuracy of the second device is not sufficient to continue the federated learning, or the second device has been removed.
- the reason information can further indicate the reason why the federated learning is terminated, for example, the federated learning model has converged, the number of iterations has reached a preset value, the training has timed out, etc.
- Suggestion information, where the suggestion information is used to indicate an operation to be performed by the first device after receiving the first message.
- the suggestion information may include at least one of the following:
- Instruction information for updating the federated learning model, which is used to instruct the first device to update its local federated learning model using the received gradient information, etc., and can implicitly inform the first device that it can save and use the federated learning model (e.g., it has the authority to use the federated learning model).
- the federated learning model can be a federated learning model that has been trained, or a federated learning model when the training is terminated during the training process.
- c) Instruction information for deleting the local federated learning model, which is used to instruct the first device to delete its local federated learning model, for example, indicating that the first device should not use the federated learning model, does not have the authority to use the federated learning model, etc.
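The contents of the first message listed above can be modelled as a record in which every field is optional but at least one must be present. This is only a sketch: the class, field and enum names are hypothetical, and the application does not define a concrete encoding.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Dict, List, Optional


class Indication(Enum):
    TERMINATED = "terminated"
    SUSPENDED = "suspended"


class Cause(Enum):
    MODEL_CONVERGED = "model_converged"
    ITERATION_LIMIT_REACHED = "iteration_limit_reached"
    TRAINING_TIMED_OUT = "training_timed_out"
    INTERRUPTED = "interrupted"


class Suggestion(Enum):
    UPDATE_LOCAL_MODEL = "update_local_model"
    DELETE_LOCAL_MODEL = "delete_local_model"


@dataclass
class FirstMessage:
    """Hypothetical container for the first message (server to client)."""

    indication: Optional[Indication] = None      # explicit termination/suspension
    model_id: Optional[str] = None               # identifies the model
    model_info: Optional[Dict[str, str]] = None  # e.g. structure or download address
    gradient_info: Optional[List[float]] = None  # gradients of the final global model
    task_id: Optional[str] = None                # task (analytics) identification
    correlation_id: Optional[str] = None         # task association identification
    cause: Optional[Cause] = None                # reason information
    suggestions: List[Suggestion] = field(default_factory=list)

    def present_fields(self) -> List[str]:
        """Names of the fields that actually carry a value."""
        present = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None and value != []:
                present.append(f.name)
        return present

    def __post_init__(self) -> None:
        # The message "may include at least one of" the items, so reject empty ones.
        if not self.present_fields():
            raise ValueError("first message must carry at least one item")


explicit = FirstMessage(
    indication=Indication.TERMINATED,
    cause=Cause.MODEL_CONVERGED,
    suggestions=[Suggestion.DELETE_LOCAL_MODEL],
)
# No explicit indication: carrying the model ID alone implies termination.
implicit = FirstMessage(model_id="model-1")
```

Leaving `indication` optional reflects the statement that the other items, or the signaling name, can indicate termination implicitly.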
- the first device may perform the first operation according to internal logic, and may also perform the first operation according to suggestion information in the first message, etc.
- the first operation performed by the first device includes at least one of the following:
- the first device can obtain the trained federated learning model or gradient information and use the trained federated learning model or gradient information to update the local federated learning model, and use the model subsequently.
- the first device can obtain the trained federated learning model and use the trained federated learning model.
- if the first device does not need the trained federated learning model, it knows that the local federated learning model no longer needs to be updated and can delete the local federated learning model, thereby saving storage space.
- the first device may stop local federated learning training to save computing power, etc.
- the first operation performed by the first device may include at least one of the above 1) to 4), for example, the first device stops the local federated learning training, deletes the local federated learning model used in the previous local federated learning training, and
- the federated learning model may be a federated learning model that has been trained, or a federated learning model that has been terminated during training.
- the first device may also determine the first operation based on its internal logic and/or the first message. Specifically, when the first device receives the first message, it may determine the first operation based on the suggestion information in the first message, such as performing the first operation as suggested. Alternatively, when the first device receives the first message, it may determine the first operation based on the model information or gradient information in the first message, such as receiving a federated learning model, or updating a local model. Alternatively, when the first device receives the first message, it may determine the first operation based on the task identification information in the first message, such as using a trained federated learning model to perform a task.
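One way a client might combine the suggestion information, the model or gradient information, the task identification and its own internal logic is sketched below. The dictionary keys, the operation names and the priority order (suggestions first, then message contents, then local need) are assumptions made for illustration; the application leaves this logic to the first device.

```python
from typing import Any, Dict, List


def decide_first_operations(message: Dict[str, Any], needs_model: bool = True) -> List[str]:
    """Return the hypothetical operation names a client would perform.

    `message` is a plain dict standing in for the first message; `needs_model`
    stands in for the client's internal logic.
    """
    suggestions = message.get("suggestions") or []
    if suggestions:
        # Perform the first operation as suggested by the server.
        return list(suggestions)

    # Training has ended or been suspended, so local training stops in any case.
    operations = ["stop_local_training"]
    has_gradient = message.get("gradient_info") is not None
    has_model = message.get("model_info") is not None
    if has_gradient:
        operations.append("update_local_model")
    elif has_model:
        operations.append("receive_model")
    elif not needs_model:
        # Nothing to receive and no local need: free the storage space.
        operations.append("delete_local_model")
    if message.get("task_id") is not None and (has_gradient or has_model):
        operations.append("use_model_for_task")
    return operations


suggested = decide_first_operations({"suggestions": ["delete_local_model"]})
updated = decide_first_operations({"gradient_info": [0.1], "task_id": "task-1"})
cleaned = decide_first_operations({"indication": "terminated"}, needs_model=False)
```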
- the first operation includes updating the local federated learning model, and/or receiving the federated learning model, and after the first device receives the first message from the second device, the method further includes: the first device saving the federated learning model; wherein the federated learning model supports use by the first device.
- saving the federated learning model may mean that the first device saves the federated learning model to the first device after updating the local federated learning model or after the first device receives the federated learning model.
- the federated learning model supports use by the first device, which may mean that when other devices initiate model requests or task requests (such as a data analysis task) to the first device, the first device can feed back the federated learning model to the other devices as the target model of the model request, or use the model to perform operations such as calculation and inference to generate the task results corresponding to the task request, and feed back the results to the other devices, etc.
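Saving the model and then serving model requests or task requests from it could look like the following sketch. The class and method names are hypothetical, and a weighted sum stands in for whatever inference the real model performs.

```python
from typing import List, Optional, Sequence


class FederatedClient:
    """Hypothetical first device that keeps the model after training ends."""

    def __init__(self) -> None:
        self._model: Optional[List[float]] = None

    def save_model(self, weights: Sequence[float]) -> None:
        """Store the received or locally updated federated learning model."""
        self._model = list(weights)

    def handle_model_request(self) -> List[float]:
        """Feed the saved model back to a requesting device."""
        if self._model is None:
            raise LookupError("no federated learning model has been saved")
        return list(self._model)

    def handle_task_request(self, features: Sequence[float]) -> float:
        """Run inference with the saved model and return the task result."""
        weights = self.handle_model_request()
        if len(features) != len(weights):
            raise ValueError("feature vector does not match the model")
        return sum(w * x for w, x in zip(weights, features))


client = FederatedClient()
client.save_model([0.5, 2.0])
model_copy = client.handle_model_request()
task_result = client.handle_task_request([2.0, 1.0])
```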
- the federated learning model can be a federated learning model that has been trained, or a federated learning model when training is terminated during training.
- before the first device receives the first message from the second device in S202, the method also includes: the first device receives a federated learning training request message from the second device; the first device sends a response message to the second device, and the response message includes request information for obtaining a federated learning model.
- the request information may be a request to obtain model information or gradient information of the federated learning model, etc., and is used to request the second device to send a global or aggregated federated learning model to the first device.
- the response message also includes a task association identifier, which is used to uniquely identify this model training task.
- the training request message is used to request the first device to participate in federated learning.
- the training request message includes at least one of task identification information, task association identification information, model information, model gradient information, model identification information, etc.
- the training request message may instruct the first device to perform local federated learning training using the model corresponding to the federated learning training and the data that the first device can collect.
- the request information may include at least one of the following:
- the federated learning model may be a federated learning model after training is completed, or a federated learning model obtained by training when the federated learning is interrupted and stopped, and the subsequent steps are similar.
- Second request information, where the second request information is used to request model information of the federated learning model (the model information includes network architecture information, download address information, etc.).
- Third request information, where the third request information is used to request gradient information of the federated learning model.
- the federated learning model may be a federated learning model that has completed training, or a federated learning model in which training is terminated during training.
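The exchange of the training request message and the response message carrying request information can be sketched as below. The type names and fields are hypothetical; only the three kinds of request information and the task association identifier come from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RequestInfo(Enum):
    MODEL = "first request information"           # request the model itself
    MODEL_INFO = "second request information"     # request model information
    GRADIENT_INFO = "third request information"   # request gradient information


@dataclass
class TrainingRequest:
    """Hypothetical training request message (server to client)."""

    correlation_id: str                # task association identifier
    task_id: Optional[str] = None
    model_id: Optional[str] = None


@dataclass
class TrainingResponse:
    """Hypothetical response message (client to server)."""

    correlation_id: str
    request_info: List[RequestInfo] = field(default_factory=list)


def build_response(request: TrainingRequest, wanted: List[RequestInfo]) -> TrainingResponse:
    """Echo the task association identifier and attach the request information.

    Duplicates are dropped while the order of first appearance is kept.
    """
    return TrainingResponse(request.correlation_id, list(dict.fromkeys(wanted)))


request = TrainingRequest(correlation_id="corr-1", task_id="task-1")
response = build_response(
    request,
    [RequestInfo.GRADIENT_INFO, RequestInfo.GRADIENT_INFO, RequestInfo.MODEL_INFO],
)
```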
- before the first device receives the first message from the second device in S202, the method further includes: the first device sends a second message to the second device after completing the local federated learning training, and the second message includes the training result of the local federated learning training and the request information for obtaining the federated learning model.
- the training result can be the intermediate model information or intermediate gradient information of the federated learning model.
- the second message can also include information such as the federated learning task identifier and the identifier of the federated learning model.
- the second message may be generated by the first device after any round of federated learning training is completed.
- the request information included in the second message may include at least one of the following:
- First request information where the first request information is used to request to obtain a federated learning model.
- Second request information where the second request information is used to request model information of the federated learning model (the model information includes network architecture information, download address information, etc.).
- Third request information where the third request information is used to request gradient information of the federated learning model.
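- As a minimal sketch of how the second message and its request information could be represented, the following uses hypothetical names (`RequestInfo`, `SecondMessage`, and all field names are assumptions for illustration, not part of the application or of any signaling specification). A flag enumeration is one way to express "at least one of" the three kinds of request information.

```python
from dataclasses import dataclass
from enum import Flag
from typing import Optional


class RequestInfo(Flag):
    """Hypothetical encoding of the three kinds of request information."""
    MODEL = 1          # first request information: the federated learning model
    MODEL_INFO = 2     # second request information: model information
    GRADIENT_INFO = 4  # third request information: gradient information


@dataclass
class SecondMessage:
    """Illustrative second message sent after a round of local training."""
    training_result: dict            # intermediate model or gradient information
    request_info: RequestInfo        # one or more kinds of request information
    task_id: Optional[str] = None    # federated learning task identifier
    model_id: Optional[str] = None   # identifier of the federated learning model


# A client that asks for both model information and gradient information.
msg = SecondMessage(
    training_result={"gradient": [0.1, -0.2]},
    request_info=RequestInfo.MODEL_INFO | RequestInfo.GRADIENT_INFO,
    task_id="task-1",
)
wants_gradient = RequestInfo.GRADIENT_INFO in msg.request_info
```

- Combining flags with `|` lets a single field carry any subset of the three requests, which the receiver can test with `in`.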
- the embodiment includes the following steps.
- Step 0 This step 0 can be divided into the following steps 0a and 0b.
- Step 0a The federated learning consumer (such as AnLF) sends a federated learning model request to the federated learning server (such as MTLF).
- the federated learning model request can be carried by the Nnwdaf_MLModelProvision_Subscribe message.
- the federated learning model request is used to request a federated learning model (hereinafter referred to as the model) to complete its own tasks.
- the server determines whether to trigger federated learning based on local configuration or the request of the federated learning consumer, and determines to initialize federated learning and member selection.
- Step 0b For devices without federated learning server capabilities (such as devices with only client capabilities, or devices without federated learning capabilities), a federated learning request can also be sent to a device with federated learning server capabilities, requesting federated learning to generate the required model.
- the device can also be regarded as requesting to obtain the trained federated learning model by sending a federated learning request to the server.
- the device can also participate in the federated learning process, so the server can also send the first message to the device.
- the federated learning request in this step may include at least one of the following information:
- Analytics ID, which indicates that the federated learning process is for the task type identified by the analytics ID. It can also be called the data analysis task ID, and it is the same as the task identification information described above.
- Model ID The identifier of the federated learning model (Model ID), which is used to uniquely identify the federated learning model.
- Model filter information, used to limit the scope of the federated learning process, such as regional scope, time range, Single Network Slice Selection Assistance Information (S-NSSAI), Data Network Name (DNN), etc.
- Model target (optional), which can be used to specify the target of the federated learning process, such as one or more specific terminals, all terminals within a certain range, or all terminals that meet certain conditions.
- Model reporting information (optional), which can be used to indicate the reporting information of the generated federated learning model information, such as reporting time (start time, deadline, etc.) and reporting conditions (periodic trigger, event trigger, etc.).
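- The request contents listed above can be pictured as a record in which every field is optional but at least one must be present. The sketch below is only illustrative: the class name `FederatedLearningRequest`, the field names, and the example values are assumptions, and the actual encoding of the request is not defined here.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FederatedLearningRequest:
    """Hypothetical container for the step 0b request fields."""
    analytics_id: Optional[str] = None      # task type of the federated learning
    model_id: Optional[str] = None          # identifier of the federated learning model
    model_filter: Optional[dict] = None     # e.g. area, time range, S-NSSAI, DNN
    model_target: Optional[list] = None     # e.g. specific terminals
    reporting_info: Optional[dict] = None   # e.g. reporting time and conditions

    def present_fields(self):
        """Names of the fields that the sender actually filled in."""
        return [name for name, value in asdict(self).items() if value is not None]

    def is_valid(self):
        """The request must carry at least one of the listed items."""
        return len(self.present_fields()) >= 1


request = FederatedLearningRequest(
    analytics_id="example-analytics",             # placeholder value
    model_filter={"dnn": "example-dnn"},          # placeholder value
)
```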
- Step 1 The device with federated learning server capabilities determines that federated learning is to be performed and selects members. It can initialize a strategy for the federated learning, for example by specifying the number of training rounds after which status information is collected, or the number of training rounds after which training status information is collected.
- the server finds out the devices that are willing to participate in federated learning and meet the requirements of federated learning by searching the capability information and willingness information of other devices. For example, when the server is a MTLF network element, the MTLF network element searches for network elements in the NRF to find out other network elements (such as other MTLFs) that meet the training requirements of this federated learning.
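- Member selection as described above amounts to filtering candidate devices by capability information and willingness information. The following is a minimal sketch under stated assumptions: the `Candidate` record and `select_members` function are hypothetical, and the discovery query towards the NRF is replaced by an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of what the server learns about another device."""
    name: str
    supports_fl_client: bool   # capability information
    willing: bool              # willingness information
    analytics_ids: frozenset   # task types the device can train for


def select_members(candidates, analytics_id):
    """Keep devices that are capable, willing, and support the task type."""
    return [
        c for c in candidates
        if c.supports_fl_client and c.willing and analytics_id in c.analytics_ids
    ]


candidates = [
    Candidate("mtlf-a", True, True, frozenset({"task-x"})),
    Candidate("mtlf-b", True, False, frozenset({"task-x"})),   # not willing
    Candidate("mtlf-c", False, True, frozenset({"task-x"})),   # no client capability
    Candidate("mtlf-d", True, True, frozenset({"task-y"})),    # different task type
]
members = select_members(candidates, "task-x")
```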
- Step 2 A device with federated learning server capabilities (referred to as server) and a device with federated learning client capabilities (referred to as client) interact to perform federated learning training, which may specifically include the following steps.
- Step 2a The server sends a training request for federated learning training to the client, such as through a Nnwdaf_MLModelTraining_Subscribe message, requesting the client to participate in federated learning and perform local training of federated learning based on the global model and the client's local data.
- the training request may include at least one of the following information:
- Analytics ID which is used to indicate the type of task requested for the analytics ID and the type of task the federated learning model is used for.
- Model ID The identifier of the federated learning model (Model ID), which is used to uniquely identify the federated learning model.
- Task correlation identification information (Correlation ID), which is used to uniquely indicate the federated learning task.
- Model initialization information which is used to indicate model information and configuration information in this round of federated learning.
- describing the model means describing the model itself, such as what algorithm, architecture, parameters and hyperparameters the federated learning model is composed of, or the federated learning model itself, such as the model file, the address information of the model file, etc.
- the configuration information in this round of federated learning (also called guideline information) can be used to determine how to perform training in the local training process of this round of federated learning, such as the number of rounds of local training to be performed, the type of data to be used, the maximum training time, and other information.
- Step 2b After receiving the training request for federated learning training, the client can feed back to the server whether it will participate in the federated learning training, for example through the Nnwdaf_MLModelTraining_Subscribe Response message.
- the relevant information may include: whether the client participates in the federated learning training, the indication information of the request to obtain the final global model (global model) or the updated global model, the analysis identifier, the task association identifier information, etc.
- the updated global model refers to the aggregate model generated by the federated learning server when the federated learning is interrupted.
- the indication information requesting the final global model or the updated global model is used to indicate that the client wants to obtain the final global model information or the updated global model information, such as the model file of the final model or updated global model, the download address information or storage address information of the model file, or the updated gradient of the federated learning model.
- the updated global model information can help the client generate and obtain the final global model information.
- Step 3 During each iteration, the server sends model information and/or model update messages to the client.
- the updated global model or the updated gradient information of the global model can be sent in a manner such as step 2a, or different signaling can be used to inform the client, and the client can update its local model for the next round of local training. It can include identification information such as task association identification information for indicating federated training, model information, and/or gradient information.
- Step 4 The client sends a request to obtain data to the data source (data source, referring to the network element that can provide data) in its area or to which it belongs to collect data for local federated learning.
- the data is provided by different network elements, such as User Plane Function (UPF), Operation Administration and Maintenance (OAM), Unified Data Management (UDM), etc.
- the request to obtain data can be carried by the following messages: the Ndccf_DataManagement_Subscribe message, the Nnf_EventExposure_Subscribe message, and/or the Ndccf_DataManagement_Notify/Nnf_EventExposure_Notify message, etc.
- the client uses the acquired data and model information to train the local model, generate intermediate results, and feed them back to the server for the server to aggregate and update the global model.
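- One round of this exchange can be sketched as follows. The example is an illustration only: it assumes a linear model trained with a mean-squared-error loss, and it assumes that the server aggregates client gradients as an average weighted by local sample count. The application does not prescribe a particular model or aggregation rule, and all function names are hypothetical.

```python
def local_gradient(weights, samples):
    """Client side: MSE gradient of a linear model y = w . x over local samples."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(samples)
    return grad


def aggregate(gradients, sample_counts):
    """Server side: average of client gradients weighted by sample count."""
    total = sum(sample_counts)
    dim = len(gradients[0])
    return [
        sum(g[i] * n for g, n in zip(gradients, sample_counts)) / total
        for i in range(dim)
    ]


def apply_gradient(weights, grad, lr=0.1):
    """Update the global model with the aggregated gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


global_weights = [0.0]
client_a = local_gradient(global_weights, [([1.0], 2.0)])   # intermediate result of client A
client_b = [0.0]                                            # intermediate result of client B
aggregated = aggregate([client_a, client_b], [1, 3])
new_global = apply_gradient(global_weights, aggregated, lr=0.1)
```

- Only the intermediate results (here, gradients) leave the clients; the local samples stay where they were collected.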
- Step 5 After the client completes local training, it feeds back the training results of the local training to the server. The server can then use the training results to update the global model. In this step, interim model information or gradient information can be fed back through Nnwdaf_MLModelTraining_Notify.
- the message sent by the client to the server may include at least one of the following information:
- Result information which is used to indicate the training result of the local training, which can be an intermediate model or an updated gradient, etc.
- Consent information (Consent info).
- Status information.
- Training information (acceleration info).
- the consent information (also called willingness information) is used to indicate whether the member is still willing to participate in the next round of federated learning.
- Status information is used to describe the client status information after the local training of this round of federated learning is completed.
- Specific status information can be member load (such as NF load); member resource usage (such as resource usage: CPU, memory, disk; GPU); member capability information (such as whether it can participate in federated learning, what kind of federated learning to participate in, etc.).
- Training status information used to describe the client's training status information during local training of this round of federated learning.
- the training status refers to the performance of the model after local training on the client's local data. It can be expressed as a statistical metric together with its value, such as the accuracy of the model and its specific value (80%), or the mean absolute error (MAE) and its value (0.1).
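- The two metrics named above, accuracy and MAE, can be computed as in the following sketch. The function names and the shape of the `training_status` record are assumptions for illustration; the sample data is made up.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical training status record: a metric name plus its value.
training_status = {
    "metric": "accuracy",
    "value": accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]),
}
mae = mean_absolute_error([1.0, 2.0], [1.5, 2.5])
```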
- Step 6 The server aggregates the model and determines that the training of the model can be stopped, or the server determines that the model training is terminated based on the training termination condition.
- the server decides to stop model training.
- the training end condition includes at least one of the following: all model parameters converge, the model loss function converges, the parameters of the model trained locally by the client converge, the loss function of the model trained locally by the client converges, the number of training rounds reaches the round number threshold, the number of training times reaches the number threshold, and the training duration reaches the duration threshold.
- These thresholds and convergence conditions can be pre-designed by the server internally, etc.
- the server decides to terminate the model training.
- the training termination conditions may include one or more of the following: reduced computing power of the client, excessive client load, excessive resource usage, etc.
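- The server's decision in step 6 can be sketched as a check of the training end conditions and the training termination conditions. This is a simplified illustration: `ServerPolicy`, `decide`, and every threshold value are assumptions, and only a subset of the listed conditions (round threshold, duration threshold, loss convergence, client load) is modelled.

```python
from dataclasses import dataclass


@dataclass
class ServerPolicy:
    """Hypothetical thresholds pre-configured inside the server."""
    max_rounds: int = 100
    loss_tolerance: float = 1e-4
    max_duration_s: float = 3600.0
    max_client_load: float = 0.9


def decide(policy, round_no, losses, elapsed_s, client_loads):
    """Return 'terminate', 'stop', or None to continue training."""
    # Training termination condition: a client is overloaded.
    if any(load > policy.max_client_load for load in client_loads):
        return "terminate"
    # Training end conditions: round or duration threshold reached.
    if round_no >= policy.max_rounds or elapsed_s >= policy.max_duration_s:
        return "stop"
    # Training end condition: the loss has converged between two rounds.
    if len(losses) >= 2 and abs(losses[-1] - losses[-2]) < policy.loss_tolerance:
        return "stop"
    return None


keep_going = decide(ServerPolicy(), 3, [0.5, 0.4], 10.0, [0.2])
finished = decide(ServerPolicy(max_rounds=3), 3, [0.5, 0.4], 10.0, [0.2])
aborted = decide(ServerPolicy(), 3, [0.5, 0.4], 10.0, [0.95])
```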
- Step 7 The server sends a termination message to the client, telling the client participating in the federated learning that the federated learning has been terminated; or the server sends a suspension message to the client, telling the client participating in the federated learning that the federated learning has been suspended.
- the server can send indication information of the termination or suspension of training to the client through the Nnwdaf_MLModelTraining_unsubscribe message or through other signaling messages; it can also send the final result of the federated learning training, such as the final model or the final gradient; and it can also send suggestion information to inform the client what actions can be performed on the federated learning.
- model information can be determined by the internal logic of the server, or it can be requested by the client during the interaction with federated learning, such as step 2b, step 5, etc.; it can also be received in step 0b.
- The request can be a message requesting to obtain a federated learning model. Specifically, in step 2b and/or step 5, a request message from a device such as a client is received. For example, in step 2b, the first device sends a response message to the second device, and the response message includes request information for obtaining a trained federated learning model. In step 5, the first device sends a second message to the second device after completing local federated learning training.
- the second message includes the training results of local training of federated learning and the request information for obtaining a trained federated learning model.
- the server can also determine the content carried by the termination message before sending the termination message. Specifically, the server can determine the termination message when receiving a request message from the client, such as when the request message includes request model information, determining that the termination message carries the model information of the federated learning model; and when the request message includes request model gradient information, determining that the termination message carries the gradient information of the federated learning model.
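- The rule above, in which the termination message carries what the client asked for, can be sketched as follows. The function `build_termination_message`, the dictionary keys, and the string labels are hypothetical and only illustrate the selection logic.

```python
def build_termination_message(correlation_id, requested,
                              model_info=None, gradient_info=None, reason=None):
    """Assemble a termination message from the client's earlier request.

    `requested` is a set containing 'model_info' and/or 'gradient_info'
    (hypothetical labels for the second and third request information).
    """
    message = {"correlation_id": correlation_id, "indication": "terminated"}
    if reason is not None:
        message["reason"] = reason
    # Carry model information only if the client requested it.
    if "model_info" in requested and model_info is not None:
        message["model_info"] = model_info
    # Carry gradient information only if the client requested it.
    if "gradient_info" in requested and gradient_info is not None:
        message["gradient_info"] = gradient_info
    return message


message = build_termination_message(
    "corr-1",
    requested={"gradient_info"},
    model_info={"weights": [0.5, 2.5]},
    gradient_info=[0.5, -0.5],
    reason="federated learning process ends",
)
```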
- the termination (or suspension) message sent by the server to the client may include at least one of the following:
- Task identification information where the task identification information is used to indicate the task category that the federated learning model is used for, for example, indicating which type of task the federated learning model is used for.
- Model ID The model identifier (Model ID) or identification information of the federated learning model, which can be used to uniquely identify the federated learning model.
- Task association identification information where the task association identification information is used to indicate the target federated learning task, for example, to uniquely indicate the federated learning task.
- Indication information of the termination (or suspension) of the federated learning training, i.e., the second device explicitly indicates that the federated learning training is terminated (or suspended), so that the first device can perform the first operation according to its internal logic or the suggestion information described below.
- Model information of the federated learning model which includes, for example, the network structure, weight parameters, input and output data of the federated learning model; the model information may also include download address information or storage address information of the federated learning model file.
- Gradient information of the federated learning model which may be transmitted in the form of a gradient file, and the gradient information may be the gradient information used by the final global model.
- the server can provide the client with the final model information, or the gradient information of the final update.
- the client can use the gradient information to update its local federated learning model (the federated learning model used in the previous federated learning) to obtain the final global model.
- the final global model refers to the aggregated model generated by the federated learning server after the federated learning process is completed.
- the final global model can be at least one of the following: a model file (containing the model's network structure, weight parameters, input and output data, etc.); download address information or storage address information of the model file (used to indicate the storage address of the model file, or where the model file can be downloaded from).
- the gradient information can be delivered in the form of a gradient file, which contains the gradient information used for model update.
- Reason information where the reason information is used to indicate the reason why the server sends the termination (or suspension) message.
- Suggestion information where the suggestion information is used to indicate an operation to be performed by the first device after receiving the first message.
- the suggestion information may include at least one of the following:
- Instruction information for updating the federated learning model which is used to instruct the first device to update its local federated learning model using the received gradient information, etc., and can implicitly inform the first device that it can save and use the federated learning model (e.g., it has the authority to use the federated learning model).
- Instruction information for deleting the local federated learning model, used to indicate that the first device is to delete its local federated learning model, for example, indicating that the first device should not use the federated learning model, does not have the authority to use the federated learning model, etc.
- Steps 8a-8c After receiving the termination message, the client performs an action. Specifically, after knowing that the federated learning training is finished, the client can decide the subsequent actions based on its internal logic, or the recommended information and model information sent by the server. The subsequent actions can be to update the local model to the global model, receive the final global model and use it later; delete the local model used in the previous training; stop training, etc.
- the client updates the local model to obtain the final model, and can use the final model later.
- the client receives the gradient information of the model update (as in step 7), and uses the gradient information to update the local model, thereby obtaining the final model.
- the model can be used later, such as sending the model to other devices for certain data analysis tasks.
- the client saves the final model and can use the final model later.
- the client receives the final model, such as in step 7, receiving the model file and/or the download address information or storage address information of the model file, to obtain the final model.
- the client deletes the local model. Specifically, the client deletes the local model trained by the federated learning. This may be because the client did not initiate a request before, and therefore did not receive the final model of the federated learning. It may also be because the model will not be used in the future, so the client chooses to delete the model. If the client still wants to obtain the final model of the federated learning training, it can re-initiate a normal model acquisition request to the server, such as sending it through Nnwdaf_MLModelProvision_Subscribe and Nnwdaf_MLModelProvision_Notify messages.
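- The client-side choices in steps 8a-8c can be sketched as a small dispatch on the contents of the termination message. The sketch makes several assumptions: the message is a dictionary with hypothetical keys (`suggestion`, `gradient_info`, `model_info`), the model is a flat list of weights, and the gradient is applied as a plain subtraction scaled by `lr`; the application does not fix the update rule.

```python
def handle_termination(local_weights, message, lr=1.0):
    """Return (weights, action) chosen after receiving a termination message."""
    # Delete the local model when the server suggests deletion.
    if message.get("suggestion") == "delete":
        return None, "deleted local model"
    # Update the local model with the final gradient information.
    if "gradient_info" in message and local_weights is not None:
        updated = [w - lr * g for w, g in zip(local_weights, message["gradient_info"])]
        return updated, "updated local model"
    # Save the final model received from the server.
    if "model_info" in message:
        return list(message["model_info"]["weights"]), "saved received model"
    # Otherwise just stop the local training and keep the current state.
    return local_weights, "stopped training"


updated, action = handle_termination([1.0, 2.0], {"gradient_info": [0.5, -0.5]})
received, _ = handle_termination([1.0, 2.0], {"model_info": {"weights": [0.5, 2.5]}})
deleted, _ = handle_termination([1.0, 2.0], {"suggestion": "delete"})
```

- In this example the gradient-based update and the directly received model lead to the same final weights, which mirrors the statement that either form of information lets the client obtain the final global model.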
- Step 9 After the server completes the federated learning model training, it sends the model information to the consumer. This step has no sequence relationship with steps 7-8, that is, this step can also occur before step 7.
- the model information may include at least one of the following:
- Model file including the model’s network structure, weight parameters, input and output data, etc.
- Analytics ID which is used to indicate that the federated learning model is suitable for a certain type of reasoning task.
- Model filter information which is used to indicate the reporting information of the generated federated learning model information, such as reporting time (start time, deadline, etc.) and reporting conditions (periodic trigger, event trigger, etc.).
- Valid region information indicating the region to which the federated learning model is applicable.
- Valid time information indicating the time when the federated learning model is applicable.
- the server can send model information via the following messages: Nnwdaf_MLModelProvision_Notify or Nnwdaf_MLModelInfo_Response.
- Fig. 4 is a schematic diagram of a flow chart of a model training method according to an embodiment of the present application, which can be applied to a second device. As shown in Fig. 4, the method 400 includes the following steps.
- the second device sends a first message to the first device, where the first message is used to indicate that the federated learning training is terminated or suspended; wherein the first device includes a client of federated learning, and the second device includes a server of federated learning.
- the server may send a first message to the client, where the first message is used to indicate that the federated learning training is terminated or suspended.
- the first device may be informed that the federated learning training is finished, and may also perform a first operation based on the first message, such as stopping the local federated learning training or deleting the local federated learning model, thereby avoiding occupation of the client's storage space and computing power and improving the client's performance.
- the first message includes at least one of the following:
- Task identification information where the task identification information is used to indicate the task category for which the federated learning model is used.
- Task association identification information where the task association identification information is used to indicate a target federated learning task.
- Reason information where the reason information is used to indicate the reason why the second device sends the first message.
- the reason information is used to indicate at least one of the following: the federated learning process ends; the federated learning process is interrupted.
- Recommendation information where the recommendation information is used to indicate an operation to be performed by the first device after receiving the first message.
- the recommendation information is used to instruct the first device to perform at least one of the following after receiving the first message: 1) updating the local federated learning model; 2) receiving the federated learning model; 3) deleting the local federated learning model used in the previous local federated learning training; 4) stopping the local federated learning training.
- Before the second device sends the first message to the first device, the method further includes: the second device sends a federated learning training request message to the first device; the second device receives a response message from the first device, and the response message includes request information for obtaining a federated learning model.
- Before the second device sends the first message to the first device, the method further includes: the second device receives a second message from the first device, the second message including the training results of the local federated learning training of the first device, and request information for obtaining the federated learning model.
- the request information includes at least one of the following: 1) first request information, the first request information is used to request to obtain a federated learning model; 2) second request information, the second request information is used to request to obtain model information of the federated learning model; 3) third request information, the third request information is used to request to obtain gradient information of the federated learning model.
- After receiving the request information for obtaining the federated learning model, the second device sends the first message according to the request information, wherein the first message includes at least one of the following: model information of the federated learning model and gradient information of the federated learning model.
- the federated learning model includes the final global model or the updated global model.
- the model training method provided in the embodiment of the present application can be executed by a model training device.
- the model training device executing the model training method is taken as an example to illustrate the model training device provided in the embodiment of the present application.
- Fig. 5 is a schematic diagram of the structure of a model training apparatus according to an embodiment of the present application.
- the apparatus can be applied to a first device.
- the apparatus 500 includes the following modules.
- the receiving module 502 may be configured to receive a first message from a second device, wherein the first message is configured to indicate that the federated learning training is terminated or stopped.
- the processing module 504 can be used to perform a first operation based on the first message; wherein the first device includes a federated learning client, and the second device includes a federated learning server.
- the server may send a first message to the client, and the first message is used to indicate that the federated learning training is terminated or suspended.
- the first device can be informed that the federated learning training is finished, and can perform a first operation based on the first message, for example, stopping the local federated learning training or deleting the local federated learning model, thereby avoiding occupation of the client's storage space and computing power and improving the client's performance.
- the first message includes at least one of the following:
- Task identification information where the task identification information is used to indicate the task category for which the federated learning model is used.
- Task association identification information where the task association identification information is used to indicate a target federated learning task.
- Reason information where the reason information is used to indicate the reason why the second device sends the first message.
- Recommendation information where the recommendation information is used to indicate an operation to be performed by the first device after receiving the first message.
- the first operation includes at least one of the following:
- the first operation includes updating the local federated learning model, and/or receiving the federated learning model, and the processing module 504 is also used to save the federated learning model; wherein, the federated learning model supports use by the first device.
- the receiving module 502 is also used to receive a federated learning training request message from the second device; the device also includes a sending module, used to send a response message to the second device, and the response message includes request information for obtaining a federated learning model.
- the apparatus further includes a sending module for sending a second message to the second device after completing the local federated learning training, wherein the second message includes the training results of the local federated learning training and request information for obtaining the federated learning model.
- the request information includes at least one of the following: 1) first request information, the first request information is used to request to obtain a federated learning model; 2) second request information, the second request information is used to request to obtain model information of the federated learning model; 3) third request information, the third request information is used to request to obtain gradient information of the federated learning model.
- For the device 500 according to the embodiment of the present application, reference may be made to the process of the corresponding method 200. The units/modules in the device 500 and the other operations and/or functions described above respectively implement the corresponding processes in the method 200 and can achieve the same or equivalent technical effects. For the sake of brevity, they will not be repeated here.
- the model training device in the embodiment of the present application can be an electronic device, such as an electronic device with an operating system, or a component in an electronic device, such as an integrated circuit or a chip.
- the electronic device can be a terminal, or it can be a device other than a terminal.
- the terminal can include but is not limited to the types of terminals 11 listed above, and other devices can be servers, network attached storage (NAS), etc., which are not specifically limited in the embodiment of the present application.
- Fig. 6 is a schematic diagram of the structure of a model training apparatus according to an embodiment of the present application.
- the apparatus can be applied to a second device.
- the apparatus 600 includes the following modules.
- the sending module 602 can be used to send a first message to a first device, where the first message is used to indicate that the federated learning training is terminated or suspended; wherein the first device includes a federated learning client, and the second device includes a federated learning server.
- the apparatus 600 may include a processing module and the like.
- the server may send a first message to the client, and the first message is used to indicate that the federated learning training is terminated or suspended.
- the first device can be informed that the federated learning training is finished, and can perform a first operation based on the first message, for example, stopping the local federated learning training or deleting the local federated learning model, thereby avoiding occupation of the client's storage space and computing power and improving the client's performance.
- the first message includes at least one of the following:
- Task identification information where the task identification information is used to indicate the task category for which the federated learning model is used.
- Task association identification information where the task association identification information is used to indicate a target federated learning task.
- Reason information where the reason information is used to indicate the reason why the second device sends the first message.
- Suggestion information, where the suggestion information is used to indicate an operation to be performed by the first device after receiving the first message.
- the sending module 602 is also used to send a federated learning training request message to the first device; the device also includes a receiving module, used to receive a response message from the first device, and the response message includes request information for obtaining a federated learning model.
- the apparatus further includes a receiving module for receiving a second message from the first device, wherein the second message includes a training result of local training of federated learning of the first device and request information for obtaining a federated learning model.
- the request information includes at least one of the following: 1) first request information, the first request information is used to request to obtain a federated learning model; 2) second request information, the second request information is used to request to obtain model information of the federated learning model; 3) third request information, the third request information is used to request to obtain gradient information of the federated learning model.
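One plausible way for the server to honor the three kinds of request information is to attach only the requested items to the first message. The sketch below illustrates this under stated assumptions: the enum `RequestInfo`, the function `build_first_message`, and the dictionary keys are hypothetical names chosen for illustration, not part of the application.

```python
from enum import Enum, auto
from typing import Any, Dict, Set


class RequestInfo(Enum):
    """The three kinds of request information (names are illustrative)."""
    MODEL = auto()          # first request information: the federated learning model
    MODEL_INFO = auto()     # second request information: model information
    GRADIENT_INFO = auto()  # third request information: gradient information


def build_first_message(requested: Set[RequestInfo],
                        global_model: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble a termination message carrying only what the client asked for."""
    message: Dict[str, Any] = {"indication": "terminated"}
    if RequestInfo.MODEL in requested:
        message["model"] = global_model["weights"]
    if RequestInfo.MODEL_INFO in requested:
        message["model_info"] = global_model["info"]
    if RequestInfo.GRADIENT_INFO in requested:
        message["gradient_info"] = global_model["gradients"]
    return message


# Example: the client asked only for model information in its response message.
global_model = {"weights": [0.5, -1.2], "info": {"version": 3}, "gradients": [0.01, 0.02]}
reply = build_first_message({RequestInfo.MODEL_INFO}, global_model)
print(reply)
```

A client that requested nothing would receive a message with only the indication, which matches the case where the first message merely signals termination or suspension.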
- for the apparatus 600 according to this embodiment of the present application, reference may be made to the process of the corresponding method 400; the units/modules in the apparatus 600 and the other operations and/or functions described above respectively implement the corresponding processes in the method 400 and can achieve the same or equivalent technical effects. For the sake of brevity, they will not be repeated here.
- the model training device provided in the embodiment of the present application can implement the various processes implemented by the method embodiments of Figures 2 to 4 and achieve the same technical effects. To avoid repetition, they will not be described here.
- an embodiment of the present application further provides a communication device 700, including a processor 701 and a memory 702, wherein the memory 702 stores a program or instruction that can be run on the processor 701.
- when the communication device 700 is a terminal, the program or instruction, when executed by the processor 701, implements the various steps of the above-mentioned model training method embodiment and can achieve the same technical effect.
- when the communication device 700 is a network side device, the program or instruction, when executed by the processor 701, implements the various steps of the above-mentioned model training method embodiment and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
- the embodiment of the present application also provides a terminal, including a processor and a communication interface, wherein the communication interface is used to receive a first message from a second device, wherein the first message is used to indicate that the federated learning training is terminated or suspended; the processor is used to perform a first operation based on the first message; wherein the terminal includes a client of the federated learning, and the second device includes a server of the federated learning.
- alternatively, the communication interface is used to send a first message to the first device, wherein the first message is used to indicate that the federated learning training is terminated or suspended; wherein the first device includes a client of the federated learning, and the terminal includes a server of the federated learning.
- the terminal embodiment corresponds to the above-mentioned terminal side method embodiment, and each implementation process and implementation mode of the above-mentioned method embodiment can be applied to the terminal embodiment and can achieve the same technical effect.
- Figure 8 is a schematic diagram of the hardware structure of a terminal implementing the embodiment of the present application.
- the terminal 800 includes but is not limited to at least some of the following components: a radio frequency unit 801, a network module 802, an audio output unit 803, an input unit 804, a sensor 805, a display unit 806, a user input unit 807, an interface unit 808, a memory 809, and a processor 810.
- the terminal 800 may also include a power source (such as a battery) for supplying power to various components.
- the power supply can be logically connected to the processor 810 through the power management system, so that the power management system can manage charging, discharging, power consumption and other functions.
- the terminal structure shown in FIG8 does not constitute a limitation on the terminal.
- the terminal may include more or fewer components than shown in the figure, or combine certain components, or arrange the components differently, which will not be described in detail here.
- the input unit 804 may include a graphics processing unit (GPU) 8041 and a microphone 8042, and the GPU 8041 processes the image data of the static picture or video obtained by the image capture device (such as a camera) in the video capture mode or the image capture mode.
- the display unit 806 may include a display panel 8061, and the display panel 8061 may be configured in the form of a liquid crystal display, an organic light emitting diode, etc.
- the user input unit 807 includes at least one of a touch panel 8071 and other input devices 8072.
- the touch panel 8071 is also called a touch screen.
- the touch panel 8071 may include two parts: a touch detection device and a touch controller.
- Other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (such as a volume control key, a switch key, etc.), a trackball, a mouse, and a joystick, which will not be repeated here.
- after receiving downlink data from the network side device, the radio frequency unit 801 can transmit the data to the processor 810 for processing; in addition, the radio frequency unit 801 can send uplink data to the network side device.
- the radio frequency unit 801 includes but is not limited to an antenna, an amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, etc.
- the memory 809 can be used to store software programs or instructions and various data.
- the memory 809 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system, an application program or instruction required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
- the memory 809 may include a volatile memory or a non-volatile memory, or the memory 809 may include both volatile and non-volatile memories.
- the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
- the volatile memory may be a random access memory (RAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDRSDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), or a direct Rambus random access memory (DRRAM).
- the memory 809 in the embodiment of the present application includes but is not limited to these and any other suitable types of memory.
- the processor 810 may include one or more processing units; optionally, the processor 810 integrates an application processor and a modem processor, wherein the application processor mainly processes operations related to an operating system, a user interface, and application programs, and the modem processor mainly processes wireless communication signals, such as a baseband processor. It is understandable that the modem processor may not be integrated into the processor 810.
- the radio frequency unit 801 may be used to receive a first message from a second device, the first message being used to indicate that the federated learning training is terminated or suspended; the processor 810 may be used to perform a first operation based on the first message; wherein the terminal includes a client of the federated learning, and the second device includes a server of the federated learning. Alternatively, the radio frequency unit 801 is used to send a first message to a first device, where the first message is used to indicate that the federated learning training is terminated or suspended; wherein the first device includes a client of the federated learning, and the terminal includes a server of the federated learning.
- the server may send a first message to the client, and the first message is used to indicate that the federated learning training is terminated or suspended.
- in this way, the first device learns that the federated learning training has finished and can perform a first operation based on the first message, for example, stopping the local federated learning training or deleting the local federated learning model, which avoids occupying the client's storage space and computing power and improves the client's performance.
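The client-side handling described above can be sketched as a small message handler. This is an illustrative sketch only: the class `FederatedClient`, its attributes, the message keys, and the policy of deleting the local model only on termination are assumptions made for the example, not behavior specified by the application.

```python
from typing import Any, Dict, List, Optional


class FederatedClient:
    """Hypothetical client state; attribute names are assumptions for illustration."""

    def __init__(self) -> None:
        self.training_active = True
        self.local_model: Optional[Dict[str, Any]] = {"weights": [0.1, 0.2]}
        self.saved_model: Optional[Any] = None

    def handle_first_message(self, message: Dict[str, Any]) -> List[str]:
        """Perform the first operation(s) and return the names of those performed."""
        performed: List[str] = []
        indication = message.get("indication")
        if indication in ("terminated", "suspended"):
            # Stop local training so it no longer consumes computing power.
            self.training_active = False
            performed.append("stop_local_training")
        if "model" in message:
            # Keep the received federated learning model for later use.
            self.saved_model = message["model"]
            performed.append("save_received_model")
        if indication == "terminated":
            # Assumed policy: free the storage held by the local training model.
            self.local_model = None
            performed.append("delete_local_model")
        return performed


client = FederatedClient()
print(client.handle_first_message({"indication": "terminated", "model": [1.0]}))
```

In this sketch a suspension stops training but keeps the local model, on the assumption that training might resume later; a termination additionally releases the local model.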
- the terminal 800 provided in the embodiment of the present application can also implement the various processes of the above-mentioned model training method embodiment and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
- the embodiment of the present application also provides a network-side device, including a processor and a communication interface, wherein the communication interface is used to receive a first message from a second device, wherein the first message is used to indicate that the federated learning training is terminated or suspended; the processor is used to perform a first operation based on the first message; wherein the network-side device includes a client of federated learning, and the second device includes a server of federated learning.
- alternatively, the communication interface is used to send a first message to the first device, wherein the first message is used to indicate that the federated learning training is terminated or suspended; wherein the first device includes a client of federated learning, and the network-side device includes a server of federated learning.
- This network side device embodiment corresponds to the above-mentioned network side device method embodiment.
- Each implementation process and implementation method of the above-mentioned method embodiment can be applied to this network side device embodiment and can achieve the same technical effect.
- the embodiment of the present application also provides a network side device.
- the network side device 900 includes: an antenna 91, a radio frequency device 92, a baseband device 93, a processor 94, and a memory 95.
- the antenna 91 is connected to the radio frequency device 92.
- the radio frequency device 92 receives information through the antenna 91 and sends the received information to the baseband device 93 for processing.
- the baseband device 93 processes the information to be sent and sends it to the radio frequency device 92.
- the radio frequency device 92 processes the received information and sends it out through the antenna 91.
- the method executed by the network-side device in the above embodiment may be implemented in the baseband device 93, which includes a baseband processor.
- the baseband device 93 may include, for example, at least one baseband board, on which a plurality of chips are arranged, as shown in FIG. 9 , wherein one of the chips is, for example, a baseband processor, which is connected to the memory 95 through a bus interface to call a program in the memory 95 and execute the network device operations shown in the above method embodiment.
- the network side device may also include a network interface 96, which is, for example, a common public radio interface (CPRI).
- the network side device 900 of the embodiment of the present invention also includes: instructions or programs stored in the memory 95 and executable on the processor 94.
- the processor 94 calls the instructions or programs in the memory 95 to execute the methods executed by the modules shown in Figure 5 or Figure 6, and achieves the same technical effect. To avoid repetition, it will not be repeated here.
- the embodiment of the present application further provides a network side device.
- the network side device 1000 includes: a processor 1001, a network interface 1002, and a memory 1003.
- the network interface 1002 is, for example, a common public radio interface (CPRI).
- the network side device 1000 of the embodiment of the present application further includes: instructions or programs stored in the memory 1003 and executable on the processor 1001.
- the processor 1001 calls the instructions or programs in the memory 1003 to execute the methods executed by the modules shown in FIG. 5 or FIG. 6 , and achieves the same technical effect. To avoid repetition, it will not be described here.
- An embodiment of the present application also provides a readable storage medium, on which a program or instruction is stored.
- when the program or instruction is executed by a processor, the various processes of the above-mentioned model training method embodiment are implemented, and the same technical effect can be achieved. To avoid repetition, it will not be repeated here.
- the processor is the processor in the terminal described in the above embodiment.
- the readable storage medium may be non-volatile or non-transitory.
- the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory ROM, a random access memory RAM, a magnetic disk or an optical disk.
- An embodiment of the present application further provides a chip, which includes a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned model training method embodiment, and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
- the chip mentioned in the embodiments of the present application can also be called a system-level chip, a system chip, a chip system, or a system-on-chip, etc.
- the embodiments of the present application further provide a computer program/program product, which is stored in a storage medium and is executed by at least one processor to implement the various processes of the above-mentioned model training method embodiment and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
- An embodiment of the present application also provides a model training system, including: a terminal and a network side device, wherein the terminal can be used to execute the steps of the model training method as described above, and the network side device can be used to execute the steps of the model training method as described above.
- the technical solution of the present application can be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk), and includes a number of instructions for enabling a terminal (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in each embodiment of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Life Sciences & Earth Sciences (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Mobile Radio Communication Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (30)
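- A model training method, comprising: receiving, by a first device, a first message from a second device, wherein the first message is used to indicate that federated learning training is terminated or suspended; and performing, by the first device, a first operation based on the first message; wherein the first device comprises a federated learning client, and the second device comprises a federated learning server.
- The method according to claim 1, wherein the first message comprises at least one of the following: indication information that federated learning training is terminated; indication information that federated learning training is suspended; a model identifier or identification information of a federated learning model; model information of the federated learning model; gradient information of the federated learning model; task identification information, wherein the task identification information is used to indicate the task category for which the federated learning model is used; task association identification information, wherein the task association identification information is used to indicate a target federated learning task; reason information, wherein the reason information is used to indicate the reason why the second device sends the first message; suggestion information, wherein the suggestion information is used to indicate an operation to be performed by the first device after receiving the first message.
- The method according to claim 2, wherein the reason information is used to indicate at least one of the following: the federated learning process has ended; the federated learning process is interrupted.
- The method according to any one of claims 1 to 3, wherein the first operation comprises at least one of the following: updating a local federated learning model; receiving a federated learning model; deleting the local federated learning model used in previous local federated learning training; stopping local federated learning training.
- The method according to claim 4, wherein the first operation comprises the updating of the local federated learning model and/or the receiving of the federated learning model, and after the first device receives the first message from the second device, the method further comprises: saving, by the first device, the federated learning model; wherein the federated learning model is available for use by the first device.
- The method according to claim 1, wherein before the first device receives the first message from the second device, the method further comprises: receiving, by the first device, a federated learning training request message from the second device; and sending, by the first device, a response message to the second device, wherein the response message comprises request information for obtaining a federated learning model.
- The method according to claim 1, wherein before the first device receives the first message from the second device, the method further comprises: sending, by the first device, a second message to the second device after completing local federated learning training, wherein the second message comprises a training result of the local federated learning training and request information for obtaining a federated learning model.
- The method according to claim 6 or 7, wherein the request information comprises at least one of the following: first request information, wherein the first request information is used to request a federated learning model; second request information, wherein the second request information is used to request model information of the federated learning model; third request information, wherein the third request information is used to request gradient information of the federated learning model.
- The method according to any one of claims 1 to 8, wherein the federated learning model comprises a final global model or an updated global model.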
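- A model training method, comprising: sending, by a second device, a first message to a first device, wherein the first message is used to indicate that federated learning training is terminated or suspended; wherein the first device comprises a federated learning client, and the second device comprises a federated learning server.
- The method according to claim 10, wherein the first message comprises at least one of the following: indication information that federated learning training is terminated; indication information that federated learning training is suspended; a model identifier or identification information of a federated learning model; model information of the federated learning model; gradient information of the federated learning model; task identification information, wherein the task identification information is used to indicate the task category for which the federated learning model is used; task association identification information, wherein the task association identification information is used to indicate a target federated learning task; reason information, wherein the reason information is used to indicate the reason why the second device sends the first message; suggestion information, wherein the suggestion information is used to indicate an operation to be performed by the first device after receiving the first message.
- The method according to claim 11, wherein the reason information is used to indicate at least one of the following: the federated learning process has ended; the federated learning process is interrupted.
- The method according to claim 11, wherein the suggestion information is used to instruct the first device to perform at least one of the following after receiving the first message: updating a local federated learning model; receiving a federated learning model; deleting the local federated learning model used in previous local federated learning training; stopping local federated learning training.
- The method according to claim 10, wherein before the second device sends the first message to the first device, the method further comprises: sending, by the second device, a federated learning training request message to the first device; and receiving, by the second device, a response message from the first device, wherein the response message comprises request information for obtaining a federated learning model.
- The method according to claim 10, wherein before the second device sends the first message to the first device, the method further comprises: receiving, by the second device, a second message from the first device, wherein the second message comprises a training result of the local federated learning training of the first device and request information for obtaining a federated learning model.
- The method according to claim 14 or 15, wherein the sending, by the second device, of the first message to the first device comprises: sending, by the second device, the first message according to the request information for obtaining a federated learning model, wherein the first message contains at least one of the following: model information of the federated learning model; gradient information of the federated learning model; wherein the federated learning model comprises a final global model or an updated global model.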
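- A model training apparatus, applied to a first device, comprising: a receiving module, configured to receive a first message from a second device, wherein the first message is used to indicate that federated learning training is terminated or suspended; and a processing module, configured to perform a first operation based on the first message; wherein the first device comprises a federated learning client, and the second device comprises a federated learning server.
- The apparatus according to claim 17, wherein the first message comprises at least one of the following: indication information that federated learning training is terminated; indication information that federated learning training is suspended; a model identifier or identification information of a federated learning model; model information of the federated learning model; gradient information of the federated learning model; task identification information, wherein the task identification information is used to indicate the task category for which the federated learning model is used; task association identification information, wherein the task association identification information is used to indicate a target federated learning task; suggestion information, wherein the suggestion information is used to indicate an operation to be performed by the first device after receiving the first message.
- The apparatus according to claim 17 or 18, wherein the first operation comprises at least one of the following: updating a local federated learning model; receiving a federated learning model; deleting the local federated learning model used in previous local federated learning training; stopping local federated learning training.
- The apparatus according to claim 19, wherein the first operation comprises the updating of the local federated learning model and/or the receiving of the federated learning model, and the processing module is further configured to save the federated learning model; wherein the federated learning model is available for use by the first device.
- The apparatus according to claim 17, wherein the receiving module is further configured to receive a federated learning training request message from the second device; and the apparatus further comprises a sending module, configured to send a response message to the second device, wherein the response message comprises request information for obtaining a federated learning model.
- The apparatus according to claim 17, wherein the apparatus further comprises a sending module, configured to send a second message to the second device after local federated learning training is completed, wherein the second message comprises a training result of the local federated learning training and request information for obtaining a federated learning model.
- The apparatus according to any one of claims 17 to 22, wherein the federated learning model comprises a final global model or an updated global model.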
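- A model training apparatus, applied to a second device, comprising: a sending module, configured to send a first message to a first device, wherein the first message is used to indicate that federated learning training is terminated or suspended; wherein the first device comprises a federated learning client, and the second device comprises a federated learning server.
- The apparatus according to claim 24, wherein the first message comprises at least one of the following: indication information that federated learning training is terminated; indication information that federated learning training is suspended; a model identifier or identification information of a federated learning model; model information of the federated learning model; gradient information of the federated learning model; task identification information, wherein the task identification information is used to indicate the task category for which the federated learning model is used; task association identification information, wherein the task association identification information is used to indicate a target federated learning task; suggestion information, wherein the suggestion information is used to indicate an operation to be performed by the first device after receiving the first message.
- The apparatus according to claim 24, wherein the sending module is further configured to send a federated learning training request message to the first device; and the apparatus further comprises a receiving module, configured to receive a response message from the first device, wherein the response message comprises request information for obtaining a federated learning model.
- The apparatus according to claim 24, wherein the apparatus further comprises a receiving module, configured to receive a second message from the first device, wherein the second message comprises a training result of the local federated learning training of the first device and request information for obtaining a federated learning model.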
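- A terminal, comprising a processor and a memory, wherein the memory stores a program or instruction executable on the processor, and the program or instruction, when executed by the processor, implements the steps of the method according to any one of claims 1 to 16.
- A network-side device, comprising a processor and a memory, wherein the memory stores a program or instruction executable on the processor, and the program or instruction, when executed by the processor, implements the steps of the method according to any one of claims 1 to 16.
- A readable storage medium, wherein a program or instruction is stored on the readable storage medium, and the program or instruction, when executed by a processor, implements the steps of the method according to any one of claims 1 to 16.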
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23900043.3A EP4633104A4 (en) | 2022-12-08 | 2023-12-07 | MODEL TRAINING METHOD, TERMINAL AND NETWORK-SIDE DEVICE |
| JP2025532177A JP2025538004A (ja) | 2022-12-08 | 2023-12-07 | Model training method, terminal, and network-side device |
| US19/229,729 US20250299110A1 (en) | 2022-12-08 | 2025-06-05 | Model training method, terminal, and network-side device |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211579377 | 2022-12-08 | | |
| CN202211579377.8 | 2022-12-08 | | |
| CN202310372773.1A CN118175052A (zh) | 2022-12-08 | 2023-04-07 | Model training method, terminal, and network-side device |
| CN202310372773.1 | 2023-04-07 | | |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/229,729 Continuation US20250299110A1 (en) | 2022-12-08 | 2025-06-05 | Model training method, terminal, and network-side device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024120470A1 (zh) | 2024-06-13 |
Family
ID=91347768
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/136968 Ceased WO2024120470A1 (zh) | 2022-12-08 | 2023-12-07 | Model training method, terminal, and network-side device |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20250299110A1 (zh) |
| EP (1) | EP4633104A4 (zh) |
| JP (1) | JP2025538004A (zh) |
| CN (1) | CN118175052A (zh) |
| WO (1) | WO2024120470A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2026011431A1 (zh) * | 2024-07-12 | 2026-01-15 | 北京小米移动软件有限公司 | Information processing method, node, communication device, communication system, and storage medium |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
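| WO2026011435A1 (zh) * | 2024-07-12 | 2026-01-15 | 北京小米移动软件有限公司 | Model training method, node, communication device, communication system, and storage medium |
| CN118504717B (zh) * | 2024-07-19 | 2024-10-22 | 浙江霖研精密科技有限公司 | Cross-department federated learning method, system, and storage medium based on gradient orthogonalization |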
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
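| CN113869533A (zh) * | 2021-09-29 | 2021-12-31 | 深圳前海微众银行股份有限公司 | Federated learning modeling optimization method, device, readable storage medium, and program product |
| WO2022099512A1 (zh) * | 2020-11-11 | 2022-05-19 | 北京小米移动软件有限公司 | Data processing method and apparatus, communication device, and storage medium |
| CN115242756A (zh) * | 2021-04-01 | 2022-10-25 | 中国移动通信有限公司研究院 | Method, apparatus, device, and system for processing a federated learning service |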
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10621019B1 (en) * | 2017-11-22 | 2020-04-14 | Amazon Technologies, Inc. | Using a client to manage remote machine learning jobs |
2023
- 2023-04-07: CN application CN202310372773.1A, published as CN118175052A (zh), status: Pending
- 2023-12-07: WO application PCT/CN2023/136968, published as WO2024120470A1 (zh), status: Ceased
- 2023-12-07: JP application JP2025532177A, published as JP2025538004A (ja), status: Pending
- 2023-12-07: EP application EP23900043.3A, published as EP4633104A4 (en), status: Pending
2025
- 2025-06-05: US application US19/229,729, published as US20250299110A1 (en), status: Pending
Non-Patent Citations (3)
| Title |
|---|
| AIHUA LI, CHINA MOBILE: "Horizontal Federated Learning among Multiple NWDAFs in TS 23.288", 3GPP DRAFT; S2-2211435; TYPE CR; CR 0582; FS_ENA_PH3, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE ; 650, ROUTE DES LUCIOLES ; F-06921 SOPHIA-ANTIPOLIS CEDEX ; FRANCE, vol. 3GPP SA 2, no. Toulouse, FR; 20221114 - 20221118, 22 November 2022 (2022-11-22), Mobile Competence Centre ; 650, route des Lucioles ; F-06921 Sophia-Antipolis Cedex ; France, XP052225434 * |
| See also references of EP4633104A4 * |
| VIVIAN CHONG, VIVO, NTT DOCOMO, ERICSSON, LG ELECTRONICS, NOKIA, NOKIA SHANGHAI-BELL, HUAWEI, HISILICON: "Updates for Nnwdaf_MLModelTraining service", 3GPP DRAFT; S2-2307882; TYPE CR; CR 0773; ENA_PH3, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE ; 650, ROUTE DES LUCIOLES ; F-06921 SOPHIA-ANTIPOLIS CEDEX ; FRANCE, vol. 3GPP SA 2, no. Berlin, DE; 20230522 - 20230526, 30 May 2023 (2023-05-30), Mobile Competence Centre ; 650, route des Lucioles ; F-06921 Sophia-Antipolis Cedex ; France, XP052382713 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2025538004A (ja) | 2025-11-20 |
| EP4633104A1 (en) | 2025-10-15 |
| CN118175052A (zh) | 2024-06-11 |
| EP4633104A4 (en) | 2026-01-28 |
| US20250299110A1 (en) | 2025-09-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
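| WO2024120470A1 (zh) | | Model training method, terminal, and network-side device |
| US20240188047A1 (en) | | Computing session update method and apparatus, and communication device |
| WO2023246584A1 (zh) | | Computing power processing method and apparatus, and communication device |
| WO2023246756A1 (zh) | | Computing power service method and apparatus, terminal, and core network device |
| WO2024037632A1 (zh) | | Communication method, terminal, and network-side device |
| WO2023125932A1 (zh) | | AI network information transmission method and apparatus, and communication device |
| WO2023131286A1 (zh) | | Resource control method and apparatus, terminal, network-side device, and readable storage medium |
| WO2024125358A1 (zh) | | Computing power processing method and communication device |
| WO2024017023A1 (zh) | | Method for obtaining allowed NSSAI, terminal, and network-side device |
| WO2024078400A1 (zh) | | Model request method and apparatus, communication device, and readable storage medium |
| WO2023185929A1 (zh) | | Resource control method and apparatus, terminal, and network-side device |
| CN116847330A (zh) | | Method for negotiating a terminal unavailability period, terminal, and network-side device |
| CN116939638A (zh) | | Communication method and apparatus, and related device |
| WO2024149288A1 (zh) | | AI model distribution and reception methods, terminal, and network-side device |
| WO2024022398A1 (zh) | | Method for obtaining network selection information of a hosting network, terminal, and network-side device |
| WO2025195274A1 (zh) | | Task processing method, terminal, and network-side device |
| WO2023179553A1 (zh) | | Method for negotiating a terminal unavailability period, terminal, and network-side device |
| WO2025185508A1 (zh) | | Network access control method and apparatus, and communication device |
| WO2024140712A1 (zh) | | Model provision, model acquisition, and device query methods and apparatuses, and communication device |
| WO2025130732A1 (zh) | | Terminal energy saving method, terminal device, and network device |
| WO2025195331A1 (zh) | | Communication method, apparatus, device, and storage medium |
| WO2025026193A1 (zh) | | Service processing method and apparatus, communication device, and readable storage medium |
| WO2025195276A1 (zh) | | Task processing method, terminal, and network-side device |
| WO2025195234A1 (zh) | | Terminal capability registration method, acquisition method, apparatus, terminal, and device |
| WO2025209284A1 (zh) | | Communication data processing method and apparatus, and communication device |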
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23900043; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2025532177; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 2025532177; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023900043; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2023900043; Country of ref document: EP; Effective date: 20250708 |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112025011463; Country of ref document: BR |
| | WWP | WIPO information: published in national office | Ref document number: 2023900043; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 112025011463; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20250605 |