WO2024041119A1 - Data backup method and device - Google Patents
- Publication number
- WO2024041119A1 (application PCT/CN2023/100112)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- backup
- task
- business object
- backup task
- business objects
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1461—Backup scheduling policy
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/80—Database-specific techniques
Definitions
- the present application relates to data backup technology, and in particular, to a data backup method and device.
- The client regularly generates copies of the data of different business objects and transmits these copies to the background backup system within a predetermined time window; each such transfer is a backup task.
- For the many backup tasks waiting to be transmitted, the goal of backup orchestration is to design an efficient scheme for allocating and scheduling backup transmission resources (transmission time, bandwidth, etc.), so that the limited backup resources are distributed among the tasks, as many backup tasks as possible complete within the predetermined time window, and the backup bandwidth/throughput of the entire system is maximized.
- a backup orchestration solution based on dynamic scheduling is adopted, which is achieved by dynamically setting task priority and/or task start time.
- the task priority is set by the customer, and the task start time is dynamically calculated based on the real-time status of the system.
- the scheduling effect of this solution is greatly affected by the customer's manual experience and random operations, and may not match the dynamic characteristics of the backup system and each backup task accurately enough.
- preemptive interrupt operations are needed to adapt to dynamic fluctuations after the task is started.
- Preemptive interrupt operations cause additional interruption and recovery overhead, affecting the system throughput and the service level agreement (SLA) of the task.
- This application provides a data backup method and device to fully consider the matching accuracy of the dynamic characteristics of the backup system and each backup task, avoid manual scheduling intervention on multiple backup tasks, and improve the SLA time window compliance of tasks.
- This application provides a data backup method, which includes: obtaining a first backup data amount and a first task completion time of multiple business objects, where the first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects;
- obtaining the scheduling order of the next backup task of the multiple business objects according to the first backup data amount and the first task completion time; and sending a scheduling instruction to the client according to the scheduling order of the next backup task of the multiple business objects, where the scheduling instruction includes the identification of the business object corresponding to the backup task to be scheduled.
- The scheduling order of the next backup task of multiple business objects is predicted from the backup data volume and task completion time of their last backup tasks. This fully considers the matching accuracy of the dynamic characteristics of the backup system and each backup task, avoids manual intervention in scheduling multiple backup tasks, and improves the SLA time window compliance of tasks.
- the multiple business objects of the client can include business objects of servers that support various operating systems (for example, Windows, Unix, Linux, VMware, etc.), and can also include various types of databases (for example, Oracle, SQL, DB2, etc.)
- the business objects may also include business objects of other devices or services with data backup requirements, which are not specifically limited in the embodiments of this application.
- A client can include multiple business objects. For example, multiple VMs are created on the server, and each VM is a business object; or database services are created for multiple users, and each database service is a business object.
- the first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects.
- The data backup requirement of a business object is a long-term and recurring process. For example, a business object regularly backs up production data from the client to the background backup system through the switching network at a certain backup frequency (for example, hourly, daily, weekly, monthly, etc.), so each data transmission process of the business object can be regarded as a backup task. The backup task of a business object is therefore started and executed periodically. Based on this, in the embodiments of this application, the backup task to be executed in the next cycle is called the next backup task of the business object, and the backup task that was executed in the previous cycle is called the last backup task of the business object.
- The client can report the backup data volume and task completion time (JCT) of the backup task to the backup system.
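- The backup system can use the following two methods to obtain the scheduling order of the next backup tasks of multiple business objects:
- Method 1: input the first backup data amount and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the multiple business objects.
- Method 2: input the first backup data amount and the first task completion time into a second machine learning model to obtain a second backup data amount and a second task completion time of the multiple business objects, where the second backup data amount and the second task completion time correspond to the next backup task of the multiple business objects; then obtain the scheduling order of the next backup tasks of the multiple business objects based on the second backup data amount and the second task completion time.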
- For a first business object (any one of the multiple business objects), the backup system can calculate the ratio of the second backup data amount to the second task completion time to obtain the ease of the next backup task of the first business object. The backup system calculates this ease for the next backup task of each of the multiple business objects.
- The larger the ratio, the easier the next backup task; the smaller the ratio, the more difficult the next backup task.
- The remaining startup time of the next backup task of the first business object is obtained according to the second task completion time corresponding to the first business object.
- The backup system can subtract the second task completion time and the current time t from the latest task end time MaxEndTime (this information is reported to the backup system when the business object is registered or reconfigured) to obtain the remaining startup time of the next backup task.
- After the backup system obtains the ease and the remaining startup time of the next backup task of the first business object, it can calculate the ratio of the ease to the remaining startup time to obtain the scheduling threshold of the next backup task of the first business object.
- In this way the backup system can obtain the scheduling threshold of the next backup task for all business objects. A larger scheduling threshold means that the next backup task is easy to complete and its time is tight, so it needs to be started as soon as possible. A smaller scheduling threshold means that the next backup task is difficult to complete and its time is flexible, so its start can be postponed. The scheduling thresholds of the next backup tasks of the multiple business objects are then sorted from large to small to obtain the scheduling order of the next backup tasks of the multiple business objects.
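- The ease, remaining startup time, and scheduling threshold described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `PredictedTask` class, its field names, and the choice to skip tasks whose remaining startup time is not positive (leaving them to a separate cancellation check) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PredictedTask:
    """Predicted values for one business object's next backup task (hypothetical type)."""
    object_id: str          # identification of the business object
    data_amount: float      # second (predicted) backup data amount
    completion_time: float  # second (predicted) task completion time
    max_end_time: float     # latest task end time, MaxEndTime


def ease(task):
    # Ease = predicted data amount / predicted completion time.
    return task.data_amount / task.completion_time


def remaining_start_time(task, now):
    # Remaining startup time = MaxEndTime - predicted completion time - current time t.
    return task.max_end_time - task.completion_time - now


def scheduling_order(tasks, now):
    """Return business object ids sorted by scheduling threshold, largest first."""
    scored = []
    for task in tasks:
        remaining = remaining_start_time(task, now)
        if remaining <= 0:
            # Assumption: such tasks are handled by a separate cancellation check.
            continue
        threshold = ease(task) / remaining
        scored.append((threshold, task.object_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [object_id for _, object_id in scored]
```

- In this sketch, a task with a high data rate and little slack before its deadline receives a large threshold and is placed at the head of the order.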
- The above two methods use two different machine learning models. The difference lies in the output of the machine learning model, which is related to the training process of the machine learning model.
- The machine learning model can also output other information that assists in obtaining the scheduling order of the next backup tasks of multiple business objects. Therefore, the embodiments of this application do not limit the output of the machine learning model.
- The system throughput and task SLA time window compliance are fully considered during training. Combined with the dynamic characteristics of the backup task (for example, the backup data volume of the backup task, the task completion time, the earliest task start time, the latest task end time, etc.), the matching accuracy of the dynamic characteristics of the backup system and each backup task is trained, so that the prediction results of the machine learning model are closer to the actual execution results.
- methods other than the above-mentioned methods may also be used to obtain the scheduling order of the next backup tasks of multiple business objects, and there is no specific limitation on this.
- The backup system can first determine whether the next backup task of the first business object is cancelled; when it is not cancelled, the backup system obtains the scheduling order of the next backup tasks of multiple business objects based on the ease and the remaining startup time of the next backup tasks of the multiple business objects.
- The backup system can determine whether the remaining startup time of the next backup task of the first business object is less than 0. When it is less than 0 (indicating that the next backup task can no longer finish before its latest end time), the cancellation probability of the next backup task of the first business object is calculated (for example, as 1-p, where p is a preset probability value; the larger 1-p is, the more likely the next backup task is to be cancelled, and the smaller 1-p is, the less likely it is to be cancelled).
- When the cancellation probability of the next backup task of the first business object is greater than the preset threshold, it is determined to cancel the next backup task of the first business object; when the cancellation probability is less than or equal to the preset threshold, it is determined not to cancel the next backup task of the first business object.
- Determining first which business objects' next backup tasks need to be cancelled avoids unnecessary scheduling of those tasks, thus improving the scheduling efficiency of the next backup tasks of multiple business objects.
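- The cancellation check can be sketched as a small predicate. The function and parameter names are hypothetical. The sketch follows the text in using 1-p as the cancellation probability, and returns "do not cancel" whenever the remaining startup time is not negative.

```python
def should_cancel(remaining_start_time, p, cancel_threshold):
    """Decide whether to cancel a business object's next backup task.

    remaining_start_time: MaxEndTime - predicted completion time - current time
    p: preset probability value (the cancellation probability is 1 - p)
    cancel_threshold: preset threshold compared against the cancellation probability
    """
    if remaining_start_time >= 0:
        # The task still has time to start, so it is kept.
        return False
    cancel_probability = 1.0 - p
    # Cancel only when the probability is strictly greater than the threshold.
    return cancel_probability > cancel_threshold
```

- Tasks for which this returns True would be dropped before the remaining tasks are ordered.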
- After the backup system obtains the scheduling order of the next backup tasks of multiple business objects, it can take backup tasks from the head of the scheduling queue, according to the scheduling order and the number of parallel threads available in real time, as the backup tasks to be scheduled, and send a scheduling instruction to the client. The scheduling instruction includes the identification of the backup task to be scheduled.
- For example, if the number of parallel threads available in real time is 2, then 2 backup tasks can currently be scheduled. The backup system therefore takes 2 backup tasks from the head of the scheduling queue (for example, Job IDs 1 and 2) and sends them to the client.
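- Taking tasks from the head of the scheduling queue can be sketched with a double-ended queue. The function names and the dictionary layout of the scheduling instruction are assumptions for illustration; the application does not specify a message format.

```python
from collections import deque


def take_schedulable(queue, available_threads):
    """Pop as many job ids from the head of the queue as there are free threads."""
    count = min(max(available_threads, 0), len(queue))
    return [queue.popleft() for _ in range(count)]


def build_instruction(job_ids):
    # Hypothetical instruction layout: only the identifications are required by the text.
    return {"type": "schedule", "job_ids": list(job_ids)}


# Example: four queued jobs in scheduling order, two parallel threads available.
scheduling_queue = deque([1, 2, 3, 4])
instruction = build_instruction(take_schedulable(scheduling_queue, 2))
```

- After this call the instruction carries Job IDs 1 and 2, and Job IDs 3 and 4 stay queued until more threads become available.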
- The client again reports to the backup system the backup data amount and task completion time of this backup task of the multiple business objects.
- The client can choose to send the backup data amount and task completion time of each business object's backup task individually; the client can also choose to send the backup data amount and task completion time of the backup tasks of multiple business objects at once, after completing the backup tasks of the multiple business objects. There are no specific restrictions on this.
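- The method further includes: receiving the sending rate and receiving rate, in the previous cycle, of the backup task of a second business object sent by the client, where the backup task of the second business object is being executed; obtaining the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate; and sending a rate limit indication to the client, where the rate limit indication includes the rate limit of the backup task of the second business object in the next cycle.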
- The rate limit of the backup tasks of multiple business objects in the next cycle is predicted based on the sending rate and receiving rate of those backup tasks in the previous cycle, so that port bandwidth resources can be optimally used based on the real-time transmission characteristics of tasks, improving system throughput. This also avoids manual intervention in scheduling multiple backup tasks and improves system transmission efficiency.
- The backup task of the second business object is being executed. That is, in the embodiments of the present application, the rate limit can be applied to a backup task that is being executed (the client is sending the backup data of the backup task to the backup system).
- the backup task of business objects is executed periodically.
- The backup task being executed can be considered to be in the current cycle (which can also be called the next cycle), and the cycle in which the backup task was last executed can be considered the previous cycle of this backup task.
- A rate reporting cycle can be set, and the client reports to the backup system, according to this reporting cycle, the sending rate and receiving rate of the backup task of the second business object in the previous reporting cycle.
- The sending rate and receiving rate can be the average sending rate and average receiving rate in the previous reporting cycle, or the weighted average sending rate and weighted average receiving rate in the previous reporting cycle; this is not specifically limited.
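- The two aggregation options can be sketched as follows. The sample values and the weights (which here favor more recent samples) are assumptions; the application does not state how samples are collected or weighted.

```python
def average_rate(samples):
    """Plain average of the rate samples collected in one reporting cycle."""
    return sum(samples) / len(samples)


def weighted_average_rate(samples, weights):
    """Weighted average of the rate samples; weights are an assumption of this sketch."""
    if len(samples) != len(weights):
        raise ValueError("samples and weights must have the same length")
    return sum(s * w for s, w in zip(samples, weights)) / sum(weights)


# Hypothetical sending-rate samples (MB/s) from the previous reporting cycle.
send_samples = [100.0, 200.0, 300.0]
plain = average_rate(send_samples)
recent_heavy = weighted_average_rate(send_samples, [1.0, 2.0, 3.0])
```

- The same two functions would apply unchanged to the receiving-rate samples.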
- The backup system can input the sending rate and the receiving rate into the third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle, and then obtain the rate limit of the backup task of the second business object in the next cycle according to that receiving rate.
- The backup system can obtain the preset bandwidth of the first port, which is used to perform the backup task of the second business object; obtain the sum of the reception rates, in the next cycle, of all backup tasks transmitted by the first port; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the reception rate of the backup task of the second business object in the next cycle, and the sum of the reception rates.
- Here, the predicted reception rate of backup task i is the reception rate of the backup task of the second business object in the next cycle, and the sum of the predicted reception rates of all backup tasks transmitted on the first port is the sum of their reception rates in the next cycle.
- the embodiment of the present application can also use other methods to calculate the rate limit rate of the backup task of the second business object in the next cycle, which is not specifically limited.
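- The exact rate limit formula is not present in this text, so the sketch below assumes one plausible rule: each backup task on a port receives a share of the preset port bandwidth proportional to its predicted reception rate. The function name, the proportional rule, and the equal split used when all predictions are zero are assumptions, not the application's stated method.

```python
def rate_limits(port_bandwidth, predicted_rates):
    """Split a port's preset bandwidth across its backup tasks.

    port_bandwidth: preset bandwidth of the port
    predicted_rates: mapping of job id -> predicted reception rate in the next cycle
    Assumed rule: limit_i = port_bandwidth * rate_i / sum(all rates on the port).
    """
    if not predicted_rates:
        return {}
    total = sum(predicted_rates.values())
    if total <= 0:
        # Assumption: with no usable prediction, share the bandwidth equally.
        share = port_bandwidth / len(predicted_rates)
        return {job: share for job in predicted_rates}
    return {job: port_bandwidth * rate / total for job, rate in predicted_rates.items()}


limits = rate_limits(1000.0, {"job1": 300.0, "job2": 100.0})
```

- Under this assumed rule the limits always sum to the preset port bandwidth, so the port is neither oversubscribed nor left idle.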
- The backup system sends the rate limit of the backup task of the second business object in the next cycle to the client through the rate limit indication.
- The client can then limit the sending rate of the backup data of the backup task of the second business object based on the rate limit, so that the port bandwidth resources can be optimally used based on the real-time transmission characteristics of the task, improving the throughput of the system.
- The method further includes: training to obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model. The first machine learning model is used to predict the scheduling order of the next backup task of the business object, the second machine learning model is used to predict the second backup data volume and the second task completion time of the business object, and the third machine learning model is used to predict the reception rate of backup tasks of business objects in the next cycle.
- Training data is data used to train machine learning models.
- the training data can be different depending on the structure, parameters, functions, etc. of the machine learning model.
- The backup system can obtain the historical backup data volume and historical task completion time of multiple business objects, where the historical backup data volume and historical task completion time correspond to the completed backup tasks of the multiple business objects; obtain the preset machine learning model; input the historical backup data volume and historical task completion time of the multiple business objects into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and predicted task completion time to obtain the target machine learning model.
- The backup system can obtain the historical receiving rate and historical sending rate of completed backup tasks of multiple business objects; obtain the preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted reception rate of backup tasks for the multiple business objects; and perform convergence training based on the predicted reception rate to obtain the target machine learning model.
- The preset machine learning model may also differ, including in its structure and parameters; the embodiments of the present application do not specifically limit this.
- The machine learning model can also use the error back propagation (BP) algorithm to correct the parameters of the initial model during training, so that the error loss of the model becomes smaller and smaller. Specifically, propagating the input signal forward until the output produces an error loss, and the parameters in the initial model are updated by propagating the error loss information backward, so that the error loss converges.
- The backpropagation algorithm is a backward pass driven by the error loss, aiming to obtain the optimal parameters of the machine learning model, such as the weight matrix.
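- As a minimal illustration of convergence training driven by error loss, the sketch below fits a one-input linear model with plain gradient descent on mean squared error. A single linear layer is substituted here for the application's unspecified model, so the backward pass reduces to two gradient terms. The training pairs, learning rate, and epoch count are hypothetical.

```python
def train_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y      # forward pass, then error loss term
            grad_w += 2.0 * error * x / n  # gradient propagated back to w
            grad_b += 2.0 * error / n      # gradient propagated back to b
        w -= lr * grad_w                  # parameter update
        b -= lr * grad_b
    return w, b


# Hypothetical history: last-cycle backup data amount -> next-cycle backup data amount.
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]
w, b = train_linear(history_x, history_y)
```

- On this toy history the parameters approach w = 2 and b = 1; a real model would take several inputs (data amount, completion time, rates) and would be validated on held-out cycles.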
- This application provides a data backup device, including: an acquisition module, configured to acquire a first backup data amount and a first task completion time of multiple business objects, where the first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects;
- a scheduling module, configured to obtain the scheduling order of the next backup task of the multiple business objects according to the first backup data amount and the first task completion time; and
- a sending module, configured to send a scheduling instruction to the client according to the scheduling order of the next backup task of the multiple business objects, where the scheduling instruction includes the identification of the business object corresponding to the backup task to be scheduled.
- The scheduling module is specifically configured to input the first backup data amount and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the multiple business objects.
- The scheduling module is specifically configured to input the first backup data amount and the first task completion time into a second machine learning model to obtain a second backup data amount and a second task completion time of the multiple business objects, where the second backup data amount and the second task completion time correspond to the next backup task of the multiple business objects; and to obtain the scheduling order of the next backup tasks of the multiple business objects according to the second backup data amount and the second task completion time.
- The scheduling module is specifically configured to calculate the ratio of the second backup data amount corresponding to the first business object to the second task completion time to obtain the ease of the next backup task of the first business object, where the first business object is any one of the multiple business objects; obtain the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, when the ease and the remaining startup time of the next backup tasks of the multiple business objects have all been obtained, obtain the scheduling order of the next backup tasks of the multiple business objects according to the ease and the remaining startup time of those tasks.
- The scheduling module is also used to determine whether the next backup task of the first business object is cancelled; when the next backup task of the first business object is not cancelled, the scheduling order of the next backup tasks of the multiple business objects is obtained based on the ease and the remaining startup time of the next backup tasks of the multiple business objects.
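- The scheduling module is specifically configured to calculate the ratio of the ease of the next backup task of the multiple business objects to the remaining startup time of the next backup task of the multiple business objects to obtain the scheduling thresholds of the next backup tasks, and to sort the scheduling thresholds from large to small to obtain the scheduling order.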
- The scheduling module is specifically configured to determine whether the remaining startup time of the next backup task of the first business object is less than 0; when the remaining startup time is less than 0, calculate the cancellation probability of the next backup task of the first business object; and when the cancellation probability is greater than the preset threshold, determine to cancel the next backup task of the first business object.
- The scheduling module is further configured to determine not to cancel the next backup task of the first business object when the remaining startup time of the next backup task of the first business object is greater than or equal to 0.
- The scheduling module is further configured to determine not to cancel the next backup task of the first business object when the cancellation probability of the next backup task of the first business object is less than or equal to a preset threshold.
- The device further includes: a rate limiting module, configured to receive the sending rate and receiving rate, in the previous cycle, of the backup task of the second business object sent by the client, where the backup task of the second business object is being executed, and to obtain the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate. The sending module is also used to send a rate limit indication to the client, where the rate limit indication includes the rate limit of the backup task of the second business object in the next cycle.
- The rate limiting module is specifically configured to input the sending rate and the receiving rate into a third machine learning model to obtain the reception rate of the backup task of the second business object in the next cycle, and to obtain the rate limit of the backup task of the second business object in the next cycle according to that reception rate.
- The rate limiting module is specifically used to obtain the preset bandwidth of a first port, where the first port is used to perform the backup task of the second business object; obtain the sum of the reception rates, in the next cycle, of all backup tasks transmitted by the first port; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the reception rate of the backup task of the second business object in the next cycle, and the sum of the reception rates.
- The device further includes: a training module for training to obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model. The first machine learning model is used to predict the scheduling order of the next backup task of the business object, the second machine learning model is used to predict the second backup data amount and the second task completion time of the business object, and the third machine learning model is used to predict the reception rate of the backup task of the business object in the next cycle.
- The training module is specifically configured to obtain the historical backup data volume and historical task completion time of the multiple business objects, where the historical backup data volume and the historical task completion time correspond to the completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical backup data volume and historical task completion time into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and predicted task completion time to obtain the target machine learning model.
- the training module is specifically configured to obtain the historical reception rate and historical sending rate of completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical reception rate and historical sending rate into the preset machine learning model to obtain the predicted reception rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted reception rate to obtain the target machine learning model.
- this application provides a backup system, including: one or more processors; a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any one of the above first aspects.
- the present application provides a computer-readable storage medium, including a computer program.
- When the computer program is executed on a computer, it causes the computer to perform the method described in any one of the above first aspects.
- the present application provides a computer program product.
- the computer program product includes computer program code.
- When the computer program code is run on a computer, it causes the computer to execute the method described in any one of the above first aspects.
- Figure 1 is an exemplary framework diagram of a backup system
- Figure 2 is an exemplary structural diagram of the system architecture of the present application
- Figure 3 is a flow chart of the process 300 of the data backup method according to the embodiment of the present application.
- Figure 4 is a flow chart of the process 400 of the data backup method according to the embodiment of the present application.
- Figure 5 is a flow chart of the process 500 of the data backup method according to the embodiment of the present application.
- Figure 6 is an exemplary schematic diagram of the client configuration and registration process
- Figure 7 is an exemplary schematic diagram of the scheduling and ranking calculation process
- Figure 8 is an exemplary schematic diagram of a real-time scheduling interaction process
- Figure 9 is an exemplary schematic diagram of the speed limit rate calculation process
- Figure 10 is an exemplary schematic diagram of the training process of the machine learning model
- Figure 11 is an exemplary structural diagram of a machine learning model
- Figure 12 is an exemplary structural schematic diagram of the data backup device 1200 according to the embodiment of the present application.
- “At least one (item)” refers to one or more, and “plurality” refers to two or more.
- “And/or” is used to describe the relationship between associated objects, indicating that there can be three relationships. For example, “A and/or B” can mean: only A exists, only B exists, or A and B exist simultaneously, where A and B can be singular or plural. The character “/” generally indicates that the related objects are in an “or” relationship. “At least one of the following” or similar expressions refers to any combination of these items, including any combination of a single item or a plurality of items.
- For example, at least one of a, b or c can mean: a, b, c, “a and b”, “a and c”, “b and c”, or “a and b and c”, where a, b, c can be single or multiple.
- the client regularly generates copies of the data of different business objects and transmits these copies to the background backup system within a predetermined time window, which is a backup task.
- the goal of backup orchestration is to design an efficient scheme for allocating and scheduling backup transmission resources (transmission time, bandwidth, etc.), distributing limited backup resources among the many backup tasks to be transmitted, so that as many backup tasks as possible complete within their scheduled time windows and the backup bandwidth/throughput of the entire system is maximized.
- the backup data of actual production environments in various industries is complex and changeable.
- the arrival times and read and write rates of different backup business object streams generated by multiple clients/business objects are highly dynamic, and the backup time windows allowed by tasks are also different.
- If effective backup orchestration cannot be performed for many backup tasks, the business object flows of different tasks will cause resource conflicts or unnecessary resource idleness in the same time and space, resulting in low effective bandwidth and violation of the Service Level Agreement (SLA).
- Figure 1 is an exemplary framework diagram of a backup system.
- Each client (for example, client 1 to client N) deploys a series of virtual machines (VMs) (for example, VM 1 to VM N) to host different application business objects.
- VM business objects will regularly back up production data from the client to the background backup system through the switching network at a certain backup frequency (for example, hourly, daily, weekly, monthly, etc.).
- Backups of each VM business object are divided into full backups (backing up all data) and incremental backups (backing up only the data that is new since the last backup).
- the amount of backup data to be transferred in each backup task is highly dynamic and unknown in advance.
- Each backup task must be completed within a specified time window (for example, only after the job stops and before the next job starts). Due to differences in the backup data storage media and production environment of each backup task, the client's ability to read backup data, the transmission over the network, and the backup system's speed in writing data to disk are all highly dynamic.
- the backup system needs to perform efficient scheduling and orchestration of numerous backup tasks under various time windows and resource constraints, so that as many backup tasks as possible can be completed within the specified time window and maximize system backup throughput.
- the backup task orchestration challenge of the backup system mainly lies in how to make the best intelligent supply and demand orchestration decisions under the influence of many dynamic and uncertain factors.
- Dynamic uncertain factors include: 1) the total data volume of a backup task is unknown in advance; 2) the rate at which the client reads backup data from the local storage medium fluctuates dynamically (affected by the local read rate, central processing unit (CPU) load, etc.); 3) the real-time transmission rate of backup data is dynamically affected by link bandwidth congestion, back-end reception, and input/output (IO) writes to disk; 4) backup tasks continually complete and exit while new backup tasks enter, so the load of the backup system and the real-time remaining resources of the entire backup system fluctuate dynamically.
- Static orchestration mainly controls the execution of backup tasks by setting fixed task backup policies.
- Dynamic orchestration is the dynamic adaptive adjustment of scheduling based on the dynamic attributes of business objects.
- the scheduling effect of these orchestration strategies is largely affected by the customer's manual experience and random operations. It may not match the dynamic characteristics of the backup system and each backup task accurately enough, and may even cause additional interruption and recovery overhead, affecting the system's performance.
- this application provides a data backup method that can maximize the throughput of the entire backup system and schedule as many backup tasks as possible to complete the transmission of backup data within their respective SLA time windows.
- Neural network is a machine learning model.
- the neural network can be composed of neural units.
- the neural unit can refer to an arithmetic unit that takes $x_s$ and an intercept of 1 as input.
- the output of the arithmetic unit can be:
$$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
- where s = 1, 2, ..., n, and n is a natural number greater than 1
- $W_s$ is the weight of $x_s$
- b is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
- the activation function can be a nonlinear function such as ReLU.
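- The neural unit described above can be sketched in a few lines. This is a minimal illustration, assuming ReLU as the activation function f; the names `relu` and `neural_unit` are hypothetical and do not come from this application.

```python
def relu(z):
    """ReLU activation: passes positive values, clamps negatives to 0."""
    return max(0.0, z)


def neural_unit(xs, ws, b, f=relu):
    """Output of one neural unit: f(sum over s of W_s * x_s, plus bias b)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    return f(sum(w * x for w, x in zip(ws, xs)) + b)


# Two inputs, two weights, one bias.
positive = neural_unit([1.0, 2.0], [0.5, 1.0], 0.5)    # weighted sum is 3.0
clamped = neural_unit([1.0, 2.0], [0.5, -1.0], 0.25)   # weighted sum is -1.25
```

- The second call shows the nonlinearity: the weighted sum is negative, so ReLU outputs 0 instead of passing the value through.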
- a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field.
- the local receptive field can be an area composed of several neural units.
- Multi-layer perceptron (MLP)
- MLP is a simple deep neural network (DNN) (different layers are fully connected), also called a multi-layer neural network, which can be understood as a neural network with many hidden layers.
- the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer.
- the first layer is the input layer
- the last layer is the output layer
- the layers in between are hidden layers.
- the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
- Although a DNN looks very complicated, the work of each layer is actually not complicated.
- the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$. It should be noted that the input layer has no W parameter.
- more hidden layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
- Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
- Convolutional neural network (CNN)
- the convolutional neural network contains a feature extractor composed of convolutional layers and pooling layers.
- the feature extractor can be regarded as a filter, and the convolution process can be regarded as using a trainable filter to convolve with an input image or convolution feature plane (feature map).
- the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
- the convolution layer can include many convolution operators.
- the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
- the convolution operator can essentially be a weight matrix, which is usually predefined. During the convolution operation on the image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride) to complete the task of extracting specific features from the image.
- the size of the weight matrix should be related to the size of the image.
- the depth dimension of the weight matrix is the same as the depth dimension of the input image.
- the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolved output with a single depth dimension, but in most cases, instead of a single weight matrix, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same type, are applied.
- the output of each weight matrix is stacked to form the depth dimension of the convolution image.
- the dimension here can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image.
- one weight matrix is used to extract edge information of the image
- another weight matrix is used to extract specific colors of the image
- another weight matrix is used to blur unwanted noise in the image, and so on.
- the multiple weight matrices have the same size (rows × columns), and the feature maps extracted by the multiple weight matrices of the same size also have the same size.
- the extracted feature maps of the same size are then merged to form the output of the convolution operation.
- the weight values in these weight matrices require a lot of training in practical applications.
- Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network can make correct predictions.
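- The sliding of a weight matrix over the image, and the stacking of the outputs of several weight matrices into a depth dimension, can be sketched as follows. This is a simplified illustration, assuming a single-channel image, no padding, and hand-picked kernels rather than trained ones; the names `conv2d` and `conv_layer` are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Slide one weight matrix over a single-channel image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + di][j * stride + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer(image, kernels, stride=1):
    """Apply several same-size kernels; the list index is the output depth."""
    return [conv2d(image, k, stride) for k in kernels]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [
    [[1, 0], [0, 1]],    # sums each pixel with its lower-right neighbour
    [[1, -1], [0, 0]],   # horizontal difference, a crude edge detector
]
feature_maps = conv_layer(image, kernels)
```

- Both kernels are 2 × 2, so both feature maps are 2 × 2, and the output depth equals the number of kernels.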
- the initial convolutional layer often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by subsequent convolutional layers become more and more complex, such as high-level semantic features.
- Features with higher semantics are more suitable for the problem to be solved.
- the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
- the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
- the max pooling operator can take the pixel with the largest value in a specific range as the result of max pooling.
- the operators in the pooling layer should also be related to the size of the image.
- the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer.
- Each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
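- The two pooling operators can be sketched with one helper that applies a reduction to each sub-region. This is a minimal illustration, assuming non-overlapping square windows (stride equal to window size); the names `pool2d` and `mean` are hypothetical.

```python
def pool2d(image, size, op):
    """Reduce each non-overlapping size x size window of the image with op."""
    return [
        [
            op([image[i + di][j + dj] for di in range(size) for dj in range(size)])
            for j in range(0, len(image[0]) - size + 1, size)
        ]
        for i in range(0, len(image) - size + 1, size)
    ]


def mean(values):
    """Arithmetic mean, used as the average pooling operator."""
    return sum(values) / len(values)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, mean)
```

- A 4 × 4 input becomes a 2 × 2 output, and each output pixel is the maximum or the average of the corresponding 2 × 2 sub-region.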
- After processing by the convolutional layer/pooling layer, the convolutional neural network is still not able to output the required output information. As mentioned before, the convolutional layer/pooling layer only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (required class information or other related information), the convolutional neural network needs to use neural network layers to generate an output of one class or of a set of the required number of classes. Therefore, the neural network layer can include multiple hidden layers, and the parameters contained in the multiple hidden layers can be pre-trained based on relevant training data of a specific task type. For example, the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
- the output layer of the entire convolutional neural network is also included.
- This output layer has a loss function similar to categorical cross-entropy, specifically used to calculate the prediction error.
- Recurrent neural networks are used to process sequence data.
- In an ordinary neural network, the layers are fully connected, while the nodes within each layer are unconnected.
- Although this ordinary neural network has solved many difficult problems, it remains inadequate for many others. For example, to predict the next word of a sentence, you generally need to use the previous words, because the preceding and following words in a sentence are not independent. An RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
- RNN can process sequence data of any length.
- the training of RNN is the same as the training of traditional CNN or DNN.
- the error backpropagation algorithm is also used, but there is one difference: that is, if the RNN is expanded into a network, then the parameters, such as W, are shared; this is not the case with the traditional neural network as shown in the example above.
- the output of each step not only depends on the network of the current step, but also depends on the status of the network of several previous steps. This learning algorithm is called Back propagation Through Time (BPTT).
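- The dependence of each step on earlier steps, with parameters shared across steps, can be sketched with a scalar recurrent unit. This is a minimal illustration of the forward pass only, assuming a tanh activation and scalar weights; the function `rnn_forward` and its parameter names are hypothetical.

```python
import math


def rnn_forward(inputs, w_in, w_rec, b, h0=0.0):
    """Run a scalar recurrent unit over a sequence.

    The same w_in, w_rec and b are reused at every step (shared parameters),
    and each hidden state depends on the previous hidden state.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + b)
        states.append(h)
    return states


# Only the first input is non-zero, yet the second state is still non-zero
# because it carries information forward from the first step.
states = rnn_forward([1.0, 0.0], w_in=1.0, w_rec=1.0, b=0.0)
```

- Because the input length only determines how many times the loop runs, the same parameters handle sequences of any length.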
- the convolutional neural network can use the error back propagation (BP) algorithm to modify the size of the parameters in the initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller.
- forward propagation of the input signal until the output will produce an error loss
- the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges.
- the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the super-resolution model, such as the weight matrix.
- A generative adversarial network (GAN) is a deep learning model.
- the model includes at least two modules: one module is a generative model (Generative Model), and the other module is a discriminative model (Discriminative Model). Through these two modules, they learn from each other to produce better output.
- Both the generative model and the discriminative model can be neural networks, specifically deep neural networks or convolutional neural networks.
- the basic principle of GAN is as follows: Take the GAN that generates pictures as an example. Suppose there are two networks, G (Generator) and D (Discriminator), where G is a network that generates pictures.
- D is a discriminant network, used to judge whether a picture is “real”. Its input parameter is x, where x represents a picture, and the output D(x) represents the probability that x is a real picture. If it is 1, the picture is 100% real. If it is 0, the picture cannot be real.
- the goal of the generative network G is to generate pictures that are as realistic as possible to deceive the discriminant network D, and the goal of the discriminant network D is to distinguish the pictures generated by G from the real pictures as well as possible.
- G and D thus constitute a dynamic “game” process, which is the “adversarial” part of “generative adversarial network”.
- Figure 2 is an exemplary structural diagram of the system architecture of this application. As shown in Figure 2, the system can be applied to the scheduling and orchestration of data backup tasks in various scenarios, including VM, database, file and other backup scenarios. There is no specific limit on this.
- the system is divided into two parts: the front-end (client) and the back-end (backup system).
- the front-end client and the back-end backup system are connected through a network.
- the network connection includes but is not limited to direct or multi-hop network connections such as Ethernet and Internet Protocol (IP), as well as data transmission forms such as Transmission Control Protocol (TCP) and Remote Direct Memory Access (RDMA).
- the front-end client can include servers that support various operating systems (such as Windows, Unix, Linux, VMware, etc.) or various databases (such as Oracle, SQL, DB2, etc.).
- the client can also include other business objects with data backup requirements, which this application does not specifically limit.
- the back-end backup system can perform unified scheduling and orchestration for many backup tasks under different SLA constraints. Under the constraints of existing CPU and network transmission resources, the backup tasks to be transmitted can be scheduled and sorted and their transmission rates controlled, to optimize system throughput and the SLA time window compliance of each backup task.
- Figure 2 exemplarily shows the system architecture of the present application, but this system architecture does not constitute a limitation.
- the number, implementation form, implementation functions, etc. of the front-end clients in the system, as well as the construction method and deployment strategy of the back-end backup system, etc., can all be implemented using other solutions, and there are no specific restrictions on this.
- FIG. 3 is a flow chart of a process 300 of the data backup method according to the embodiment of the present application.
- Process 300 may be performed by both the front-end client and the back-end backup system.
- Process 300 is described as a series of steps or operations, and it should be understood that process 300 may be performed in various orders and/or occur simultaneously, and is not limited to the order of execution shown in FIG. 3 .
- Process 300 includes the following steps:
- Step 301 The client reports the first backup data volume and first task completion time of multiple business objects to the backup system.
- the multiple business objects of the client can include business objects of servers that support various operating systems (such as Windows, Unix, Linux, VMware, etc.), can also include business objects of various databases (such as Oracle, SQL, DB2, etc.), and may also include business objects of other devices or services with data backup requirements, which are not specifically limited in the embodiments of this application.
- a client can include multiple business objects. For example, multiple VMs are created on the server, and one VM is a business object.
- For another example, a database service is created for multiple users, and one database service is a business object.
- the first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects.
- the data backup requirement of a business object is a long-term and recurring process. For example, a business object regularly backs up production data from the client to the background backup system through the switching network at a certain backup frequency (for example, hourly, daily, weekly, monthly, etc.), so each data transmission process of the business object can be regarded as a backup task. It can be seen that the backup task of the business object is started and executed periodically. Based on this, in the embodiment of this application, the backup task to be executed in the next cycle is called the next backup task of the business object, and the backup task that was executed in the previous cycle is called the last backup task of the business object.
- the client can report the backup data volume and task completion time (JCT) of the backup task to the backup system.
- Step 302 The backup system obtains the scheduling order of the next backup tasks of multiple business objects based on the first backup data amount and the first task completion time.
- the backup system can use the following two methods to obtain the scheduling order of the next backup tasks of multiple business objects:
- the first machine learning model can be pre-trained, and the training process can be referred to the following embodiments.
- the input of the first machine learning model is the first backup data amount and the first task completion time of multiple business objects
- the output is the scheduling order of the next backup task for each of the multiple business objects.
- the business objects include VM1, VM2 and VM3.
- the first backup data volume and the first task completion time of the three business objects are input into the first machine learning model. After prediction by the first machine learning model, the output scheduling order of the next backup tasks of the three business objects is VM3 → VM2 → VM1.
- The scheduling order can be expressed as an ordering of business object identifiers (for example, VM IDs), or as an ordering of the indexes (for example, Job IDs) of the backup tasks of the business objects.
- the embodiment of the present application does not specifically limit the expression method of the scheduling order.
- the second backup data volume and the second task completion time correspond to the next backup tasks of the multiple business objects; the scheduling order of the next backup tasks of the multiple business objects is obtained based on the second backup data volume and the second task completion time.
- the second machine learning model can also be pre-trained, and its training process can also refer to the following embodiments.
- the input of the second machine learning model is the first backup data volume and the first task completion time of the plurality of business objects
- the output is the second backup data volume and the second task completion time of the plurality of business objects.
- the second backup data amount and the second task completion time are the backup data amount and task completion time of the next backup task of the business object, as predicted by the second machine learning model. Since the next backup task has not yet been executed, the second backup data amount and the second task completion time are predicted values, not actual values.
- the backup system can calculate the ratio of the second backup data amount to the second task completion time corresponding to the first business object to obtain the ease of the next backup task of the first business object. That is, the backup system needs to calculate the ease of the next backup task for each of the multiple business objects.
- the ease is the ratio of the second backup data amount to the second task completion time. The larger the ratio, the easier the next backup task is; the smaller the ratio, the more difficult the next backup task is.
- the remaining start time of the next backup task of the first business object is obtained according to the completion time of the second task corresponding to the first business object.
- the backup system can calculate the difference between the latest task end time MaxEndTime (this information is reported to the backup system when the business object is registered or reconfigured), the second task completion time and the current time t to obtain the remaining startup time of the next backup task.
- After the backup system obtains the ease of the next backup task and the remaining start time of the next backup task, it can calculate the ratio of the ease of the next backup task of the first business object to the remaining start time of that backup task to obtain the scheduling threshold of the next backup task of the first business object. Using the same calculation method, the backup system can obtain the scheduling thresholds of the next backup tasks of all business objects. A larger scheduling threshold means that the next backup task is easy to complete and time is tight, so it needs to be started as soon as possible. A smaller scheduling threshold means that the next backup task is difficult to complete and time is flexible, so its start can be postponed. The scheduling thresholds of the next backup tasks of the multiple business objects are then sorted from largest to smallest to obtain the scheduling order of the next backup tasks of the multiple business objects.
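- The ranking calculation described above (ease, remaining start time, scheduling threshold, descending sort) can be sketched as follows. This is a minimal illustration under stated assumptions: the class `PredictedTask`, its field names, the units, and the sample figures are hypothetical, and tasks whose remaining start time is not positive are simply left out of the ranking, since the text handles them in a separate cancellation check.

```python
from dataclasses import dataclass


@dataclass
class PredictedTask:
    object_id: str          # business object identifier, e.g. a VM ID
    data_amount: float      # predicted (second) backup data amount
    completion_time: float  # predicted (second) task completion time
    max_end_time: float     # MaxEndTime reported at registration


def ease(task):
    """Ease = predicted data amount / predicted completion time."""
    return task.data_amount / task.completion_time


def remaining_start_time(task, now):
    """Remaining start time = MaxEndTime - predicted completion time - now."""
    return task.max_end_time - task.completion_time - now


def scheduling_threshold(task, now):
    """Scheduling threshold = ease / remaining start time."""
    return ease(task) / remaining_start_time(task, now)


def scheduling_order(tasks, now):
    """Object IDs sorted by scheduling threshold, largest first."""
    ready = [t for t in tasks if remaining_start_time(t, now) > 0]
    ready.sort(key=lambda t: scheduling_threshold(t, now), reverse=True)
    return [t.object_id for t in ready]


tasks = [
    PredictedTask("VM1", data_amount=100.0, completion_time=50.0, max_end_time=400.0),
    PredictedTask("VM2", data_amount=300.0, completion_time=100.0, max_end_time=250.0),
    PredictedTask("VM3", data_amount=80.0, completion_time=20.0, max_end_time=60.0),
]
order = scheduling_order(tasks, now=0.0)
```

- With these sample figures, VM3 has the highest ease (4.0) and the shortest remaining start time (40), so it ranks first, followed by VM2 and then VM1.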
- the above two methods use two machine learning models.
- the difference is that the output of the machine learning model is different, which is related to the training process of the machine learning model.
- the machine learning model can also output other information to assist in obtaining the scheduling order of the next backup tasks of the multiple business objects. Therefore, the embodiments of this application do not limit the output of the machine learning model.
- the system throughput and task SLA time window compliance are fully considered during training, combined with the dynamic characteristics of the backup task (for example, the backup data volume of the backup task, the task completion time, the earliest task start time, the latest task end time, etc.). This improves how accurately training matches the dynamic characteristics of the backup system and of each backup task, so that the prediction results of the machine learning model are closer to the actual execution results.
- methods other than the above-mentioned methods may also be used to obtain the scheduling order of the next backup tasks of multiple business objects, and there is no specific limitation on this.
- the backup system can first determine whether the next backup task of the first business object is cancelled; when the next backup task of the first business object is not cancelled, the backup system obtains the scheduling order of the next backup tasks of the multiple business objects based on the ease and the remaining start time of the next backup tasks of the multiple business objects.
- the backup system can determine whether the remaining start time of the next backup task of the first business object is less than 0. When the remaining start time of the next backup task of the first business object is less than 0 (indicating that the latest end time of the next backup task of the first business object has passed), the cancellation probability of the next backup task of the first business object is calculated (for example, 1-p is calculated, where p is a preset probability value; the larger 1-p is, the greater the cancellation probability and the more likely the task is to be cancelled, and the smaller 1-p is, the smaller the cancellation probability and the less likely the task is to be cancelled). When the cancellation probability of the next backup task of the first business object is greater than the preset threshold, it is determined to cancel the next backup task of the first business object; when the cancellation probability is less than or equal to the preset threshold, it is determined not to cancel the next backup task of the first business object.
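- The cancellation check can be sketched as a single predicate. This is a minimal illustration; the function name `should_cancel` and the sample values of p and of the preset threshold are hypothetical, and a task with a non-negative remaining start time is assumed never to be cancelled by this check.

```python
def should_cancel(remaining_start_time, p, cancel_threshold):
    """Decide whether the next backup task of a business object is cancelled.

    The cancellation probability 1 - p is only evaluated once the remaining
    start time has dropped below 0.
    """
    if remaining_start_time >= 0:
        return False  # assumption: still schedulable, so not cancelled
    cancel_probability = 1.0 - p
    return cancel_probability > cancel_threshold


cancelled = should_cancel(-5.0, p=0.2, cancel_threshold=0.5)  # 0.8 > 0.5
kept = should_cancel(-5.0, p=0.7, cancel_threshold=0.5)       # about 0.3, not > 0.5
```

- A probability exactly equal to the threshold does not cancel the task, matching the "less than or equal to" case in the text.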
- In this way, the next backup tasks of business objects that need to be cancelled are excluded from scheduling, which avoids the unnecessary scheduling that such tasks would cause and improves the scheduling efficiency of the next backup tasks of the multiple business objects.
- Step 303 The backup system sends scheduling instructions to the client according to the scheduling order of the next backup tasks of the multiple business objects.
- After the backup system obtains the scheduling order of the next backup tasks of the multiple business objects, it can take the backup task at the head of the scheduling queue, according to the number of parallel threads available in real time and the scheduling order, as the backup task to be scheduled, and send a scheduling instruction to the client, where the scheduling instruction includes the identifier of the backup task to be scheduled. For example, if the number of parallel threads available in real time is 2, then 2 backup tasks can currently be scheduled. Therefore, the backup system takes 2 backup tasks from the head of the scheduling queue (for example, Job IDs 1 and 2) and sends them to the client.
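- Taking tasks from the head of the scheduling queue according to the number of available parallel threads can be sketched as follows. This is a minimal illustration; the function `take_tasks_to_schedule` and the dictionary layout of the scheduling instruction are hypothetical, since the text only says that the instruction carries the identifiers of the tasks to be scheduled.

```python
from collections import deque


def take_tasks_to_schedule(queue, available_threads):
    """Pop up to available_threads job IDs from the head of the queue."""
    count = min(available_threads, len(queue))
    return [queue.popleft() for _ in range(count)]


# Queue already sorted by scheduling order; 2 threads are free right now.
scheduling_queue = deque([1, 2, 3, 4])
instruction = {"job_ids": take_tasks_to_schedule(scheduling_queue, 2)}
```

- Jobs 3 and 4 stay in the queue and are dispatched when more threads become available.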
- Step 304 The client transmits the backup data of the next backup task of the corresponding business object to the backup system according to the scheduling instruction.
- After receiving the scheduling instruction from the backup system, the client can start the next backup task of the corresponding business object according to the identifier of the business object in the scheduling instruction, that is, transmit to the backup system the backup data of the next backup task of the business object indicated by that identifier.
- After completing the next backup tasks of the multiple business objects, the client returns to step 301 and reports the backup data amount and task completion time of the backup tasks of the multiple business objects to the backup system.
- The client can choose to send the backup data amount and task completion time of a business object's backup task each time such a task is completed.
- The client can also choose to send the backup data amounts and task completion times of the backup tasks of multiple business objects at once, after the backup tasks of the multiple business objects are completed. This is not specifically limited.
- In the embodiment of the present application, the scheduling order of the next backup tasks of the multiple business objects is obtained by prediction based on the backup data volume and task completion time of the last backup tasks of the multiple business objects. This takes full account of the matching accuracy of the backup system and the dynamic characteristics of each backup task, avoids manual intervention in the scheduling of multiple backup tasks, and improves the SLA time window compliance of the tasks.
- FIG. 4 is a flow chart of a process 400 of the data backup method according to the embodiment of the present application.
- Process 400 may be performed by both the front-end client and the back-end backup system.
- Process 400 is described as a series of steps or operations, and it should be understood that process 400 may be performed in various orders and/or occur simultaneously and is not limited to the order of execution shown in FIG. 4.
- Process 400 includes the following steps:
- Step 401 The client sends the sending rate and receiving rate of the backup task of the second business object in the previous cycle to the backup system.
- The backup task of the second business object is being executed. That is, in the embodiment of the present application, a speed limit can be applied to a backup task that is being executed (the client is sending the backup data of the backup task to the backup system), so that port bandwidth resources can be used optimally based on the real-time transmission characteristics of tasks, improving system throughput.
- The backup task of a business object is executed periodically.
- The backup task being executed can be considered to be in the current cycle (which can also be called the next cycle), and the cycle in which the backup task was last executed can be considered the previous cycle of this backup task.
- A rate reporting cycle can also be set, and the client reports the sending rate and receiving rate of the backup task of the second business object in the previous reporting cycle to the backup system according to this reporting cycle.
- The sending rate and receiving rate may be the average sending rate and average receiving rate in the previous reporting cycle, or the weighted average sending rate and weighted average receiving rate in the previous reporting cycle, which is not specifically limited.
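As an illustration of the two reporting options, the sketch below computes a plain average and a weighted average over rate samples collected in one reporting cycle. The sample values and the choice of weights (heavier on recent samples) are assumptions for the example; the text does not say how the weights are chosen.

```python
def average_rate(samples):
    """Plain average of the rate samples collected in one reporting cycle."""
    return sum(samples) / len(samples)


def weighted_average_rate(samples, weights):
    """Weighted average; the weights could, for example, favour recent samples."""
    if len(samples) != len(weights):
        raise ValueError("samples and weights must have the same length")
    return sum(s * w for s, w in zip(samples, weights)) / sum(weights)


tx_samples = [100.0, 200.0, 300.0]   # hypothetical sending-rate samples
tx_avg = average_rate(tx_samples)
tx_wavg = weighted_average_rate(tx_samples, [1, 2, 3])
```

The same two functions would apply unchanged to receiving-rate samples.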
- Step 402 The backup system obtains the rate limit rate of the backup task of the second business object in the next cycle based on the sending rate and the receiving rate.
- The backup system can input the sending rate and the receiving rate into the third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle, and then obtain the rate limit rate of the backup task of the second business object in the next cycle according to that receiving rate.
- The backup system can obtain the preset bandwidth of the first port, which is used to perform the backup task of the second business object; obtain the sum of the reception rates of all backup tasks transmitted by the first port in the next cycle; and obtain the rate limit rate of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the reception rate of the backup task of the second business object in the next cycle, and the sum of the reception rates.
- Here, the predicted reception rate of backup task i is the reception rate of the backup task of the second business object in the next cycle, and the sum of the predicted reception rates of all backup tasks transmitted on the first port is the sum of the reception rates in the next cycle.
- the embodiment of the present application can also use other methods to calculate the rate limit rate of the backup task of the second business object in the next cycle, which is not specifically limited.
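One way the three quantities named above (the preset port bandwidth, the predicted reception rate of task i, and the sum of predicted reception rates on the port) could be combined is a proportional share. The sketch below assumes that rule purely for illustration; it is not presented as the application's formula, and the equal-split fallback is likewise an assumption.

```python
def rate_limit(port_bandwidth: float, predicted_rx: dict, job_id) -> float:
    """Share the port bandwidth in proportion to each task's predicted reception rate.

    ASSUMPTION: proportional sharing is illustrative, not the application's formula.
    """
    total = sum(predicted_rx.values())
    if total <= 0:
        # No predicted demand on this port: fall back to an equal split (assumption).
        return port_bandwidth / len(predicted_rx)
    return port_bandwidth * predicted_rx[job_id] / total


# Predicted reception rates in the next cycle for the tasks sharing one port.
predicted = {"job-1": 300.0, "job-2": 100.0}
limits = {job: rate_limit(1000.0, predicted, job) for job in predicted}
```

Under this assumed rule the limits always sum to the port bandwidth, so a task predicted to receive faster is allowed to send faster without oversubscribing the port.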
- Step 403 The backup system sends a rate limit instruction to the client, where the rate limit instruction includes the rate limit rate of the backup task of the second business object in the next cycle.
- The backup system sends the rate limit rate of the backup task of the second business object in the next cycle to the client through a rate limit indication, so that the client can limit the sending rate at which it transmits the backup data of the backup task of the second business object based on the rate limit rate.
- Step 404 The client controls the rate limit rate of the backup task of the second business object in the next cycle according to the rate limit indication.
- the client can limit the sending rate of the backup data of the backup task that transmits the second business object based on the rate limit, so that the port bandwidth resources can be optimally used based on the real-time transmission characteristics of the task and improve the throughput of the system.
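The text does not say how the client enforces the limit. A token bucket is one common technique for source-side rate limiting and is used here only as a stand-in; the class name, the byte-based units, and the explicit `now` argument (used to keep the example deterministic) are all assumptions.

```python
class TokenBucket:
    """Token bucket: tokens (bytes) refill at `rate` per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def _refill(self, now: float) -> None:
        """Add the tokens earned since the last update, capped at capacity."""
        elapsed = max(now - self.last, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def set_rate(self, rate: float, now: float) -> None:
        """Apply a new rate limit, e.g. one carried by a later rate limit indication."""
        self._refill(now)
        self.rate = rate

    def try_send(self, nbytes: float, now: float) -> bool:
        """Return True and consume tokens if `nbytes` may be sent at time `now`."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


bucket = TokenBucket(rate=100.0, capacity=100.0)
first = bucket.try_send(100, now=0.0)    # full bucket: allowed
second = bucket.try_send(50, now=0.1)    # only about 10 tokens refilled: refused
third = bucket.try_send(50, now=0.6)     # about 60 tokens by now: allowed
```

When a new rate limit arrives for the next cycle, the client would call `set_rate` rather than rebuild the bucket, so tokens already earned are preserved.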
- In the embodiment of the present application, the rate limit rates of the backup tasks of multiple business objects in the next cycle are predicted based on the sending rates and receiving rates of those backup tasks in the previous cycle, so that port bandwidth resources can be used optimally based on the real-time transmission characteristics of tasks, improving system throughput. This also avoids manual intervention in scheduling multiple backup tasks and improves system transmission efficiency.
- FIG. 5 is a flow chart of a process 500 of the data backup method according to the embodiment of the present application.
- Process 500 may be performed by a backend backup system.
- Process 500 is described as a series of steps or operations, and it should be understood that process 500 may be performed in various orders and/or occur simultaneously and is not limited to the order of execution shown in FIG. 5.
- Process 500 includes the following steps:
- Step 501 Obtain training data.
- Training data is data used to train machine learning models.
- the training data can be different depending on the structure, parameters, functions, etc. of the machine learning model.
- Step 502 Train and obtain the target machine learning model based on the training data.
- The target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model, where the first machine learning model is used to predict the scheduling sequence of the next backup tasks of business objects, the second machine learning model is used to predict the second backup data volume and the second task completion time of a business object, and the third machine learning model is used to predict the reception rate of the backup task of a business object in the next cycle.
- The backup system can obtain the historical backup data volume and historical task completion time of multiple business objects, which correspond to the completed backup tasks of the multiple business objects; obtain the preset machine learning model; input the historical backup data volume and historical task completion time of the multiple business objects into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and predicted task completion time to obtain the target machine learning model.
- The backup system can obtain the historical receiving rate and historical sending rate of completed backup tasks of multiple business objects; obtain the preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted reception rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted reception rate to obtain the target machine learning model.
- The preset machine learning model may also differ from case to case, including in its structure and parameters; the embodiments of the present application do not specifically limit this.
- the machine learning model can also use the error back propagation (BP) algorithm to correct the size of the parameters in the initial model during the training process, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward propagation of the input signal until the output will produce an error loss, and backward propagation of the error loss information is used to update the parameters in the initial model, so that the error loss converges.
- the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the machine learning model, such as the weight matrix.
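The forward pass, error loss, backward pass, and parameter update described above can be seen in miniature in the sketch below. It assumes a single linear unit y ≈ w·x + b in place of the multi-layer network, so the gradients can be written out by hand; the training data, learning rate, and epoch count are made up for the example.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Forward pass: compute predictions and the resulting error loss.
        preds = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Backward pass: gradients of the loss with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Parameter update: step against the gradient so the loss shrinks.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


# Hypothetical training pairs generated from y = 2x + 1.
w, b, losses = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

In a multi-layer model the same loop applies, except that the backward pass propagates the error through each layer by the chain rule instead of using two closed-form gradients.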
- the backup system can be further divided into three modules: scheduling and sorting module, speed limit decision-making module, and machine learning model training module.
- the machine learning model training module is responsible for cyclical training of the machine learning model based on historical data.
- The scheduling and sorting module is responsible for taking the execution result of the last backup task as input, calling the machine learning model, outputting the scheduling sequence of the backup tasks to be scheduled, and instructing the client to execute the corresponding backup tasks according to the scheduling sequence.
- The rate limit decision-making module is responsible for calling the machine learning model for each started backup task, taking the sending rate and receiving rate fed back in real time as input, outputting the dynamic rate limit of the task, and sending the rate limit to the client.
- FIG 6 is an exemplary schematic diagram of the client configuration and registration process.
- The client configuration and registration process is divided into two steps: (1) upon system initialization (for example, business object changes, configuration changes, system power-on, etc.), the client sends the model update frequency UpdateFreq to configure the model training cycle of the machine learning model training module; (2) the client sends the basic configuration information of each business object (such as a VM) to the scheduling and sorting module, where the information includes the business object identity (VM Id), the earliest task start time (MinStartTime), the latest task end time (MaxEndTime), and the backup task scheduling cycle (JobFreq).
- VM Id: business object identity
- MinStartTime: earliest task start time
- MaxEndTime: latest task end time
- JobFreq: backup task scheduling cycle
- This application only requires the client to configure the backup task scheduling cycle JobFreq, the earliest task start time MinStartTime (for example, 8:00 PM, after the end of the working day), and the latest task end time MaxEndTime (for example, 8:00 AM the next day, before work starts).
- The scheduling and sorting module flexibly schedules backup tasks within the backup task scheduling cycle based on the actual execution of each backup task, and optimizes the overall scheduling strategy so that as many backup tasks as possible complete their backups within the maximum allowed time window [MinStartTime, MaxEndTime].
- the basic configuration information in step 2 in this application may not include the earliest task start time MinStartTime and the latest task end time MaxEndTime, that is, for the backup task identified by the VM ID, the maximum allowed time window for its execution is not limited.
- the scheduling and sorting module can select the most suitable time period to schedule the backup task within this cycle (for example, within a day, within a week, within 12 hours, etc.).
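The per-business-object configuration described above can be pictured as a small record in which the time window is optional. The sketch below is an assumption about representation only: the class name, field names, and Python types are hypothetical, and only the four configuration items themselves come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BusinessObjectConfig:
    vm_id: str                                  # VM Id
    job_freq: timedelta                         # JobFreq
    min_start_time: Optional[datetime] = None   # MinStartTime; None = not limited
    max_end_time: Optional[datetime] = None     # MaxEndTime; None = not limited

    @property
    def has_time_window(self) -> bool:
        """True when both ends of [MinStartTime, MaxEndTime] are configured."""
        return self.min_start_time is not None and self.max_end_time is not None


# Daily backup constrained to 8:00 PM through 8:00 AM the next day.
windowed = BusinessObjectConfig(
    vm_id="vm-1",
    job_freq=timedelta(days=1),
    min_start_time=datetime(2024, 1, 1, 20, 0),
    max_end_time=datetime(2024, 1, 2, 8, 0),
)
# Daily backup with no window: any time within the cycle may be chosen.
unconstrained = BusinessObjectConfig(vm_id="vm-2", job_freq=timedelta(days=1))
```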
- FIG 7 is an exemplary schematic diagram of the scheduling and sorting calculation process.
- The scheduling and sorting module first obtains the MinStartTime of a backup task that has not been started and determines whether MinStartTime ≤ t holds.
- Here t represents the current time, for example 9:00 PM. If the inequality holds, the earliest task start time set for the backup task is no later than the current time and the task can be started; if the inequality does not hold, the earliest task start time set for the backup task is later than the current time, that is, the start time of the backup task has not yet been reached and the task cannot be started.
- The scheduling and sorting module then uses the trained machine learning model to predict the backup data volume (volume) and the task completion time (predicted JCT) of the next backup of the backup task. It calculates the ratio volume / predicted JCT as the ease of the next backup of the backup task, and calculates MaxEndTime - predicted JCT - t as the remaining startup time of the next backup of the backup task, which can be used as the urgency of the next backup of the backup task.
- The scheduling and sorting module then determines whether MaxEndTime - predicted JCT - t < 0 holds. If the inequality does not hold, the remaining startup time of the next backup of the backup task is not negative, that is, there is still time before the next backup of the backup task must start: the current time plus the predicted task completion time of the next backup has not yet reached the latest end time of the backup task, and the backup task can be scheduled. If the inequality holds, the remaining startup time of the next backup of the backup task is negative, that is, the next backup of the backup task has missed its start time: the current time plus the predicted task completion time of the next backup has already passed the latest end time of the backup task.
- In that case, the next backup of the backup task is canceled with a probability of 1-p.
- The probability p can be pre-configured during system initialization. The larger the value of p, the less likely the next backup of the backup task is to be canceled; the smaller the value of p, the more likely the next backup of the backup task is to be canceled.
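The remaining-startup-time check and the probabilistic cancellation can be sketched as below. Times are plain floats in hours (for instance 21.0 for 9:00 PM and 32.0 for 8:00 AM the next day), which is an assumption for the example, as are the function names; the random draw is one straightforward reading of "canceled with a probability of 1-p".

```python
import random


def remaining_start_time(max_end_time: float, predicted_jct: float, now: float) -> float:
    """MaxEndTime - predicted JCT - t; negative means the backup cannot finish in time."""
    return max_end_time - predicted_jct - now


def cancel_next_backup(remaining: float, p: float, rng: random.Random) -> bool:
    """Cancel an overdue backup with probability 1 - p; never cancel otherwise."""
    if remaining >= 0:
        return False
    # rng.random() is uniform on [0, 1), so this is True with probability 1 - p.
    return rng.random() < 1 - p


rng = random.Random(0)
slack = remaining_start_time(max_end_time=32.0, predicted_jct=3.0, now=21.0)
on_time_cancelled = cancel_next_backup(slack, p=0.8, rng=rng)
```

Setting p to 1 disables cancellation entirely, while p equal to 0 cancels every backup whose remaining startup time is negative.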
- For the backup tasks that are not canceled, the ratio of ease to urgency is calculated for each backup task, and the backup tasks are sorted from large to small according to this ratio to obtain the scheduling queue for the next backup of all backup tasks to be scheduled.
- The greater the value of urgency (indicating that the backup task is not very urgent), the smaller the calculated ratio and the lower the scheduling ranking; the smaller the value of urgency (indicating that the backup task is more urgent), the greater the calculated ratio and the higher the scheduling ranking.
- The greater the value of ease (indicating that the task is easier to complete), the larger the calculated ratio and the higher the scheduling ranking; the smaller the value of ease (indicating that the task is more complicated to complete), the smaller the calculated ratio and the lower the scheduling ranking.
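A worked example of the ordering follows. The task dictionaries, their keys, and the hour-based times are hypothetical. The clamp to a small positive `eps` is also an assumption, added because the text does not say how a remaining startup time of zero (or a negative one that was not canceled) enters the ratio.

```python
def scheduling_order(tasks, now, eps=1e-9):
    """Return job IDs sorted by ease / urgency, largest ratio first.

    Each task is a dict with hypothetical keys: job_id, volume, predicted_jct,
    max_end_time (times in hours, volume in any consistent unit).
    """
    scored = []
    for task in tasks:
        ease = task["volume"] / task["predicted_jct"]
        urgency = task["max_end_time"] - task["predicted_jct"] - now
        # ASSUMPTION: clamp to eps so a zero or negative remaining time neither
        # divides by zero nor flips the sign; the source does not cover this case.
        ratio = ease / max(urgency, eps)
        scored.append((ratio, task["job_id"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [job_id for _, job_id in scored]


tasks = [
    {"job_id": 1, "volume": 100.0, "predicted_jct": 2.0, "max_end_time": 32.0},
    {"job_id": 2, "volume": 100.0, "predicted_jct": 2.0, "max_end_time": 24.0},
    {"job_id": 3, "volume": 400.0, "predicted_jct": 2.0, "max_end_time": 32.0},
]
order = scheduling_order(tasks, now=21.0)
```

Job 2 has the same ease as job 1 but only one hour of slack instead of nine, so it ranks first; job 3 shares job 1's slack but is four times as easy, so it ranks ahead of job 1.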
- Figure 8 is an exemplary schematic diagram of a real-time scheduling interaction process. As shown in Figure 8, the scheduling and sorting module uses the method shown in Figure 7 to obtain the scheduling order of the next backup of multiple backup tasks to be scheduled.
- The scheduling and sorting module takes backup tasks from the head of the scheduling queue, in order and according to the number of parallel threads available in real time, as the backup tasks to be scheduled, and sends a scheduling instruction to the client. The scheduling instruction includes the backup task to be scheduled (step 1).
- After starting the backup task, the scheduling and sorting module sends a speed limit start instruction to the speed limit decision-making module. The speed limit start instruction includes the identification of the started backup task (Job Id) and the time (createtime) at which the scheduling and sorting module actually started the task (step 2).
- After receiving the speed limit start instruction, the speed limit decision-making module obtains the speed limit rate of the started backup task in the next cycle and then sends a speed limit instruction to the client. The speed limit instruction includes the identification of the started backup task (Job Id) and the rate limit (Ratelimit) (step 3), so that the client limits its sending rate at the source end based on the rate limit.
- When the client completes a backup task, it can send task completion measurement feedback to the machine learning model training module. The task completion measurement feedback includes the identification of the completed backup task (Job Id), the backup data volume (volume) of the backup task in that backup, and the task completion time (JCT) of the backup task in that backup (step 4). This feedback allows the machine learning model training module to perform updated training of the machine learning model based on the feedback information.
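The messages exchanged in this interaction can be summarised as four small record types. The field names follow the identifiers in the text (Job Id, createtime, Ratelimit, volume, JCT); the class names, the Python types, and the helper `start_job` are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleInstruction:      # scheduling and sorting module -> client
    job_id: int


@dataclass(frozen=True)
class RateLimitStart:           # step 2: scheduling module -> speed limit module
    job_id: int
    createtime: float


@dataclass(frozen=True)
class RateLimitInstruction:     # step 3: speed limit module -> client
    job_id: int
    ratelimit: float


@dataclass(frozen=True)
class CompletionFeedback:       # step 4: client -> model training module
    job_id: int
    volume: float
    jct: float


def start_job(job_id: int, now: float):
    """Messages the scheduling and sorting module emits when it starts one job."""
    return ScheduleInstruction(job_id), RateLimitStart(job_id, createtime=now)


schedule_msg, start_msg = start_job(7, now=21.0)
feedback = CompletionFeedback(job_id=7, volume=120.0, jct=2.5)
```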
- FIG 9 is an exemplary schematic diagram of the calculation process of the rate limit.
- For a started backup task, the client periodically (for example, every 10 ms, every 1 s, every minute, every 5 minutes, etc., which is not limited) sends actual sending and receiving rate measurement feedback to the speed limit decision-making module.
- The actual sending and receiving rate measurement feedback includes the identification (Job Id) of the started backup task, a timestamp (timestamp), the receiving rate (RxRate), and the sending rate (TxRate) (step 1).
- RxRate can be the average receiving rate in the previous cycle
- TxRate can be the average sending rate in the previous cycle.
- the rate limit decision-making module calls the machine learning model, inputs RxRate and TxRate, and outputs the predicted reception rate of the started task in the next cycle.
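- The speed limit decision-making module calculates the speed limit rate of backup task i in the next cycle according to a formula over the preset bandwidth of the first port, the predicted reception rate of backup task i, and the sum of the predicted reception rates of all backup tasks transmitted on the first port.
- The first port is the transmission port of backup task i.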
- After obtaining the speed limit rate of backup task i, the speed limit decision-making module sends a speed limit instruction to the client. The speed limit instruction includes the identification of the started backup task (Job Id) and the speed limit rate (Ratelimit) (step 2). This allows the client to limit its sending rate at the source end based on the speed limit rate.
- Figure 10 is an exemplary schematic diagram of the training process of the machine learning model.
- The machine learning model training module can periodically (for example, every week, every month, every quarter, etc., which is not limited) update the parameters of the machine learning model, which may include the structure of the machine learning model, the parameters of one or more layers contained in the machine learning model, etc.
- The machine learning model training module sends model parameter update instructions to the scheduling and sorting module and the speed limit decision-making module respectively. The model parameter update instructions include the updated model parameters (ModelParas) (step 1), so that the scheduling and sorting module and the speed limit decision-making module can update their local machine learning models based on the updated model parameters.
- After entering the next cycle, the client periodically (for example, every 10 ms, every 1 s, every minute, every 5 minutes, etc., which is not limited) sends actual sending and receiving rate measurement feedback for the started backup task to the speed limit decision-making module. The actual sending and receiving rate measurement feedback includes the identification (Job Id) of the started backup task, a timestamp (timestamp), the receiving rate (RxRate), and the sending rate (TxRate) (step 2).
- RxRate can be the average receiving rate in the previous cycle
- TxRate can be the average sending rate in the previous cycle.
- When the client completes a backup task, it can send task completion measurement feedback to the machine learning model training module. The task completion measurement feedback includes the identification of the completed backup task (Job Id), the backup data volume (volume) of the backup task in that backup, and the task completion time (JCT) of the backup task in that backup (step 3). This feedback allows the machine learning model training module to perform updated training of the machine learning model based on the feedback information.
- the machine learning model training module can perform training based on the structure of the machine learning model shown in Figure 11.
- The machine learning model may be, for example, a Flow Neural Network (FlowNN), a convolutional network, a deep network, etc.
- The machine learning model is a multi-layer neural network model, which mainly consists of the input embedding layer L1, the path aggregator layer (PathAggregator) L2, and the layer L3.
- L1 mainly implements mapping of each input feature data into a high-dimensional space
- L2 aggregates all the time information output by L1 and recurses along the time dimension
- L3 predicts the real-time reception rate of the task based on the output of L1 and L2.
- Figure 11 comes from X.Cheng et al., "Physics constrained flow neural network for short-timescale predictions in data communications networks", ArXiv, https://arxiv.org/pdf/2112.12321.pdf.
- The machine learning model training module inputs the backup data volume of the backup tasks historically executed for each business object and the task completion time (JCT) of those backup tasks into L1. After L2 aggregates all the input information, the output of L2 is mapped directly through a fully connected network to the predicted backup data volumes and predicted task completion times of multiple business objects. In this process, only L1 and L2 of the machine learning model participate.
- The machine learning model training module compares the predicted backup data volumes and predicted task completion times of the multiple business objects with the actual backup data volumes and actual task completion times of this backup for the multiple business objects, and converges the machine learning model based on the loss between the two, thereby obtaining the parameters of the updated machine learning model.
- The machine learning model training module inputs the receiving rate (RxRate) and sending rate (TxRate) of each business object in the previous cycle into L1. After calculation by L1, L2, and L3, the outputs of L2 and L3 are mapped directly through a fully connected network to the predicted reception rates of the backup tasks of multiple business objects in the next cycle.
- The machine learning model training module compares the predicted reception rates with the actual reception rates of this backup for the multiple business objects, and converges the machine learning model based on the loss between the two, thereby obtaining the parameters of the updated machine learning model.
- this application can also use other collected relevant task transmission characteristics and system status characteristic data as input to train the machine learning model, without specific limitations.
- The machine learning model training module again sends model parameter update instructions to the scheduling and sorting module and the speed limit decision-making module respectively. The model parameter update instructions include the updated model parameters (ModelParas) (step 4), so that the scheduling and sorting module and the speed limit decision-making module can update their local machine learning models based on the updated model parameters.
- The client can perform the above steps 2 and 3 multiple times, so that the input data used by the machine learning model training module can be the data carried in the multiple feedbacks received from the client within one training cycle.
- FIG 12 is an exemplary structural diagram of a data backup device 1200 according to an embodiment of the present application. As shown in Figure 12, the data backup device 1200 according to this embodiment can be applied to a backup system.
- The data backup device 1200 may include an acquisition module 1201, a scheduling module 1202, a sending module 1203, a rate limiting module 1204, and a training module 1205, where:
- The acquisition module 1201 is used to obtain the first backup data volume and the first task completion time of multiple business objects, where the first backup data volume and the first task completion time correspond to the last backup tasks of the multiple business objects; the scheduling module 1202 is used to obtain the scheduling order of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time; and the sending module 1203 is used to send a scheduling instruction to the client according to the scheduling order of the next backup tasks of the multiple business objects, where the scheduling instruction includes the identification of the business object corresponding to the backup task to be scheduled.
- The scheduling module 1202 is specifically configured to input the first backup data amount and the first task completion time into a first machine learning model to obtain the scheduling sequence of the next backup tasks of the multiple business objects.
- The scheduling module 1202 is specifically configured to input the first backup data amount and the first task completion time into a second machine learning model to obtain the second backup data amount and the second task completion time of the multiple business objects, where the second backup data amount and the second task completion time correspond to the next backup tasks of the multiple business objects; and obtain the scheduling order of the next backup tasks of the multiple business objects according to the second backup data amount and the second task completion time.
- The scheduling module 1202 is specifically configured to calculate the ratio of the second backup data amount to the second task completion time corresponding to the first business object to obtain the ease of the next backup task of the first business object, where the first business object is any one of the multiple business objects; obtain the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, when the ease and the remaining startup time of the next backup tasks of all of the multiple business objects have been obtained, obtain the scheduling order of the next backup tasks of the multiple business objects according to the ease of the next backup tasks of the multiple business objects and the remaining startup time of the next backup tasks of the multiple business objects.
- The scheduling module 1202 is also used to determine whether the next backup task of the first business object is canceled; and, when the next backup task of the first business object is not canceled, obtain the scheduling order of the next backup tasks of the multiple business objects according to the ease of the next backup tasks of the multiple business objects and the remaining startup time of the next backup tasks of the multiple business objects.
- The scheduling module 1202 is specifically configured to calculate the ratio of the ease of the next backup task of each of the multiple business objects to the remaining startup time of that next backup task to obtain the scheduling thresholds of the next backup tasks of the multiple business objects; and sort the scheduling thresholds of the next backup tasks of the multiple business objects in order from large to small to obtain the scheduling sequence of the next backup tasks of the multiple business objects.
- The scheduling module 1202 is specifically used to determine whether the remaining startup time of the next backup task of the first business object is less than 0; when the remaining startup time of the next backup task of the first business object is less than 0, calculate the cancellation probability of the next backup task of the first business object; and, when the cancellation probability of the next backup task of the first business object is greater than the preset threshold, determine to cancel the next backup task of the first business object.
- The scheduling module 1202 is also configured to determine not to cancel the next backup task of the first business object when the remaining startup time of the next backup task of the first business object is greater than or equal to 0.
- The scheduling module 1202 is also configured to determine not to cancel the next backup task of the first business object when the cancellation probability of the next backup task of the first business object is less than or equal to a preset threshold.
- the rate limiting module 1204 is configured to receive, from the client, the sending rate and the receiving rate of the backup task of the second business object in the previous cycle, where the backup task of the second business object is being executed; and obtain the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate; the sending module 1203 is also used to send a rate-limit indication to the client.
- the rate-limit indication includes the rate limit of the backup task of the second business object in the next cycle.
- the rate limiting module 1204 is specifically configured to input the sending rate and the receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle; and obtain the rate limit of the backup task of the second business object in the next cycle according to the receiving rate of the backup task of the second business object in the next cycle.
- the rate limiting module 1204 is specifically used to obtain the preset bandwidth of the first port, where the first port is used to perform the backup task of the second business object; obtain the sum of the receiving rates, in the next cycle, of all backup tasks transmitted through the first port; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the receiving rate of the backup task of the second business object in the next cycle, and the sum of the receiving rates.
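One way to combine the three inputs named above (preset port bandwidth, predicted receiving rate of the task, and the sum of predicted receiving rates on the port) is a proportional share. The formula below is an assumption made for illustration, since this passage does not give the exact expression, and the function name is hypothetical.

```python
def rate_limit_for_next_cycle(
    port_bandwidth: float,       # preset bandwidth of the first port
    predicted_recv_rate: float,  # predicted receiving rate of this task
    total_recv_rate: float,      # sum of predicted receiving rates on the port
) -> float:
    """Assumed rule: give each task its proportional share of the port."""
    if total_recv_rate <= port_bandwidth:
        # The port is not oversubscribed, so the task may run at its
        # predicted rate (assumption).
        return predicted_recv_rate
    # Oversubscribed port: scale the task down in proportion to its demand.
    return port_bandwidth * predicted_recv_rate / total_recv_rate
```

Under this rule the limits of all tasks on an oversubscribed port add up to the preset bandwidth, so no single backup task starves the others.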
- the training module 1205 is used to train to obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model.
- the first machine learning model is used to predict the scheduling sequence of the next backup task of the business object
- the second machine learning model is used to predict the second backup data volume and the second task completion time of the business object.
- the third machine learning model is used to predict the reception rate of the backup task of the business object in the next cycle.
- the training module 1205 is specifically used to obtain the historical backup data volume and historical task completion time of the multiple business objects, where the historical backup data volume and the historical task completion time correspond to the completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical backup data volume and historical task completion time of the multiple business objects into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and the predicted task completion time to obtain the target machine learning model.
- the training module 1205 is specifically configured to obtain the historical receiving rate and historical sending rate of completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted receiving rate to obtain the target machine learning model.
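The convergence training of the receiving-rate model can be illustrated with a deliberately simple stand-in. The sketch below assumes a linear model trained by gradient descent on mean squared error and stops when the loss no longer changes; the text does not specify the model type, the loss, or the stopping rule, so all of these, along with the function names, are assumptions.

```python
def train_receive_rate_model(samples, lr=0.1, tol=1e-12, max_epochs=10000):
    """Fit next_recv ~ w_send * send + w_recv * recv + bias.

    samples: list of ((send_rate, recv_rate), next_recv_rate) taken from
    completed backup tasks; rates are assumed to be normalized to [0, 1].
    """
    w_send = w_recv = bias = 0.0
    prev_loss = float("inf")
    n = len(samples)
    for _ in range(max_epochs):
        g_send = g_recv = g_bias = loss = 0.0
        for (send, recv), target in samples:
            err = w_send * send + w_recv * recv + bias - target
            loss += err * err
            g_send += 2 * err * send
            g_recv += 2 * err * recv
            g_bias += 2 * err
        loss /= n
        # Convergence criterion (assumed): the loss has stopped changing.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        w_send -= lr * g_send / n
        w_recv -= lr * g_recv / n
        bias -= lr * g_bias / n
    return w_send, w_recv, bias


def predict_receive_rate(model, send, recv):
    """Predicted receiving rate in the next cycle for one backup task."""
    w_send, w_recv, bias = model
    return w_send * send + w_recv * recv + bias
```

The same loop structure would apply to the model that predicts backup data volume and task completion time, with the inputs and targets swapped for the corresponding historical values.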
- the device of this embodiment can be used to execute the technical solution of any of the method embodiments shown in Figures 3 and 4. Its implementation principles and technical effects are similar and will not be described again here.
- each step of the above method embodiment can be completed through an integrated logic circuit of hardware in the processor or instructions in the form of software.
- the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- DSP digital signal processor
- ASIC application-specific integrated circuit
- FPGA field programmable gate array
- a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
- the steps of the methods disclosed in the embodiments of the present application can be directly implemented by a hardware encoding processor, or executed by a combination of hardware and software modules in the encoding processor.
- Software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
- the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
- non-volatile memory may be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory. Volatile memory can be random access memory (RAM), which is used as an external cache.
- RAM random access memory
- SRAM static random access memory
- DRAM dynamic random access memory
- SDRAM synchronous dynamic random access memory
- DDR SDRAM double data rate SDRAM
- ESDRAM enhanced synchronous dynamic random access memory
- SLDRAM synchronous link dynamic random access memory
- DR RAM direct rambus RAM
- the disclosed systems, devices and methods can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
- the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
- if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
- the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
- the aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (33)
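- A data backup method, characterized by comprising: obtaining a first backup data volume and a first task completion time of multiple business objects, where the first backup data volume and the first task completion time correspond to the previous backup tasks of the multiple business objects; obtaining a scheduling sequence of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time; and sending a scheduling indication to a client according to the scheduling sequence of the next backup tasks of the multiple business objects, where the scheduling indication includes the identifier of the business object corresponding to the backup task about to be scheduled.
- The method according to claim 1, characterized in that obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time comprises: inputting the first backup data volume and the first task completion time into a first machine learning model to obtain the scheduling sequence of the next backup tasks of the multiple business objects.
- The method according to claim 1 or 2, characterized in that obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time comprises: inputting the first backup data volume and the first task completion time into a second machine learning model to obtain a second backup data volume and a second task completion time of the multiple business objects, where the second backup data volume and the second task completion time correspond to the next backup tasks of the multiple business objects; and obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the second backup data volume and the second task completion time.
- The method according to claim 3, characterized in that obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the second backup data volume and the second task completion time comprises: calculating the ratio of the second backup data volume to the second task completion time corresponding to a first business object to obtain the ease of the next backup task of the first business object, where the first business object is any one of the multiple business objects; obtaining the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, after both the ease and the remaining startup time of the next backup tasks of the multiple business objects have been obtained, obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the ease of the next backup tasks of the multiple business objects and the remaining startup time of the next backup tasks of the multiple business objects.
- The method according to claim 4, characterized in that, before obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the ease and the remaining startup time of the next backup tasks of the multiple business objects, the method further comprises: determining whether the next backup task of the first business object is to be cancelled; and obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the ease and the remaining startup time of the next backup tasks of the multiple business objects comprises: when the next backup task of the first business object is not cancelled, obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the ease of the next backup tasks of the multiple business objects and the remaining startup time of the next backup tasks of the multiple business objects.
- The method according to claim 4 or 5, characterized in that obtaining the scheduling sequence of the next backup tasks of the multiple business objects according to the ease and the remaining startup time of the next backup tasks of the multiple business objects comprises: calculating the ratio of the ease of the next backup tasks of the multiple business objects to the remaining startup time of the next backup tasks of the multiple business objects to obtain the scheduling thresholds of the next backup tasks of the multiple business objects; and sorting the scheduling thresholds of the next backup tasks of the multiple business objects in descending order to obtain the scheduling sequence of the next backup tasks of the multiple business objects.
- The method according to claim 5, characterized in that determining whether the next backup task of the first business object is to be cancelled comprises: determining whether the remaining startup time of the next backup task of the first business object is less than 0; when the remaining startup time of the next backup task of the first business object is less than 0, calculating the cancellation probability of the next backup task of the first business object; and when the cancellation probability of the next backup task of the first business object is greater than a preset threshold, determining to cancel the next backup task of the first business object.
- The method according to claim 7, characterized in that, after determining whether the remaining startup time of the next backup task of the first business object is less than 0, the method further comprises: when the remaining startup time of the next backup task of the first business object is greater than or equal to 0, determining not to cancel the next backup task of the first business object.
- The method according to claim 7, characterized in that, after calculating the cancellation probability of the next backup task of the first business object, the method further comprises: when the cancellation probability of the next backup task of the first business object is less than or equal to the preset threshold, determining not to cancel the next backup task of the first business object.
- The method according to any one of claims 1-9, characterized in that, after sending the scheduling indication to the client according to the scheduling sequence of the next backup tasks of the multiple business objects, the method further comprises: receiving, from the client, the sending rate and the receiving rate of the backup task of a second business object in the previous cycle, where the backup task of the second business object is being executed; obtaining the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate; and sending a rate-limit indication to the client, where the rate-limit indication includes the rate limit of the backup task of the second business object in the next cycle.
- The method according to claim 10, characterized in that obtaining the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate comprises: inputting the sending rate and the receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle; and obtaining the rate limit of the backup task of the second business object in the next cycle according to the receiving rate of the backup task of the second business object in the next cycle.
- The method according to claim 11, characterized in that obtaining the rate limit of the backup task of the second business object in the next cycle according to the receiving rate of the backup task of the second business object in the next cycle comprises: obtaining the preset bandwidth of a first port, where the first port is used to perform the backup task of the second business object; obtaining the sum of the receiving rates, in the next cycle, of all backup tasks transmitted through the first port; and obtaining the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the receiving rate of the backup task of the second business object in the next cycle, and the sum of the receiving rates.
- The method according to any one of claims 1-12, characterized by further comprising: training to obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model; the first machine learning model is used to predict the scheduling sequence of the next backup task of a business object, the second machine learning model is used to predict the second backup data volume and the second task completion time of a business object, and the third machine learning model is used to predict the receiving rate of the backup task of a business object in the next cycle.
- The method according to claim 13, characterized in that training to obtain the target machine learning model comprises: obtaining the historical backup data volume and historical task completion time of the multiple business objects, where the historical backup data volume and the historical task completion time correspond to completed backup tasks of the multiple business objects; obtaining a preset machine learning model; inputting the historical backup data volume and historical task completion time of the multiple business objects into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and performing convergence training based on the predicted backup data volume and the predicted task completion time to obtain the target machine learning model.
- The method according to claim 13 or 14, characterized in that training to obtain the target machine learning model comprises: obtaining the historical receiving rate and historical sending rate of completed backup tasks of the multiple business objects; obtaining a preset machine learning model; inputting the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the multiple business objects; and performing convergence training based on the predicted receiving rate to obtain the target machine learning model.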
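- A data backup apparatus, characterized by comprising: an obtaining module, configured to obtain a first backup data volume and a first task completion time of multiple business objects, where the first backup data volume and the first task completion time correspond to the previous backup tasks of the multiple business objects; a scheduling module, configured to obtain a scheduling sequence of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time; and a sending module, configured to send a scheduling indication to a client according to the scheduling sequence of the next backup tasks of the multiple business objects, where the scheduling indication includes the identifier of the business object corresponding to the backup task about to be scheduled.
- The apparatus according to claim 16, characterized in that the scheduling module is specifically configured to input the first backup data volume and the first task completion time into a first machine learning model to obtain the scheduling sequence of the next backup tasks of the multiple business objects.
- The apparatus according to claim 16 or 17, characterized in that the scheduling module is specifically configured to input the first backup data volume and the first task completion time into a second machine learning model to obtain a second backup data volume and a second task completion time of the multiple business objects, where the second backup data volume and the second task completion time correspond to the next backup tasks of the multiple business objects; and obtain the scheduling sequence of the next backup tasks of the multiple business objects according to the second backup data volume and the second task completion time.
- The apparatus according to claim 18, characterized in that the scheduling module is specifically configured to calculate the ratio of the second backup data volume to the second task completion time corresponding to a first business object to obtain the ease of the next backup task of the first business object, where the first business object is any one of the multiple business objects; obtain the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, after both the ease and the remaining startup time of the next backup tasks of the multiple business objects have been obtained, obtain the scheduling sequence of the next backup tasks of the multiple business objects according to the ease of the next backup tasks of the multiple business objects and the remaining startup time of the next backup tasks of the multiple business objects.
- The apparatus according to claim 19, characterized in that the scheduling module is further configured to determine whether the next backup task of the first business object is to be cancelled; and, when the next backup task of the first business object is not cancelled, obtain the scheduling sequence of the next backup tasks of the multiple business objects according to the ease of the next backup tasks of the multiple business objects and the remaining startup time of the next backup tasks of the multiple business objects.
- The apparatus according to claim 19 or 20, characterized in that the scheduling module is specifically configured to calculate the ratio of the ease of the next backup tasks of the multiple business objects to the remaining startup time of the next backup tasks of the multiple business objects to obtain the scheduling thresholds of the next backup tasks of the multiple business objects; and sort the scheduling thresholds of the next backup tasks of the multiple business objects in descending order to obtain the scheduling sequence of the next backup tasks of the multiple business objects.
- The apparatus according to claim 20, characterized in that the scheduling module is specifically configured to determine whether the remaining startup time of the next backup task of the first business object is less than 0; when the remaining startup time of the next backup task of the first business object is less than 0, calculate the cancellation probability of the next backup task of the first business object; and when the cancellation probability of the next backup task of the first business object is greater than a preset threshold, determine to cancel the next backup task of the first business object.
- The apparatus according to claim 22, characterized in that the scheduling module is further configured to determine not to cancel the next backup task of the first business object when the remaining startup time of the next backup task of the first business object is greater than or equal to 0.
- The apparatus according to claim 22, characterized in that the scheduling module is further configured to determine not to cancel the next backup task of the first business object when the cancellation probability of the next backup task of the first business object is less than or equal to the preset threshold.
- The apparatus according to any one of claims 16-24, characterized by further comprising: a rate limiting module, configured to receive, from the client, the sending rate and the receiving rate of the backup task of a second business object in the previous cycle, where the backup task of the second business object is being executed; and obtain the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate; the sending module is further configured to send a rate-limit indication to the client, where the rate-limit indication includes the rate limit of the backup task of the second business object in the next cycle.
- The apparatus according to claim 25, characterized in that the rate limiting module is specifically configured to input the sending rate and the receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle; and obtain the rate limit of the backup task of the second business object in the next cycle according to the receiving rate of the backup task of the second business object in the next cycle.
- The apparatus according to claim 26, characterized in that the rate limiting module is specifically configured to obtain the preset bandwidth of a first port, where the first port is used to perform the backup task of the second business object; obtain the sum of the receiving rates, in the next cycle, of all backup tasks transmitted through the first port; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the receiving rate of the backup task of the second business object in the next cycle, and the sum of the receiving rates.
- The apparatus according to any one of claims 16-27, characterized by further comprising: a training module, configured to train to obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model; the first machine learning model is used to predict the scheduling sequence of the next backup task of a business object, the second machine learning model is used to predict the second backup data volume and the second task completion time of a business object, and the third machine learning model is used to predict the receiving rate of the backup task of a business object in the next cycle.
- The apparatus according to claim 28, characterized in that the training module is specifically configured to obtain the historical backup data volume and historical task completion time of the multiple business objects, where the historical backup data volume and the historical task completion time correspond to completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical backup data volume and historical task completion time of the multiple business objects into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and the predicted task completion time to obtain the target machine learning model.
- The apparatus according to claim 28 or 29, characterized in that the training module is specifically configured to obtain the historical receiving rate and historical sending rate of completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted receiving rate to obtain the target machine learning model.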
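- A backup system, characterized by comprising: one or more processors; and a memory, configured to store one or more programs; where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-15.
- A computer-readable storage medium, characterized by comprising a computer program, where the computer program, when executed on a computer, causes the computer to perform the method according to any one of claims 1-15.
- A computer program product, characterized in that the computer program product comprises computer program code, and the computer program code, when run on a computer, causes the computer to perform the method according to any one of claims 1-15.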
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23856211.0A EP4564173A4 (en) | 2022-08-23 | 2023-06-14 | DATA BACKUP APPARATUS AND METHOD |
| US19/060,320 US20250208953A1 (en) | 2022-08-23 | 2025-02-21 | Data backup method and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211010615.3A CN117687834A (zh) | 2022-08-23 | 2022-08-23 | Data backup method and apparatus |
| CN202211010615.3 | 2022-08-23 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/060,320 Continuation US20250208953A1 (en) | 2022-08-23 | 2025-02-21 | Data backup method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024041119A1 (zh) | 2024-02-29 |
Family
ID=90012362
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/100112 Ceased WO2024041119A1 (zh) | 2022-08-23 | 2023-06-14 | Data backup method and apparatus |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250208953A1 (zh) |
| EP (1) | EP4564173A4 (zh) |
| CN (1) | CN117687834A (zh) |
| WO (1) | WO2024041119A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119226047A (zh) * | 2024-12-02 | 2024-12-31 | 成都云祺科技有限公司 | 基于人工智能的备份数据预测方法、系统及报告生成方法 |
| WO2025228104A1 (zh) * | 2024-04-28 | 2025-11-06 | 华为技术有限公司 | 一种备份任务分配方法、装置以及设备 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105302647A (zh) * | 2015-11-06 | 2016-02-03 | 南京信息工程大学 | 一种MapReduce中备份任务推测执行策略的优化方案 |
| WO2018076889A1 (zh) * | 2016-10-25 | 2018-05-03 | 广东欧珀移动通信有限公司 | 数据备份的方法、装置、系统、存储介质及服务器 |
| CN112685224A (zh) * | 2019-10-17 | 2021-04-20 | 伊姆西Ip控股有限责任公司 | 任务管理的方法、设备和计算机程序产品 |
| CN113076224A (zh) * | 2021-05-07 | 2021-07-06 | 中国工商银行股份有限公司 | 数据备份方法、数据备份系统、电子设备及可读存储介质 |
| CN114860160A (zh) * | 2022-04-15 | 2022-08-05 | 北京科杰科技有限公司 | 一种针对Hadoop数据平台的扩容资源预测方法及系统 |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8566287B2 (en) * | 2010-01-29 | 2013-10-22 | Hewlett-Packard Development Company, L.P. | Method and apparatus for scheduling data backups |
| US8924667B2 (en) * | 2011-10-03 | 2014-12-30 | Hewlett-Packard Development Company, L.P. | Backup storage management |
| US8914663B2 (en) * | 2012-03-28 | 2014-12-16 | Hewlett-Packard Development Company, L.P. | Rescheduling failed backup jobs |
| US11061780B1 (en) * | 2019-10-08 | 2021-07-13 | EMC IP Holding Company LLC | Applying machine-learning to optimize the operational efficiency of data backup systems |
| CN112685170B (zh) * | 2019-10-18 | 2023-12-08 | 伊姆西Ip控股有限责任公司 | 备份策略的动态优化 |
| CN112988497B (zh) * | 2019-12-13 | 2024-05-31 | 伊姆西Ip控股有限责任公司 | 管理备份系统的方法、电子设备和计算机程序产品 |
| US11604676B2 (en) * | 2020-06-23 | 2023-03-14 | EMC IP Holding Company LLC | Predictive scheduled backup system and method |
2022
- 2022-08-23 CN CN202211010615.3A patent/CN117687834A/zh active Pending

2023
- 2023-06-14 EP EP23856211.0A patent/EP4564173A4/en active Pending
- 2023-06-14 WO PCT/CN2023/100112 patent/WO2024041119A1/zh not_active Ceased

2025
- 2025-02-21 US US19/060,320 patent/US20250208953A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4564173A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117687834A (zh) | 2024-03-12 |
| EP4564173A1 (en) | 2025-06-04 |
| US20250208953A1 (en) | 2025-06-26 |
| EP4564173A4 (en) | 2025-11-19 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23856211; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023856211; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023856211; Country of ref document: EP; Effective date: 20250225 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | WIPO information: published in national office | Ref document number: 2023856211; Country of ref document: EP |