WO2024041119A1 - Data backup method and device - Google Patents

Data backup method and device

Info

Publication number
WO2024041119A1
Authority
WO
WIPO (PCT)
Prior art keywords
backup
task
business object
backup task
business objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/100112
Other languages
English (en)
French (fr)
Inventor
程祥乐
石翔
何光军
刘均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to EP23856211.0A (EP4564173A4)
Publication of WO2024041119A1
Priority to US19/060,320 (US20250208953A1)
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operations
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1461 Backup scheduling policy
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques

Definitions

  • The present application relates to data backup technology and, in particular, to a data backup method and device.
  • A client periodically generates copies of the data of different business objects and transmits these copies to a background backup system within a predetermined time window; each such transmission is a backup task.
  • When many backup tasks are waiting to be transmitted, the goal of backup orchestration is to design an efficient scheme for allocating and scheduling backup transmission resources (transmission time, bandwidth, etc.), so that the limited backup resources are shared among the pending tasks, as many backup tasks as possible complete within their predetermined time windows, and the backup bandwidth/throughput of the entire system is maximized.
  • A backup orchestration solution based on dynamic scheduling is adopted, which works by dynamically setting the task priority and/or the task start time.
  • The task priority is set by the customer, and the task start time is calculated dynamically from the real-time status of the system.
  • The scheduling effect of this solution depends heavily on the customer's manual experience and ad hoc operations, and may not match the dynamic characteristics of the backup system and of each backup task accurately enough.
  • Preemptive interrupt operations are needed to adapt to dynamic fluctuations after a task has started.
  • Preemptive interrupt operations cause additional interruption and recovery overhead, which affects system throughput and the service level agreement (SLA) of the task.
  • This application provides a data backup method and device that fully account for how accurately scheduling matches the dynamic characteristics of the backup system and of each backup task, avoid manual scheduling intervention across multiple backup tasks, and improve the SLA time window compliance of tasks.
  • This application provides a data backup method, which includes: obtaining a first backup data amount and a first task completion time of multiple business objects, where the first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects; obtaining the scheduling order of the next backup tasks of the multiple business objects according to the first backup data amount and the first task completion time; and sending a scheduling instruction to the client according to the scheduling order of the next backup tasks of the multiple business objects, where the scheduling instruction includes the identification of the business object corresponding to the backup task to be scheduled.
  • Because the scheduling order of the next backup tasks of multiple business objects is predicted from the backup data amount and task completion time of their last backup tasks, the method can fully account for the match with the dynamic characteristics of the backup system and of each backup task. It can also avoid manual intervention in scheduling multiple backup tasks and improve the SLA time window compliance of tasks.
  • The multiple business objects of the client can include business objects of servers that support various operating systems (for example, Windows, Unix, Linux, VMware, etc.) and business objects of various types of databases (for example, Oracle, SQL, DB2, etc.).
  • The business objects may also include business objects of other devices or services with data backup requirements, which are not specifically limited in the embodiments of this application.
  • A client can include multiple business objects. For example, if multiple VMs are created on a server, each VM is a business object; if database services are created for multiple users, each database service is a business object.
  • The first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects.
  • The data backup requirement of a business object is long-term and recurring. For example, a business object regularly backs up production data from the client to the background backup system through the switching network at a certain backup frequency (for example, hourly, daily, weekly, or monthly), and each such data transmission can be regarded as one backup task. The backup task of a business object is therefore started and executed periodically. On this basis, in the embodiments of this application, the backup task to be executed in the next cycle is called the next backup task of the business object, and the backup task that was executed in the previous cycle is called the last backup task of the business object.
  • The client can report the backup data amount and task completion time (JCT) of the backup task to the backup system.
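  • The backup system can use the following two methods to obtain the scheduling order of the next backup tasks of multiple business objects:
  • In the first method, the first backup data amount and the first task completion time are input into a first machine learning model to obtain the scheduling order of the next backup tasks of the multiple business objects.
  • In the second method, the first backup data amount and the first task completion time are input into a second machine learning model to obtain a second backup data amount and a second task completion time, which correspond to the next backup tasks of the multiple business objects; the scheduling order of the next backup tasks of the multiple business objects is then obtained from the second backup data amount and the second task completion time.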
  • For a first business object, which is any one of the multiple business objects, the backup system can calculate the ratio of the second backup data amount to the second task completion time corresponding to that business object to obtain the ease of its next backup task. The backup system performs this calculation for each of the multiple business objects.
  • The ease is the ratio of the second backup data amount to the second task completion time: the larger the ratio, the easier the next backup task; the smaller the ratio, the more difficult the next backup task.
  • The remaining startup time of the next backup task of the first business object is obtained from the second task completion time corresponding to the first business object.
  • The backup system can calculate the difference between the latest task end time MaxEndTime (which is reported to the backup system when the business object is registered or reconfigured), the second task completion time, and the current time t to obtain the remaining startup time of the next backup task.
  • After the backup system obtains the ease and the remaining startup time of the next backup task of the first business object, it can calculate the ratio of the ease to the remaining startup time to obtain the scheduling threshold of the next backup task of the first business object.
  • In the same way, the backup system can obtain the scheduling threshold of the next backup task for all business objects. A larger scheduling threshold means that the next backup task is easy to complete and its time is tight, so it needs to be started as soon as possible. A smaller scheduling threshold means that the next backup task is difficult to complete and its time is flexible, so its start can be postponed. The scheduling thresholds of the next backup tasks of the multiple business objects are then sorted from large to small to obtain the scheduling order of the next backup tasks of the multiple business objects.
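  • The threshold calculation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `PredictedTask` type, its field names, and the units are hypothetical, and the "difference" between MaxEndTime, the second task completion time, and the current time t is read here as MaxEndTime minus the other two, which is an assumption. Tasks whose remaining startup time is not positive are skipped on the assumption that they go through a separate cancellation check.

```python
from dataclasses import dataclass


@dataclass
class PredictedTask:
    object_id: str          # identification of the business object (hypothetical field)
    data_amount: float      # predicted (second) backup data amount, e.g. in GB
    completion_time: float  # predicted (second) task completion time, in seconds
    max_end_time: float     # latest task end time (MaxEndTime), in seconds


def ease(task: PredictedTask) -> float:
    # Ease = predicted backup data amount / predicted task completion time.
    return task.data_amount / task.completion_time


def remaining_start_time(task: PredictedTask, now: float) -> float:
    # Assumed reading: MaxEndTime - predicted completion time - current time t.
    return task.max_end_time - task.completion_time - now


def scheduling_order(tasks, now):
    # Scheduling threshold = ease / remaining startup time, sorted large to small.
    scored = []
    for task in tasks:
        remaining = remaining_start_time(task, now)
        if remaining <= 0:
            continue  # assumption: left to a separate cancellation check
        scored.append((ease(task) / remaining, task.object_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [object_id for _, object_id in scored]


tasks = [
    PredictedTask("vm-a", data_amount=100.0, completion_time=50.0, max_end_time=1000.0),
    PredictedTask("vm-b", data_amount=100.0, completion_time=200.0, max_end_time=1000.0),
    PredictedTask("db-c", data_amount=400.0, completion_time=100.0, max_end_time=300.0),
]
order = scheduling_order(tasks, now=0.0)
```

  • In this example "db-c" has both the highest ease and the least slack, so it sorts to the head of the order, followed by "vm-a" and then "vm-b".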
  • The above two methods use two different machine learning models.
  • The two models differ in their outputs, which depends on how each machine learning model is trained.
  • A machine learning model can also output other information that helps obtain the scheduling order of the next backup tasks of multiple business objects; the embodiments of this application therefore do not limit the output of the machine learning model.
  • During training, system throughput and task SLA time window compliance are fully considered and combined with the dynamic characteristics of the backup tasks (for example, the backup data amount, the task completion time, the earliest task start time, and the latest task end time), so that the model is trained to match the dynamic characteristics of the backup system and of each backup task accurately and its predictions are closer to the actual execution results.
  • Methods other than those described above may also be used to obtain the scheduling order of the next backup tasks of multiple business objects; this is not specifically limited.
  • The backup system can first determine whether the next backup task of the first business object is canceled. When it is not canceled, the backup system obtains the scheduling order of the next backup tasks of the multiple business objects based on the ease and the remaining startup time of those tasks.
  • The backup system can determine whether the remaining startup time of the next backup task of the first business object is less than 0. When it is less than 0 (indicating that the latest end time of the next backup task of the first business object has passed), the backup system calculates the cancellation probability of the next backup task of the first business object, for example as 1-p, where p is a preset probability value: the larger 1-p is, the more likely the task is to be canceled; the smaller 1-p is, the less likely it is to be canceled. When the cancellation probability is greater than the preset threshold, it is determined to cancel the next backup task of the first business object; when the cancellation probability is less than or equal to the preset threshold, it is determined not to cancel the next backup task of the first business object.
  • In this way, the next backup task of a business object that needs to be canceled is not scheduled, which avoids unnecessary scheduling and improves the scheduling efficiency of the next backup tasks of multiple business objects.
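  • The cancellation decision can be sketched as a small function. This is only an illustration of the rule as described: the function name, the parameter names, and the example values of p and the threshold are hypothetical, and the text does not say how p is chosen.

```python
def should_cancel(remaining_start_time: float, p: float, threshold: float) -> bool:
    # A task whose remaining startup time is greater than or equal to 0 is not canceled.
    if remaining_start_time >= 0:
        return False
    # Otherwise the cancellation probability is computed, here as 1 - p.
    cancel_probability = 1.0 - p
    # Cancel only when the probability is strictly greater than the preset threshold.
    return cancel_probability > threshold


decisions = [
    should_cancel(120.0, p=0.2, threshold=0.5),  # still has time left
    should_cancel(-30.0, p=0.2, threshold=0.5),  # overdue, 1 - p = 0.8
    should_cancel(-30.0, p=0.7, threshold=0.5),  # overdue, 1 - p is about 0.3
]
```

  • Only the second case is canceled; the first still has startup time left, and the third has a cancellation probability below the threshold.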
  • After the backup system obtains the scheduling order of the next backup tasks of multiple business objects, it can, according to the number of parallel threads available in real time and the scheduling order, take the backup task at the head of the scheduling queue as the backup task to be scheduled and send a scheduling instruction to the client; the scheduling instruction includes the identification of the backup task to be scheduled.
  • For example, if the number of parallel threads available in real time is 2, two backup tasks can currently be scheduled, so the backup system takes two backup tasks from the head of the scheduling queue (for example, those with Job IDs 1 and 2) and sends them to the client.
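  • The dispatch step can be sketched with a queue ordered by the scheduling order. The helper names and the dictionary layout of the scheduling instruction are hypothetical; the text only states that the instruction carries the identification of the backup task to be scheduled.

```python
from collections import deque


def take_schedulable(queue: deque, available_threads: int) -> list:
    # Pop tasks from the head of the scheduling queue, one per available thread.
    batch = []
    while queue and len(batch) < available_threads:
        batch.append(queue.popleft())
    return batch


def build_scheduling_instruction(job_ids) -> dict:
    # Hypothetical message layout carrying the identifications of the tasks to schedule.
    return {"type": "schedule", "job_ids": list(job_ids)}


queue = deque([1, 2, 3, 4])  # Job IDs, already sorted by scheduling order
instruction = build_scheduling_instruction(take_schedulable(queue, available_threads=2))
```

  • With two threads available, Job IDs 1 and 2 are placed in the instruction and Job IDs 3 and 4 remain queued for the next round.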
  • The client then reports to the backup system again the backup data amount and task completion time of this backup task of the multiple business objects.
  • The client can choose to send the backup data amount and task completion time separately for the backup task of each business object, or to send them for multiple business objects at once after the backup tasks of those business objects have completed; this is not specifically limited.
  • The method further includes: receiving the sending rate and receiving rate, in the previous cycle, of the backup task of a second business object sent by the client, where the backup task of the second business object is being executed; obtaining the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate; and sending a rate limit indication to the client, where the rate limit indication includes the rate limit of the backup task of the second business object in the next cycle.
  • Because the rate limit of the backup tasks of multiple business objects in the next cycle is predicted from the sending rate and receiving rate of those backup tasks in the previous cycle, port bandwidth resources can be used optimally based on the real-time transmission characteristics of the tasks, which improves system throughput. It also avoids manual intervention in scheduling multiple backup tasks and improves system transmission efficiency.
  • The backup task of the second business object is being executed; that is, in the embodiments of the present application, a rate limit can be applied to a backup task that is in progress (the client is sending the backup data of the backup task to the backup system), so that port bandwidth resources can be used optimally based on the real-time transmission characteristics of the tasks, which improves system throughput.
  • The backup tasks of business objects are executed periodically.
  • A backup task being executed can be considered to be in the current cycle (which can also be called the next cycle), and the cycle in which the backup task was last executed can be considered the previous cycle of this backup task.
  • A rate reporting cycle can be set, and the client reports to the backup system, according to this reporting cycle, the sending rate and receiving rate of the backup task of the second business object in the previous reporting cycle.
  • The sending rate and receiving rate may be the average sending rate and average receiving rate in the previous reporting cycle, or the weighted average sending rate and weighted average receiving rate in the previous reporting cycle; this is not specifically limited.
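  • The two aggregation options for a reporting cycle can be illustrated as follows. The sample values and the recency-based weights are hypothetical; the text does not specify how samples are collected or weighted.

```python
def average_rate(samples):
    # Plain average of the rate samples observed in one reporting cycle.
    return sum(samples) / len(samples)


def weighted_average_rate(samples, weights):
    # Weighted average; the weights here are an assumption (e.g. favoring recent samples).
    if len(samples) != len(weights):
        raise ValueError("samples and weights must have the same length")
    return sum(s * w for s, w in zip(samples, weights)) / sum(weights)


send_samples = [100.0, 120.0, 140.0]  # hypothetical sending rates, e.g. in MB/s
weights = [1.0, 2.0, 3.0]             # hypothetical: later samples count more
avg = average_rate(send_samples)
wavg = weighted_average_rate(send_samples, weights)
```

  • The same two functions would apply to the receiving rate samples of the same reporting cycle.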
  • The backup system can input the sending rate and the receiving rate into the third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle, and then obtain the rate limit of the backup task of the second business object in the next cycle according to that receiving rate.
  • The backup system can obtain the preset bandwidth of a first port, which is used to perform the backup task of the second business object; obtain the sum of the receiving rates of all backup tasks transmitted by the first port in the next cycle; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the receiving rate of the backup task of the second business object in the next cycle, and the sum of the receiving rates.
  • In this calculation, the predicted receiving rate of backup task i is the receiving rate of the backup task of the second business object in the next cycle, and the sum of the predicted receiving rates of all backup tasks transmitted on the first port is the sum of the receiving rates, in the next cycle, of all backup tasks transmitted by the first port.
  • The embodiments of the present application can also use other methods to calculate the rate limit of the backup task of the second business object in the next cycle; this is not specifically limited.
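  • The exact rate limit formula is not reproduced in this text. One plausible reading of the three inputs (the preset bandwidth of the first port, the predicted receiving rate of task i, and the sum of predicted receiving rates on that port) is a proportional share, sketched below purely as an assumption. The function name and the example figures are hypothetical.

```python
def proportional_rate_limits(port_bandwidth: float, predicted_rates: dict) -> dict:
    # Assumed rule: limit_i = port_bandwidth * predicted_rate_i / sum of predicted rates.
    total = sum(predicted_rates.values())
    if total <= 0:
        raise ValueError("sum of predicted receiving rates must be positive")
    return {
        task_id: port_bandwidth * rate / total
        for task_id, rate in predicted_rates.items()
    }


# Hypothetical port with 1000 MB/s preset bandwidth carrying two backup tasks.
limits = proportional_rate_limits(1000.0, {"job-1": 300.0, "job-2": 100.0})
```

  • Under this assumed rule the limits always add up to the preset bandwidth of the port, so a task predicted to receive faster is given a larger share.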
  • The backup system sends the rate limit of the backup task of the second business object in the next cycle to the client through the rate limit indication, so that the client can limit the sending rate at which it transmits the backup data of the backup task of the second business object.
  • The client can limit the sending rate of the backup data of the backup task of the second business object based on the rate limit, so that port bandwidth resources can be used optimally based on the real-time transmission characteristics of the tasks, which improves system throughput.
  • The method further includes: training to obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model. The first machine learning model is used to predict the scheduling order of the next backup task of the business object, the second machine learning model is used to predict the second backup data amount and the second task completion time of the business object, and the third machine learning model is used to predict the receiving rate of the backup task of the business object in the next cycle.
  • Training data is data used to train machine learning models.
  • The training data can differ depending on the structure, parameters, functions, etc. of the machine learning model.
  • The target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model, where the first machine learning model is used to predict the scheduling order of the next backup task of the business object, the second machine learning model is used to predict the second backup data amount and the second task completion time of the business object, and the third machine learning model is used to predict the receiving rate of the backup task of the business object in the next cycle.
  • The backup system can obtain the historical backup data amount and historical task completion time of multiple business objects, which correspond to the completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical backup data amount and historical task completion time of the multiple business objects into the preset machine learning model to obtain the predicted backup data amount and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data amount and predicted task completion time to obtain the target machine learning model.
  • The backup system can obtain the historical receiving rate and historical sending rate of completed backup tasks of multiple business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted receiving rate to obtain the target machine learning model.
  • Depending on the target machine learning model, the preset machine learning model may also differ, including in its structure and parameters; this is not specifically limited in the embodiments of the present application.
  • The machine learning model can also use the error back propagation (BP) algorithm to correct the parameters of the initial model during training, so that the error loss of the model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the error loss information is propagated backward to update the parameters of the initial model, so that the error loss converges.
  • The backpropagation algorithm is a backward pass driven by the error loss, which aims to obtain the optimal parameters of the machine learning model, such as the weight matrix.
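  • The forward pass, error loss, and backward parameter update can be illustrated with a toy model. This is not the model described in the application, whose structure is left open: it is a single linear unit trained by gradient descent, for which backpropagation reduces to one gradient step per parameter. The training pairs, the learning rate, and the epoch count are hypothetical.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    # Toy model: prediction = w * x + b, trained with mean squared error.
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Forward propagation: compute predictions and the error loss.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Backward propagation: gradients of the loss with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Parameter update, which drives the error loss toward convergence.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses


# Hypothetical history: last backup data amount -> next backup data amount (y = 2x + 1).
last_amounts = [1.0, 2.0, 3.0, 4.0]
next_amounts = [3.0, 5.0, 7.0, 9.0]
w, b, losses = train_linear_model(last_amounts, next_amounts)
```

  • On this noise-free data the loss shrinks steadily and the parameters approach w = 2 and b = 1; a real model would use more features (for example, task completion time) and more layers.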
  • This application provides a data backup device, including: an acquisition module, configured to acquire a first backup data amount and a first task completion time of multiple business objects, where the first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects; a scheduling module, configured to obtain the scheduling order of the next backup tasks of the multiple business objects according to the first backup data amount and the first task completion time; and a sending module, configured to send a scheduling instruction to the client according to the scheduling order of the next backup tasks of the multiple business objects, where the scheduling instruction includes the identification of the business object corresponding to the backup task to be scheduled.
  • The scheduling module is specifically configured to input the first backup data amount and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the multiple business objects.
  • The scheduling module is specifically configured to input the first backup data amount and the first task completion time into a second machine learning model to obtain a second backup data amount and a second task completion time of the multiple business objects, where the second backup data amount and the second task completion time correspond to the next backup tasks of the multiple business objects; and to obtain the scheduling order of the next backup tasks of the multiple business objects according to the second backup data amount and the second task completion time.
  • The scheduling module is specifically configured to calculate the ratio of the second backup data amount to the second task completion time corresponding to a first business object to obtain the ease of the next backup task of the first business object, where the first business object is any one of the multiple business objects; obtain the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, when the ease and the remaining startup time of the next backup tasks of the multiple business objects have all been obtained, obtain the scheduling order of the next backup tasks of the multiple business objects according to the ease and the remaining startup time of those tasks.
  • The scheduling module is also configured to determine whether the next backup task of the first business object is canceled and, when the next backup task of the first business object is not canceled, to obtain the scheduling order of the next backup tasks of the multiple business objects based on the ease and the remaining startup time of the next backup tasks of the multiple business objects.
  • The scheduling module is specifically configured to calculate the ratio of the ease of the next backup tasks of the multiple business objects to the remaining startup time of the next backup tasks of the multiple business objects to obtain the scheduling thresholds of those tasks, and to sort the scheduling thresholds from large to small to obtain the scheduling order of the next backup tasks of the multiple business objects.
  • The scheduling module is specifically configured to determine whether the remaining startup time of the next backup task of the first business object is less than 0; when the remaining startup time is less than 0, calculate the cancellation probability of the next backup task of the first business object; and, when the cancellation probability is greater than the preset threshold, determine to cancel the next backup task of the first business object.
  • The scheduling module is further configured to determine not to cancel the next backup task of the first business object when the remaining startup time of the next backup task of the first business object is greater than or equal to 0.
  • The scheduling module is further configured to determine not to cancel the next backup task of the first business object when the cancellation probability of the next backup task of the first business object is less than or equal to the preset threshold.
  • The device further includes a rate limiting module, configured to receive the sending rate and receiving rate, in the previous cycle, of the backup task of a second business object sent by the client, where the backup task of the second business object is being executed, and to obtain the rate limit of the backup task of the second business object in the next cycle according to the sending rate and the receiving rate. The sending module is also configured to send a rate limit indication to the client, where the rate limit indication includes the rate limit of the backup task of the second business object in the next cycle.
  • The rate limiting module is specifically configured to input the sending rate and the receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle, and to obtain the rate limit of the backup task of the second business object in the next cycle according to that receiving rate.
  • The rate limiting module is specifically configured to obtain the preset bandwidth of a first port, where the first port is used to perform the backup task of the second business object; obtain the sum of the receiving rates of all backup tasks transmitted by the first port in the next cycle; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the receiving rate of the backup task of the second business object in the next cycle, and the sum of the receiving rates.
  • the method further includes: a training module for training to obtain a target machine learning model, where the target machine learning model includes a first machine learning model, a second machine learning model, and a third machine learning model. At least one of, the first machine learning model is used to predict the scheduling order of the next backup task of the business object, and the second machine learning model is used to predict the second backup data amount and the second task completion time of the business object, The third machine learning model is used to predict the reception rate of the backup task of the business object in the next cycle.
  • the training module is specifically configured to obtain the historical backup data volume and historical task completion time of the multiple business objects, the historical backup data volume and the historical task completion time, and the historical task completion time.
  • the training module is specifically configured to obtain the historical receiving rate and historical sending rate of completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted reception rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted reception rate to obtain the target machine learning model.
  • this application provides a backup system, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any one of the above first aspects.
  • the present application provides a computer-readable storage medium, including a computer program.
  • When the computer program is executed on a computer, it causes the computer to perform the method described in any one of the above first aspects.
  • the present application provides a computer program product.
  • the computer program product includes computer program code.
  • When the computer program code is run on a computer, it causes the computer to execute the method described in any one of the above first aspects.
  • Figure 1 is an exemplary framework diagram of a backup system
  • Figure 2 is an exemplary structural diagram of the system architecture of the present application
  • Figure 3 is a flow chart of the process 300 of the data backup method according to the embodiment of the present application.
  • Figure 4 is a flow chart of the process 400 of the data backup method according to the embodiment of the present application.
  • Figure 5 is a flow chart of the process 500 of the data backup method according to the embodiment of the present application.
  • Figure 6 is an exemplary schematic diagram of the client configuration and registration process
  • Figure 7 is an exemplary schematic diagram of the scheduling and ranking calculation process
  • Figure 8 is an exemplary schematic diagram of a real-time scheduling interaction process
  • Figure 9 is an exemplary schematic diagram of the speed limit rate calculation process
  • Figure 10 is an exemplary schematic diagram of the training process of the machine learning model
  • Figure 11 is an exemplary structural diagram of a machine learning model
  • Figure 12 is an exemplary structural schematic diagram of the data backup device 1200 according to the embodiment of the present application.
  • “At least one (item)” refers to one or more, and “plurality” refers to two or more.
  • “And/or” describes the relationship between associated objects and indicates that three relationships are possible. For example, “A and/or B” can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural. The character “/” generally indicates that the associated objects are in an “or” relationship. “At least one of the following” or similar expressions refers to any combination of these items, including any combination of a single item or a plurality of items.
  • For example, at least one of a, b or c can mean: a, b, c, “a and b”, “a and c”, “b and c”, or “a and b and c”, where a, b, c can be single or multiple.
  • the client regularly generates copies of the data of different business objects and transmits these copies to the background backup system within a predetermined time window, which is a backup task.
  • the goal of backup orchestration is to design an efficient scheme for allocating and scheduling backup transmission resources (transmission time, bandwidth, etc.), assigning limited backup resources to the many backup tasks waiting to be transmitted, so that as many backup tasks as possible are completed within their scheduled time windows and the backup bandwidth/throughput of the entire system is maximized.
  • the backup data of actual production environments in various industries is complex and changeable.
  • the arrival times and read and write rates of different backup business object streams generated by multiple clients/business objects are highly dynamic, and the backup time windows allowed by tasks are also different.
  • If effective backup orchestration cannot be performed for the many backup tasks, the business object flows of different tasks will cause resource conflicts or unnecessary resource idleness in the same time and space, resulting in low effective bandwidth and violations of the service level agreement (SLA).
  • Figure 1 is an exemplary framework diagram of a backup system.
  • As shown in Figure 1, each client (for example, client 1 through client N) deploys a series of virtual machines (VMs) (for example, VM 1 to VM N) to host different application business objects.
  • VM business objects will regularly back up production data from the client to the background backup system through the switching network at a certain backup frequency (for example, hourly, daily, weekly, monthly, etc.).
  • The backups of each VM business object are divided into full backups (backing up all data) and incremental backups (backing up only data that is new compared to the last backup).
  • the amount of backup data to be transferred in each backup task is highly dynamic and unknown in advance.
  • Each backup task must be backed up within a specified time window (for example, only after the job stops and before the next job starts). Due to differences in the backup data storage media and production environment of each backup task, the client's ability to read backup data, dynamic transmission over the network, and the backup system's disk download processing speed are all highly dynamic.
  • the backup system needs to perform efficient scheduling and orchestration of numerous backup tasks under various time windows and resource constraints, so that as many backup tasks as possible can be completed within the specified time window and maximize system backup throughput.
  • the backup task orchestration challenge of the backup system mainly lies in how to make the best intelligent supply and demand orchestration decisions under the influence of many dynamic and uncertain factors.
  • Dynamic uncertain factors include: 1) the total data volume of a backup task is unknown in advance; 2) the rate at which the client reads backup data from the local storage medium fluctuates dynamically (affected by the local read rate, central processing unit (CPU) load, etc.); 3) the real-time transmission rate of backup data is dynamically affected by link bandwidth congestion, back-end reception, and input/output (IO) disk placement; 4) backup tasks are constantly completing and exiting while new backup tasks enter, which causes the load of the backup system and the real-time available remaining resources of the entire backup system to fluctuate dynamically.
  • Static orchestration mainly controls the execution of backup tasks by setting fixed task backup policies.
  • Dynamic orchestration is the dynamic adaptive adjustment of scheduling based on the dynamic attributes of business objects.
  • the scheduling effect of these orchestration strategies is largely affected by the customer's manual experience and random operations. It may not match the dynamic characteristics of the backup system and each backup task accurately enough, and may even cause additional interruption and recovery overhead, affecting the system's performance.
  • this application provides a data backup method that can maximize the throughput of the entire backup system and schedule as many backup tasks as possible to complete the transmission of backup data within their respective SLA time windows.
  • Neural network is a machine learning model.
  • the neural network can be composed of neural units.
  • the neural unit can refer to an arithmetic unit that takes $x_s$ and an intercept of 1 as input. The output of the arithmetic unit can be:
  • $f\left(\sum_{s=1}^{n} W_s x_s + b\right)$
  • where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function can be a nonlinear function such as ReLU.
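  • As a minimal illustration of the neural unit described above (a weighted sum of the inputs plus a bias, passed through an activation function such as ReLU), the following sketch uses plain Python. The function names are hypothetical and chosen only for this example.

```python
def relu(z):
    """ReLU activation: passes positive values through and clamps the rest to 0."""
    return max(0.0, z)


def neural_unit(xs, ws, b, f=relu):
    """Return f(sum_s W_s * x_s + b) for a single neural unit.

    xs: input values x_s
    ws: weights W_s, one per input
    b:  bias of the neural unit
    f:  activation function (ReLU by default)
    """
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    weighted_sum = sum(w * x for w, x in zip(ws, xs))
    return f(weighted_sum + b)
```

  • For inputs [1.0, 2.0], weights [0.5, 1.0] and bias 0.25, the unit outputs 2.75; flipping the second weight to -1.0 makes the pre-activation negative, so ReLU outputs 0.0.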
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Multi-layer perceptron (MLP)
  • An MLP is a simple deep neural network (DNN) in which different layers are fully connected. It is also called a multi-layer neural network and can be understood as a neural network with many hidden layers.
  • Divided by the position of the layers, the layers inside a DNN fall into three categories: input layer, hidden layer, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks very complicated, the work of each layer is actually not complicated.
  • the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$. It should be noted that the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
  • Convolutional neural network (CNN)
  • the convolutional neural network contains a feature extractor composed of convolutional layers and pooling layers.
  • the feature extractor can be regarded as a filter, and the convolution process can be regarded as using a trainable filter to convolve with an input image or convolution feature plane (feature map).
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • the convolution layer can include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to complete the task of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image.
  • the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • During the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolved output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same type, are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolution image.
  • the dimension here can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image.
  • For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and yet another weight matrix is used to blur unwanted noise in the image.
  • the multiple weight matrices have the same size (rows × columns), so the feature maps they extract also have the same size. The extracted feature maps of the same size are then merged to form the output of the convolution operation.
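  • The sliding of a weight matrix over an image and the stacking of the outputs of several same-sized weight matrices can be sketched as follows. This is a simplified illustration that assumes a single-channel image, no padding, and plain Python lists; the function names are hypothetical.

```python
def conv2d_single(image, kernel, stride=1):
    """Slide one weight matrix over a 2-D image (no padding); return one feature map."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    feature_map = []
    for top in range(0, ih - kh + 1, stride):
        row = []
        for left in range(0, iw - kw + 1, stride):
            # Element-wise product of the kernel and the image patch, then summed.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def conv2d_multi(image, kernels, stride=1):
    """Apply several same-sized weight matrices; the stacked list is the output depth."""
    return [conv2d_single(image, kernel, stride) for kernel in kernels]
```

  • The length of the list returned by `conv2d_multi` equals the number of weight matrices, which corresponds to the depth dimension of the convolved output, and every feature map in it has the same size.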
  • the weight values in these weight matrices require a lot of training in practical applications.
  • Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network can make correct predictions.
  • the initial convolutional layers often extract more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by subsequent convolutional layers become more and more complex, such as high-level semantic features.
  • Features with higher semantics are more suitable for the problem to be solved.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value in a specific range as the result of max pooling.
  • the operators in the pooling layer should also be related to the size of the image.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer.
  • Each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
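  • The average and maximum pooling operators described above can be sketched as follows. The example assumes non-overlapping square sub-regions and plain Python lists; the function name and the `mode` parameter are hypothetical.

```python
def pool2d(image, size, mode="max"):
    """Pool non-overlapping size x size sub-regions of a 2-D image.

    mode="max" keeps the largest value of each sub-region;
    mode="avg" keeps the mean of each sub-region.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    pooled = []
    for top in range(0, len(image) - size + 1, size):
        row = []
        for left in range(0, len(image[0]) - size + 1, size):
            window = [image[top + i][left + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled
```

  • Pooling a 4 × 4 image with `size=2` yields a 2 × 2 image, in which each output pixel is the maximum or average of the corresponding 2 × 2 sub-region of the input.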
  • After processing by the convolutional layers/pooling layers, the convolutional neural network is still not able to output the required output information, because, as mentioned before, the convolutional layers/pooling layers only extract features and reduce the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network needs to use neural network layers to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layers can include multiple hidden layers, and the parameters contained in these hidden layers can be pre-trained based on relevant training data for a specific task type. For example, the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
  • the output layer of the entire convolutional neural network is also included.
  • This output layer has a loss function similar to categorical cross-entropy, specifically used to calculate the prediction error.
  • Recurrent neural networks (RNNs) are used to process sequence data.
  • In a traditional neural network model, the layers are fully connected, while the nodes within each layer are unconnected.
  • Although this ordinary neural network has solved many difficult problems, it is still inadequate for many others. For example, to predict the next word of a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. An RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
  • RNN can process sequence data of any length.
  • the training of an RNN is the same as the training of a traditional CNN or DNN.
  • the error backpropagation algorithm is also used, but there is one difference: if the RNN is unrolled into a network, the parameters, such as W, are shared across steps, which is not the case in the traditional neural networks described above.
  • the output of each step depends not only on the network of the current step but also on the state of the network in several previous steps. This learning algorithm is called backpropagation through time (BPTT).
  • the convolutional neural network can use the error back propagation (BP) algorithm to modify the size of the parameters in the initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller.
  • Specifically, forward propagation of the input signal until the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the super-resolution model, such as the weight matrix.
  • A generative adversarial network (GAN) is a deep learning model.
  • the model includes at least two modules: a generative model and a discriminative model. The two modules learn from each other through a game between them to produce better output.
  • Both the generative model and the discriminative model can be neural networks, specifically deep neural networks or convolutional neural networks.
  • the basic principle of GAN is as follows: Take the GAN that generates pictures as an example. Suppose there are two networks, G (Generator) and D (Discriminator), where G is a network that generates pictures.
  • D is a discriminant network, used to judge whether a picture is "real". Its input parameter is x, where x represents a picture, and its output D(x) represents the probability that x is a real picture. If the output is 1, the picture is 100% real; if it is 0, the picture cannot be real.
  • the goal of the generative network G is to generate pictures that are as realistic as possible to deceive the discriminant network D, and the goal of the discriminant network D is to distinguish the pictures generated by G from real pictures as well as it can.
  • G and D constitute a dynamic "game” process, that is, the "confrontation” in the "generative adversarial network".
  • Figure 2 is an exemplary structural diagram of the system architecture of this application. As shown in Figure 2, the system can be applied to the scheduling and orchestration of data backup tasks in various scenarios, including VM, database, file and other backup scenarios. There is no specific limit on this.
  • the system is divided into two parts: the front-end (client) and the back-end (backup system).
  • the front-end client and the back-end backup system are connected through a network. The network connection includes, but is not limited to, direct or multi-hop network connections such as Ethernet and Internet Protocol (IP), and data transmission forms such as Transmission Control Protocol (TCP) and Remote Direct Memory Access (RDMA).
  • the front-end client can include servers that support various operating systems (such as Windows, Unix, Linux, VMware, etc.) or various databases (such as Oracle, SQL, DB2, etc.). The client can also include other business objects with data backup requirements, which this application does not specifically limit.
  • the back-end backup system can perform unified scheduling and orchestration for many backup tasks under different SLA constraints. Under the constraints of existing CPU and network transmission resources, the backup tasks to be transmitted can be scheduled and sorted and the transmission rate can be controlled to optimize system throughput. and the SLA time window compliance of each backup task.
  • Figure 2 exemplarily shows the system architecture of the present application, but this system architecture does not constitute a limitation.
  • the number, implementation form, implementation functions, etc. of the front-end clients in the system, as well as the construction method and deployment strategy of the back-end backup system, etc., can all be implemented using other solutions, and there are no specific restrictions on this.
  • FIG. 3 is a flow chart of a process 300 of the data backup method according to the embodiment of the present application.
  • Process 300 may be performed by both the front-end client and the back-end backup system.
  • Process 300 is described as a series of steps or operations, and it should be understood that process 300 may be performed in various orders and/or occur simultaneously, and is not limited to the order of execution shown in FIG. 3 .
  • Process 300 includes the following steps:
  • Step 301 The client reports the first backup data volume and first task completion time of multiple business objects to the backup system.
  • the multiple business objects of the client can include business objects of servers that support various operating systems (such as Windows, Unix, Linux, VMware, etc.), and can also include business objects of various databases (such as Oracle, SQL, DB2, etc.) , and may also include business objects of other devices or services with data backup requirements, which are not specifically limited in the embodiments of this application.
  • a client can include multiple business objects. For example, multiple VMs are created on a server, and each VM is a business object. As another example, database services are created for multiple users, and each database service is a business object.
  • the first backup data amount and the first task completion time correspond to the last backup task of the multiple business objects.
  • the data backup requirement of a business object is a long-term and recurring process. For example, a business object regularly backs up production data from the client to the background backup system through the switching network at a certain backup frequency (for example, hourly, daily, weekly, monthly, etc.), so each such data transmission process can be regarded as a backup task of the business object. The backup tasks of a business object are therefore started and executed periodically. Based on this, in the embodiments of this application, the backup task to be executed in the next cycle is called the next backup task of the business object, and the backup task that was executed in the previous cycle is called the last backup task of the business object.
  • the client can report the backup data volume and job completion time (JCT) of the backup task to the backup system.
  • Step 302 The backup system obtains the scheduling order of the next backup tasks of multiple business objects based on the first backup data amount and the first task completion time.
  • the backup system can use the following two methods to obtain the scheduling order of the next backup tasks of multiple business objects:
  • the first machine learning model can be pre-trained, and the training process can be referred to the following embodiments.
  • the input of the first machine learning model is the first backup data amount and the first task completion time of the multiple business objects, and the output is the scheduling order of the next backup task of each of the multiple business objects.
  • For example, the business objects include VM1, VM2 and VM3. The first backup data volume and the first task completion time of the three business objects are input into the first machine learning model, and after prediction the model outputs the scheduling order of the next backup tasks of the three business objects as VM3 → VM2 → VM1.
  • the scheduling order can be expressed as the order of the business object identifiers (for example, VM ID), or as the order of the indexes (for example, Job ID) of the backup tasks of the business objects.
  • the embodiment of the present application does not specifically limit the expression method of the scheduling order.
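  • The first backup data volume and the first task completion time are input into a second machine learning model to obtain the second backup data volume and the second task completion time of the multiple business objects, where the second backup data volume and the second task completion time correspond to the next backup task of the multiple business objects; the scheduling order of the next backup task of the multiple business objects is then obtained based on the second backup data volume and the second task completion time.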
  • the second machine learning model can also be pre-trained, and its training process can also refer to the following embodiments.
  • the input of the second machine learning model is the first backup data volume and the first task completion time of the multiple business objects, and the output is the second backup data volume and the second task completion time of the multiple business objects.
  • the second backup data amount and the second task completion time are the backup data amount and task completion time of the next backup task of the business object as predicted by the second machine learning model. Since the next backup task has not yet been executed, the second backup data amount and the second task completion time are predicted values rather than actual values.
  • the backup system can calculate the ratio of the second backup data amount corresponding to the first business object to the second task completion time corresponding to the first business object to obtain the ease of the next backup task of the first business object. In the same way, the backup system calculates the ease of the next backup task for each of the multiple business objects.
  • the ease is the ratio of the second backup data amount to the second task completion time. The larger the ratio, the easier the next backup task is; the smaller the ratio, the more difficult the next backup task is.
  • the remaining start time of the next backup task of the first business object is obtained according to the completion time of the second task corresponding to the first business object.
  • the backup system can subtract the second task completion time and the current time t from the latest task end time MaxEndTime (this information is reported to the backup system when the business object is registered or reconfigured) to obtain the remaining start time of the next backup task.
  • After the backup system obtains the ease of the next backup task and the remaining start time of the next backup task, it can calculate the ratio of the ease of the next backup task of the first business object to the remaining start time of that task to obtain the scheduling threshold of the next backup task of the first business object. Using the same calculation method, the backup system can obtain the scheduling threshold of the next backup task for all business objects. A larger scheduling threshold means that the next backup task is easy to complete and time is tight, so it needs to be started as soon as possible; a smaller scheduling threshold means that the next backup task is difficult to complete and time is flexible, so its start can be postponed. Then, the scheduling thresholds of the next backup tasks of the multiple business objects are sorted from large to small to obtain the scheduling order of the next backup tasks of the multiple business objects.
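  • The calculation described above (ease, remaining start time, scheduling threshold, and descending sort) can be sketched as follows. This is an illustrative sketch only: the class and field names are hypothetical, the predicted values are assumed to come from the second machine learning model, and tasks whose remaining start time is not positive are simply skipped here, since the text handles them through a separate cancellation check.

```python
from dataclasses import dataclass


@dataclass
class PredictedTask:
    job_id: int
    predicted_data: float   # second backup data amount (predicted)
    predicted_jct: float    # second task completion time (predicted)
    max_end_time: float     # latest task end time (MaxEndTime)


def ease(task):
    """Ease = predicted data amount / predicted task completion time."""
    return task.predicted_data / task.predicted_jct


def remaining_start_time(task, now):
    """Remaining start time = MaxEndTime - predicted completion time - current time."""
    return task.max_end_time - task.predicted_jct - now


def scheduling_order(tasks, now):
    """Return job IDs sorted by scheduling threshold, largest first."""
    scored = []
    for task in tasks:
        remaining = remaining_start_time(task, now)
        if remaining <= 0:
            # Assumption of this sketch: such tasks go to the cancellation
            # check instead, which also avoids dividing by zero.
            continue
        scored.append((ease(task) / remaining, task.job_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [job_id for _, job_id in scored]
```

  • In this sketch, a task with the same predicted volume and completion time as another but an earlier MaxEndTime has a smaller remaining start time, hence a larger threshold, and is scheduled first.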
  • the above two methods use two different machine learning models.
  • They differ in the output of the machine learning model, which depends on how the model is trained.
  • It should be noted that a machine learning model can also output other information that assists in obtaining the scheduling order of the next backup tasks of the multiple business objects. Therefore, the embodiments of this application do not limit the output of the machine learning model.
  • System throughput and task SLA time window compliance are fully considered when the machine learning models are trained, in combination with the dynamic characteristics of the backup tasks (for example, the backup data volume, the task completion time, the earliest task start time, the latest task end time, etc.). This improves the accuracy with which scheduling matches the dynamic characteristics of the backup system and of each backup task, so that the prediction results of the machine learning model are closer to the actual execution results.
  • methods other than the above-mentioned methods may also be used to obtain the scheduling order of the next backup tasks of multiple business objects, and there is no specific limitation on this.
  • the backup system can first determine whether the next backup task of the first business object is cancelled; when the next backup task of the first business object is not cancelled, the backup system obtains the scheduling order of the next backup tasks of the multiple business objects based on the ease and the remaining start time of those tasks.
  • the backup system can determine whether the remaining start time of the next backup task of the first business object is less than 0. When it is less than 0 (indicating that the latest end time of the next backup task of the first business object can no longer be met), the backup system calculates the cancellation probability of the next backup task of the first business object. For example, it calculates 1-p, where p is a preset probability value: the larger 1-p is, the more likely the next backup task of the first business object is to be cancelled, and the smaller 1-p is, the less likely it is to be cancelled. When the cancellation probability is greater than a preset threshold, the backup system determines to cancel the next backup task of the first business object; when the cancellation probability is less than or equal to the preset threshold, it determines not to cancel the next backup task of the first business object.
  • In this way, the next backup task of a business object that needs to be cancelled can be identified before scheduling. This avoids the unnecessary scheduling that would result from scheduling a task that needs to be cancelled, and thus improves the scheduling efficiency of the next backup tasks of the multiple business objects.
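  • The cancellation decision described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `should_cancel` and its parameter names are hypothetical, and times are assumed to be plain numbers in seconds.

```python
def should_cancel(remaining_start_time: float, p: float, threshold: float) -> bool:
    """Decide whether to cancel the next backup task of one business object.

    remaining_start_time: latest end time minus predicted completion time
        minus the current time (assumed unit: seconds).
    p: preset probability value; the cancellation probability is 1 - p.
    threshold: preset threshold compared against the cancellation probability.
    """
    if remaining_start_time >= 0:
        # The task can still finish before its latest end time: do not cancel.
        return False
    cancel_probability = 1.0 - p
    # Cancel only when the cancellation probability exceeds the threshold.
    return cancel_probability > threshold
```

  • With p = 0.2 and a threshold of 0.5, a task whose remaining start time is negative is cancelled (0.8 > 0.5), while a task whose remaining start time is still non-negative is kept regardless of p.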
  • Step 303 The backup system sends scheduling instructions to the client according to the scheduling order of the next backup tasks of the multiple business objects.
  • After the backup system obtains the scheduling order of the next backup tasks of the multiple business objects, it can take backup tasks from the head of the scheduling queue, according to the number of parallel threads available in real time and the scheduling order, as the backup tasks to be scheduled, and send the client a scheduling instruction that includes the identifier of the backup task to be scheduled. For example, if the number of parallel threads available in real time is 2, two backup tasks can currently be scheduled. The backup system therefore takes two backup tasks from the head of the scheduling queue (for example, those with Job IDs 1 and 2) and indicates them to the client in the scheduling instruction.
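  • The queue-head selection in this step can be sketched as below. The helper name `take_tasks_to_schedule` and the dictionary used as a stand-in for the scheduling instruction are assumptions made for illustration only.

```python
from collections import deque
from typing import Deque, List


def take_tasks_to_schedule(queue: Deque[int], available_threads: int) -> List[int]:
    """Pop up to `available_threads` job IDs from the head of the scheduling queue."""
    count = min(max(available_threads, 0), len(queue))
    return [queue.popleft() for _ in range(count)]


# Scheduling queue already sorted by scheduling order; job IDs are illustrative.
queue = deque([1, 2, 3, 4])
# Hypothetical shape of a scheduling instruction: just the job IDs to start.
instruction = {"job_ids": take_tasks_to_schedule(queue, available_threads=2)}
print(instruction)  # {'job_ids': [1, 2]}
```

  • Tasks that are not taken stay in the queue in their original order, so they are considered again as soon as threads become available.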
  • Step 304 The client transmits the backup data of the next backup task of the corresponding business object to the backup system according to the scheduling instruction.
  • After receiving the scheduling instruction from the backup system, the client can start the next backup task of the corresponding business object according to the identifier of the business object in the scheduling instruction, that is, transmit to the backup system the backup data of the next backup task of the business object indicated by that identifier.
  • the client after completing the next backup task of multiple business objects, the client returns to step 301 again and reports the backup data amount and task completion time of the backup task of multiple business objects to the backup system.
  • The client can choose to send the backup data volume and task completion time of the backup task of each business object individually.
  • The client can also choose to send the backup data volumes and task completion times of the backup tasks of multiple business objects at once, after completing the backup tasks of the multiple business objects. This is not specifically limited.
  • In this embodiment, the scheduling order of the next backup tasks of multiple business objects is obtained by prediction from the backup data volume and task completion time of the last backup tasks of the multiple business objects. This fully considers how accurately the backup system is matched to the dynamic characteristics of each backup task, avoids manual intervention in the scheduling of multiple backup tasks, and improves task SLA time-window compliance.
  • FIG. 4 is a flow chart of a process 400 of the data backup method according to the embodiment of the present application.
  • Process 400 may be performed by both the front-end client and the back-end backup system.
  • Process 400 is described as a series of steps or operations, and it should be understood that process 400 may be performed in various orders and/or occur simultaneously and is not limited to the order of execution shown in FIG. 4 .
  • Process 400 includes the following steps:
  • Step 401 The client sends the sending rate and receiving rate of the backup task of the second business object in the previous cycle to the backup system.
  • The backup task of the second business object is being executed. That is, in the embodiment of the present application, a rate limit can be applied to a backup task that is being executed (the client is sending the backup data of the backup task to the backup system), so that port bandwidth resources can be used optimally based on the real-time transmission characteristics of tasks, improving system throughput.
  • The backup tasks of business objects are executed periodically.
  • The backup task being executed can be considered to be in the current cycle (which can also be called the next cycle), and the cycle in which the backup task was last executed can be considered the previous cycle of this backup task.
  • A rate reporting cycle can be set, and the client reports to the backup system, according to this reporting cycle, the sending rate and receiving rate of the backup task of the second business object in the previous reporting cycle.
  • The sending rate and receiving rate may be the average sending rate and average receiving rate in the previous reporting cycle, or the weighted average sending rate and weighted average receiving rate in the previous reporting cycle; this is not specifically limited.
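  • As an illustration of the two reporting options (plain average or weighted average over a reporting cycle), a client-side helper might look like the sketch below. The function name `report_rate` and the idea of passing per-sample weights are assumptions; the application does not specify how the weights are chosen.

```python
from typing import Optional, Sequence


def report_rate(samples: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Reduce the rate samples of one reporting cycle to a single reported value.

    Without weights this is the plain average; with weights it is the
    weighted average (weights are a hypothetical input, e.g. sample durations).
    """
    if not samples:
        raise ValueError("at least one rate sample is required")
    if weights is None:
        return sum(samples) / len(samples)
    if len(weights) != len(samples):
        raise ValueError("weights and samples must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(samples, weights)) / total_weight
```

  • For example, samples of 100 and 300 with weights 1 and 3 give a weighted average of 250, whereas the plain average of the same samples is 200.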
  • Step 402 The backup system obtains the rate limit rate of the backup task of the second business object in the next cycle based on the sending rate and the receiving rate.
  • The backup system can input the sending rate and the receiving rate into the third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle, and then obtain, from that receiving rate, the rate limit rate of the backup task of the second business object in the next cycle.
  • The backup system can obtain the preset bandwidth of the first port, which is used to perform the backup task of the second business object; obtain the sum of the reception rates, in the next cycle, of all backup tasks transmitted by the first port; and obtain the rate limit rate of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the reception rate of the backup task of the second business object in the next cycle, and the sum of the reception rates.
  • Here, the predicted reception rate of backup task i is the reception rate of the backup task of the second business object in the next cycle, and the sum of the predicted reception rates of all backup tasks transmitted on the first port is the sum of the reception rates, in the next cycle, of all backup tasks transmitted on the first port.
  • the embodiment of the present application can also use other methods to calculate the rate limit rate of the backup task of the second business object in the next cycle, which is not specifically limited.
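  • The text names three inputs to the rate limit: the preset bandwidth of the first port, the predicted reception rate of backup task i, and the sum of the predicted reception rates of all backup tasks on that port. One plausible way to combine them, assumed here for illustration and not confirmed by the text, is a proportional share of the port bandwidth. The function name and the units are likewise hypothetical.

```python
def proportional_rate_limit(port_bandwidth: float,
                            predicted_rx: float,
                            predicted_rx_sum: float) -> float:
    """Assumed rule: task i receives a share of the port bandwidth
    proportional to its predicted reception rate in the next cycle."""
    if predicted_rx_sum <= 0:
        raise ValueError("sum of predicted reception rates must be positive")
    return port_bandwidth * predicted_rx / predicted_rx_sum


# Illustrative numbers: a port with bandwidth 1000 carrying two backup tasks.
predicted = {"job-1": 300.0, "job-2": 100.0}
total = sum(predicted.values())
limits = {job: proportional_rate_limit(1000.0, rx, total)
          for job, rx in predicted.items()}
```

  • Under this assumed rule the limits of all tasks on a port add up to the port bandwidth, so a task predicted to receive faster is allowed to send faster without oversubscribing the port.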
  • Step 403 The backup system sends a rate limit instruction to the client, where the rate limit instruction includes the rate limit rate of the backup task of the second business object in the next cycle.
  • The backup system sends the rate limit rate of the backup task of the second business object in the next cycle to the client through a rate limit instruction, so that the client can limit, based on the rate limit rate, the sending rate of the backup data of the backup task of the second business object.
  • Step 404 The client controls the rate limit rate of the backup task of the second business object in the next cycle according to the rate limit indication.
  • the client can limit the sending rate of the backup data of the backup task that transmits the second business object based on the rate limit, so that the port bandwidth resources can be optimally used based on the real-time transmission characteristics of the task and improve the throughput of the system.
  • In this embodiment, the rate limit rate of the backup tasks of multiple business objects in the next cycle is predicted based on the sending rate and receiving rate of those backup tasks in the previous cycle. This allows port bandwidth resources to be used optimally based on the real-time transmission characteristics of tasks, improving system throughput. It also avoids manual intervention in the scheduling of multiple backup tasks and improves system transmission efficiency.
  • FIG. 5 is a flow chart of a process 500 of the data backup method according to the embodiment of the present application.
  • Process 500 may be performed by a backend backup system.
  • Process 500 is described as a series of steps or operations, and it should be understood that process 500 may be performed in various orders and/or occur simultaneously and is not limited to the order of execution shown in FIG. 5 .
  • Process 500 includes the following steps:
  • Step 501 Obtain training data.
  • Training data is data used to train machine learning models.
  • the training data can be different depending on the structure, parameters, functions, etc. of the machine learning model.
  • Step 502 Train and obtain the target machine learning model based on the training data.
  • The target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model. The first machine learning model is used to predict the scheduling order of the next backup tasks of the business objects, the second machine learning model is used to predict the second backup data volume and the second task completion time of the business objects, and the third machine learning model is used to predict the reception rate of the backup task of a business object in the next cycle.
  • The backup system can obtain the historical backup data volume and historical task completion time of multiple business objects, which correspond to the completed backup tasks of the multiple business objects; obtain the preset machine learning model; input the historical backup data volume and historical task completion time of the multiple business objects into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and predicted task completion time to obtain the target machine learning model.
  • The backup system can obtain the historical receiving rate and historical sending rate of completed backup tasks of multiple business objects; obtain the preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted reception rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted reception rate to obtain the target machine learning model.
  • The preset machine learning model may also differ, including in its structure and parameters; the embodiments of the present application do not specifically limit this.
  • the machine learning model can also use the error back propagation (BP) algorithm to correct the size of the parameters in the initial model during the training process, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward propagation of the input signal until the output will produce an error loss, and backward propagation of the error loss information is used to update the parameters in the initial model, so that the error loss converges.
  • The backpropagation algorithm is a backward pass driven by the error loss, and it aims to obtain the optimal parameters of the machine learning model, such as the weight matrix.
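  • The forward pass, error loss, backward pass, and parameter update described above can be illustrated with the smallest possible case: a single linear unit trained by gradient descent. This is a sketch of the general technique only. The model in the application is a multi-layer network, and the data, learning rate, and function name below are assumptions.

```python
from typing import List, Tuple


def train_linear(samples: List[Tuple[float, float]],
                 lr: float = 0.05,
                 epochs: int = 2000) -> Tuple[float, float, List[float]]:
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    losses: List[float] = []
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            pred = w * x + b              # forward propagation
            err = pred - y                # error at the output
            loss += err * err / n         # mean squared error loss
            grad_w += 2.0 * err * x / n   # backward propagation: dLoss/dw
            grad_b += 2.0 * err / n       # backward propagation: dLoss/db
        w -= lr * grad_w                  # parameter update
        b -= lr * grad_b
        losses.append(loss)
    return w, b, losses


# Synthetic data generated from y = 2x + 1 (illustrative only).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, losses = train_linear(data)
```

  • On this data the loss shrinks from epoch to epoch and the parameters approach w = 2 and b = 1, which is the convergence behaviour the text refers to.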
  • the backup system can be further divided into three modules: scheduling and sorting module, speed limit decision-making module, and machine learning model training module.
  • The machine learning model training module is responsible for periodically training the machine learning model based on historical data.
  • The scheduling and sorting module is responsible for taking the execution result of the last backup task as input, calling the machine learning model, outputting the scheduling order of the backup tasks to be scheduled, and instructing the client to execute the corresponding backup tasks according to the scheduling order.
  • The rate limit decision-making module is responsible for calling the machine learning model for each started backup task, taking the sending rate and receiving rate fed back in real time as input, outputting the dynamic rate limit of the task, and sending the rate limit to the client.
  • FIG 6 is an exemplary schematic diagram of the client configuration and registration process.
  • The client configuration and registration process is divided into two steps: (1) at system initialization (for example, on business object changes, configuration changes, or system power-on), the client sends the model update frequency UpdateFreq to configure the model training cycle of the machine learning model training module; (2) the client sends the basic configuration information of each business object (such as a VM) to the scheduling and sorting module, including the business object identifier (VM Id), the earliest task start time (MinStartTime), the latest task end time (MaxEndTime), and the backup task scheduling cycle (JobFreq).
  • This application only requires the client to configure the backup task scheduling cycle JobFreq, the earliest task start time MinStartTime (for example, 8:00 PM, after work), and the latest task end time MaxEndTime (for example, 8:00 AM the next day, before work).
  • The scheduling and sorting module flexibly schedules backup tasks within the backup task scheduling cycle based on the actual execution of each backup task, and optimizes the overall scheduling strategy to ensure that as many backup tasks as possible complete their backups within the constrained maximum allowed time window [MinStartTime, MaxEndTime].
  • the basic configuration information in step 2 in this application may not include the earliest task start time MinStartTime and the latest task end time MaxEndTime, that is, for the backup task identified by the VM ID, the maximum allowed time window for its execution is not limited.
  • the scheduling and sorting module can select the most suitable time period to schedule the backup task within this cycle (for example, within a day, within a week, within 12 hours, etc.).
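  • The per-business-object configuration, including the case where MinStartTime and MaxEndTime are omitted, can be sketched as a small data structure. The class name, the field types, and the use of seconds relative to the start of the cycle are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BusinessObjectConfig:
    vm_id: str                              # VM Id
    job_freq: float                         # JobFreq, assumed to be in seconds
    min_start_time: Optional[float] = None  # MinStartTime, optional
    max_end_time: Optional[float] = None    # MaxEndTime, optional

    def allowed_window(self, cycle_start: float) -> Tuple[float, float]:
        """Return the window in which the backup may run.

        When no window is configured, the whole scheduling cycle is allowed.
        """
        start = cycle_start if self.min_start_time is None else self.min_start_time
        end = (cycle_start + self.job_freq
               if self.max_end_time is None else self.max_end_time)
        return (start, end)


# A daily backup with no configured window, and one limited to 8:00 PM - 8:00 AM.
unrestricted = BusinessObjectConfig("vm-1", 86400.0)
overnight = BusinessObjectConfig("vm-2", 86400.0, 72000.0, 115200.0)
```

  • Here `unrestricted.allowed_window(0.0)` spans the full day, while `overnight.allowed_window(0.0)` keeps the configured window of 72000 s to 115200 s after the start of the day.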
  • FIG 7 is an exemplary schematic diagram of the scheduling and sorting calculation process.
  • The scheduling and sorting module first obtains the MinStartTime of each backup task that has not been started and determines whether MinStartTime < t holds.
  • Here t represents the current time (for example, 9:00 PM). If the inequality holds, the earliest task start time set for the backup task is earlier than the current time, and the backup task can be started; if the inequality does not hold, the earliest task start time set for the backup task is later than the current time, that is, the start time of the backup task has not yet been reached, and the backup task cannot be started.
  • The scheduling and sorting module predicts, through the trained machine learning model, the backup data volume (volume) and the task completion time (predicted JCT) of the next backup of the backup task. It calculates the ratio volume / predicted JCT as the ease of the next backup of the backup task, and then calculates MaxEndTime - predicted JCT - t as the remaining start time of the next backup of the backup task, which can be used as the urgency of the next backup of the backup task.
  • The scheduling and sorting module then determines whether MaxEndTime - predicted JCT - t < 0 holds. If the inequality does not hold, the remaining start time of the next backup of the backup task is not negative, that is, there is still time before the next backup of the backup task must start: the current time plus the predicted task completion time of the next backup has not yet passed the latest task end time of the backup task. In this case, the backup task can be scheduled. If the inequality holds, the remaining start time of the next backup of the backup task is negative, that is, the next backup of the backup task has missed its start time: the current time plus the predicted task completion time of the next backup has passed the latest task end time of the backup task.
  • In this case, the next backup of the backup task is cancelled with a probability of 1-p.
  • The probability p can be pre-configured during system initialization. The larger the value of p, the less likely the next backup of the backup task is to be cancelled; the smaller the value of p, the more likely the next backup of the backup task is to be cancelled.
  • The backup tasks are sorted in descending order of the ratio of ease to urgency calculated for each backup task, which yields the scheduling queue for the next backup of all backup tasks to be scheduled.
  • The greater the value of urgency (indicating that the backup task is less urgent), the smaller the calculated ratio and the lower the scheduling ranking; the smaller the value of urgency (indicating that the backup task is more urgent), the greater the calculated ratio and the higher the scheduling ranking.
  • The greater the value of ease (indicating that the backup task is easier to complete), the larger the calculated ratio and the higher the scheduling ranking; the smaller the value of ease (indicating that the backup task is harder to complete), the smaller the calculated ratio and the lower the scheduling ranking.
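  • The sorting procedure of Figure 7 can be sketched as follows. The class and function names are hypothetical, times are assumed to be numbers on a common scale, and the start check is taken to be MinStartTime < t. Tasks whose remaining start time is not positive are simply left out of this sketch, on the assumption that the separate cancellation check handles them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingBackup:
    job_id: int
    min_start_time: float     # MinStartTime
    max_end_time: float       # MaxEndTime
    predicted_volume: float   # predicted backup data volume
    predicted_jct: float      # predicted task completion time, assumed > 0


def scheduling_order(tasks: List[PendingBackup], now: float) -> List[int]:
    """Return job IDs sorted by ease / urgency, largest ratio first."""
    scored = []
    for task in tasks:
        if not task.min_start_time < now:
            continue  # earliest start time not reached yet
        ease = task.predicted_volume / task.predicted_jct
        urgency = task.max_end_time - task.predicted_jct - now
        if urgency <= 0:
            continue  # simplification: left to the cancellation check
        scored.append((ease / urgency, task.job_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [job_id for _, job_id in scored]


tasks = [
    PendingBackup(1, 0.0, 1000.0, 500.0, 100.0),  # ease 5, urgency 800
    PendingBackup(2, 0.0, 400.0, 200.0, 100.0),   # ease 2, urgency 200
    PendingBackup(3, 200.0, 900.0, 300.0, 100.0), # cannot start yet
    PendingBackup(4, 0.0, 150.0, 100.0, 100.0),   # urgency is negative
]
order = scheduling_order(tasks, now=100.0)
```

  • In this example job 2 is ranked ahead of job 1: although job 1 is easier, job 2 has much less remaining start time, so its ratio of ease to urgency is larger.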
  • Figure 8 is an exemplary schematic diagram of a real-time scheduling interaction process. As shown in Figure 8, the scheduling and sorting module uses the method shown in Figure 7 to obtain the scheduling order of the next backup of multiple backup tasks to be scheduled.
  • According to the number of parallel threads available in real time, the scheduling and sorting module takes backup tasks in order from the head of the scheduling queue as the backup tasks to be scheduled, and sends a scheduling instruction to the client.
  • The scheduling instruction includes the identifier of the backup task to be scheduled.
  • After starting the backup task, the scheduling and sorting module sends a speed limit start instruction to the speed limit decision-making module.
  • The speed limit start instruction includes the identifier of the started backup task (Job Id) and the time (createtime) at which the scheduling and sorting module actually started the task (step 2).
  • After receiving the speed limit start instruction, the speed limit decision-making module obtains the speed limit rate of the started backup task in the next cycle, and then sends a speed limit instruction to the client.
  • The speed limit instruction includes the identifier of the started backup task (Job Id) and the rate limit (Ratelimit) (step 3), so that the client limits its sending rate at the source end based on the rate limit.
  • When the client completes a backup task, it can send task completion measurement feedback to the machine learning model training module.
  • The task completion measurement feedback includes the identifier of the completed backup task (Job Id), the backup data volume (volume) of the backup task in the completed backup, and the task completion time (JCT) of the backup task in the completed backup (step 4). This feedback allows the machine learning model training module to complete updated training of the machine learning model based on the feedback information.
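  • The messages exchanged in Figure 8 can be written down as plain data types. The field names follow the identifiers in the text (Job Id, createtime, Ratelimit, volume, JCT); the class names, the Python types, and the helper `training_sample` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SchedulingInstruction:      # scheduling and sorting module -> client
    job_id: int


@dataclass(frozen=True)
class RateLimitStart:             # step 2: scheduling module -> rate limit module
    job_id: int
    createtime: float             # time at which the task was actually started


@dataclass(frozen=True)
class RateLimitInstruction:       # step 3: rate limit module -> client
    job_id: int
    ratelimit: float


@dataclass(frozen=True)
class TaskCompletionFeedback:     # step 4: client -> training module
    job_id: int
    volume: float                 # backup data volume of the completed backup
    jct: float                    # task completion time of the completed backup


def training_sample(feedback: TaskCompletionFeedback) -> Tuple[float, float]:
    """Extract the (volume, JCT) pair that the training module would consume."""
    return (feedback.volume, feedback.jct)


feedback = TaskCompletionFeedback(job_id=1, volume=5e9, jct=1200.0)
sample = training_sample(feedback)
```

  • Keeping the messages immutable (`frozen=True`) is a design choice of this sketch: a feedback record that has been received should not change while it waits to be used in the next training cycle.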
  • FIG 9 is an exemplary schematic diagram of the calculation process of the rate limit.
  • The client periodically (for example, every 10 ms, every 1 s, every minute, or every 5 minutes; this is not limited) sends actual sending and receiving rate measurement feedback to the speed limit decision-making module.
  • The actual sending and receiving rate measurement feedback includes the identifier (Job Id) of the started backup task, a timestamp (timestamp), the receiving rate (RxRate), and the sending rate (TxRate) (step 1).
  • RxRate can be the average receiving rate in the previous cycle
  • TxRate can be the average sending rate in the previous cycle.
  • the rate limit decision-making module calls the machine learning model, inputs RxRate and TxRate, and outputs the predicted reception rate of the started task in the next cycle.
  • the speed limit decision-making module calculates the speed limit rate of the backup task in the next cycle according to the following formula:
  • the first port is the transmission port of backup task i.
  • After obtaining the speed limit rate of backup task i, the speed limit decision-making module sends a speed limit instruction to the client.
  • The speed limit instruction includes the identifier of the started backup task (Job Id) and the speed limit rate (Ratelimit) (step 2). This allows the client to limit its sending rate at the source end based on the rate limit.
  • Figure 10 is an exemplary schematic diagram of the training process of the machine learning model.
  • The machine learning model training module can periodically (for example, every week, every month, or every quarter; this is not limited) update the parameters of the machine learning model, which may include the structure of the machine learning model, the parameters of one or more layers contained in the machine learning model, and so on.
  • The machine learning model training module sends model parameter update instructions to the scheduling and sorting module and the speed limit decision-making module respectively.
  • The model parameter update instructions include the updated model parameters (ModelParas) (step 1), so that the scheduling and sorting module and the rate limit decision module can update their local machine learning models based on the updated model parameters.
  • After entering the next cycle, the client periodically (for example, every 10 ms, every 1 s, every minute, or every 5 minutes; this is not limited) sends actual sending and receiving rate measurement feedback for the started backup task to the speed limit decision-making module.
  • The actual sending and receiving rate measurement feedback includes the identifier (Job Id) of the started backup task, a timestamp (timestamp), the reception rate (RxRate), and the transmission rate (TxRate) (step 2).
  • RxRate can be the average receiving rate in the previous cycle
  • TxRate can be the average sending rate in the previous cycle.
  • When the client completes a backup task, it can send task completion measurement feedback to the machine learning model training module.
  • The task completion measurement feedback includes the identifier of the completed backup task (Job Id), the backup data volume (volume) of the backup task in the completed backup, and the task completion time (JCT) of the backup task in the completed backup (step 3). This feedback allows the machine learning model training module to complete updated training of the machine learning model based on the feedback information.
  • The machine learning model training module can perform training based on the structure of the machine learning model shown in Figure 11.
  • The machine learning model may be, for example, a Flow Neural Network (FlowNN), a convolutional network, a deep network, etc.
  • The machine learning model is a multi-layer neural network model, which mainly consists of the input embedding layer L1, the path aggregator layer L2 (PathAggregator), and a further layer L3.
  • L1 mainly implements mapping of each input feature data into a high-dimensional space
  • L2 aggregates all the time information output by L1 and recurses along the time dimension
  • L3 predicts the real-time reception rate of the task based on the output of L1 and L2.
  • Figure 11 comes from X.Cheng et al., "Physics constrained flow neural network for short-timescale predictions in data communications networks", ArXiv, https://arxiv.org/pdf/2112.12321.pdf.
  • The machine learning model training module inputs the backup data volume of the backup tasks historically executed by each business object and the task completion time JCT of those backup tasks into L1. After L2 aggregates all the input information, the output of L2 is mapped directly through a fully connected network to the predicted backup data volumes and predicted task completion times of the multiple business objects. In this process, only L1 and L2 of the machine learning model participate.
  • The machine learning model training module compares the predicted backup data volume and predicted task completion time of the multiple business objects with the actual backup data volume and actual task completion time of this backup task for the multiple business objects, and converges the machine learning model based on the loss between the two, thereby obtaining the parameters of the updated machine learning model.
  • The machine learning model training module inputs the receiving rate (RxRate) and sending rate (TxRate) of each business object in the previous cycle into L1. After calculation by L1, L2, and L3, the outputs of L2 and L3 are mapped directly through a fully connected network to the predicted reception rate of the backup tasks of the multiple business objects in the next cycle.
  • The machine learning model training module compares this predicted reception rate with the actual reception rate of this backup task for the multiple business objects, and converges the machine learning model based on the loss between the two, thereby obtaining the parameters of the updated machine learning model.
  • this application can also use other collected relevant task transmission characteristics and system status characteristic data as input to train the machine learning model, without specific limitations.
  • The machine learning model training module again sends model parameter update instructions to the scheduling and sorting module and the speed limit decision-making module respectively.
  • The model parameter update instructions include the updated model parameters (ModelParas) (step 4), so that the scheduling and sorting module and the rate limiting decision module can update their local machine learning models based on the updated model parameters.
  • The client can perform the above steps 2 and 3 multiple times, so that the input data used by the machine learning model training module can be the data carried in the multiple feedback messages received from the client within one training cycle.
  • FIG 12 is an exemplary structural diagram of a data backup device 1200 according to an embodiment of the present application. As shown in Figure 12, the data backup device 1200 according to this embodiment can be applied to a backup system.
  • The data backup device 1200 may include: an acquisition module 1201, a scheduling module 1202, a sending module 1203, a rate limiting module 1204, and a training module 1205, where:
  • The acquisition module 1201 is used to obtain the first backup data amount and the first task completion time of multiple business objects, where the first backup data amount and the first task completion time correspond to the last backup tasks of the multiple business objects; the scheduling module 1202 is used to obtain the scheduling order of the next backup tasks of the multiple business objects according to the first backup data amount and the first task completion time; and the sending module 1203 is used to send a scheduling instruction to the client according to the scheduling order of the next backup tasks of the multiple business objects, where the scheduling instruction includes the identifier of the business object corresponding to the backup task to be scheduled.
  • The scheduling module 1202 is specifically configured to input the first backup data amount and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the multiple business objects.
  • The scheduling module 1202 is specifically configured to input the first backup data amount and the first task completion time into a second machine learning model to obtain the second backup data amount and the second task completion time of the multiple business objects, where the second backup data amount and the second task completion time correspond to the next backup tasks of the multiple business objects; and to obtain the scheduling order of the next backup tasks of the multiple business objects according to the second backup data amount and the second task completion time.
  • The scheduling module 1202 is specifically configured to calculate the ratio of the second backup data amount corresponding to the first business object to the second task completion time corresponding to the first business object, to obtain the ease of the next backup task of the first business object, where the first business object is any one of the multiple business objects; obtain the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, when the ease and the remaining startup time of the next backup tasks of the multiple business objects have all been obtained, obtain the scheduling order of the next backup tasks of the multiple business objects according to the ease and the remaining startup time of the next backup tasks of the multiple business objects.
  • The scheduling module 1202 is also used to determine whether the next backup task of the first business object is cancelled, and, when the next backup task of the first business object is not cancelled, to obtain the scheduling order of the next backup tasks of the multiple business objects according to the ease of the next backup tasks of the multiple business objects and the remaining startup time of the next backup tasks of the multiple business objects.
  • The scheduling module 1202 is specifically configured to calculate the ratio of the ease of the next backup task of each of the multiple business objects to the remaining startup time of that next backup task, to obtain the scheduling threshold of the next backup task of each of the multiple business objects; and to sort the scheduling thresholds of the next backup tasks of the multiple business objects in descending order to obtain the scheduling order of the next backup tasks of the multiple business objects.
  • the scheduling module 1202 is specifically configured to determine whether the remaining startup time of the next backup task of the first business object is less than 0; when it is less than 0, calculate the cancellation probability of that task; and when the cancellation probability is greater than a preset threshold, determine to cancel the next backup task of the first business object.
  • the scheduling module 1202 is also configured to determine not to cancel the next backup task of the first business object when the remaining startup time of that task is greater than or equal to 0.
  • the scheduling module 1202 is also configured to determine not to cancel the next backup task of the first business object when the cancellation probability of that task is less than or equal to the preset threshold.
  • the rate limiting module 1204 is configured to receive the sending rate and receiving rate, in the previous cycle, of the backup task of a second business object sent by the client, where the backup task of the second business object is being executed; and to obtain the rate limit of that backup task in the next cycle according to the sending rate and the receiving rate; the sending module 1203 is also configured to send a rate limit indication to the client.
  • the rate limit indication includes the rate limit of the backup task of the second business object in the next cycle.
  • the rate limiting module 1204 is specifically configured to input the sending rate and the receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle; and to obtain the rate limit of that backup task in the next cycle according to this predicted receiving rate.
  • the rate limiting module 1204 is specifically configured to obtain the preset bandwidth of a first port, where the first port is used to execute the backup task of the second business object; obtain the sum of the receiving rates, in the next cycle, of all backup tasks transmitted through the first port; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the receiving rate of that backup task in the next cycle, and the sum of the receiving rates.
  • the training module 1205 is configured to train and obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model.
  • the first machine learning model is used to predict the scheduling order of the next backup task of the business object.
  • the second machine learning model is used to predict the second backup data volume and the second task completion time of the business object.
  • the third machine learning model is used to predict the reception rate of the backup task of the business object in the next cycle.
  • the training module 1205 is specifically configured to obtain the historical backup data volume and historical task completion time of the multiple business objects, which correspond to the completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical backup data volume and historical task completion time into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and the predicted task completion time to obtain the target machine learning model.
  • the training module 1205 is specifically configured to obtain the historical receiving rate and historical sending rate of completed backup tasks of the multiple business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted receiving rate to obtain the target machine learning model.
  • the device of this embodiment can be used to execute the technical solution of any of the method embodiments shown in Figures 3 and 4. Its implementation principles and technical effects are similar and will not be described again here.
  • each step of the above method embodiment can be completed through an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the methods disclosed in the embodiments of the present application can be directly implemented by a hardware encoding processor, or executed by a combination of hardware and software modules in the encoding processor.
  • Software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, or other storage media that are mature in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • non-volatile memory may be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory. Volatile memory can be random access memory (RAM), which is used as an external cache. Many forms of RAM are available, for example static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

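This application provides a data backup method and apparatus. The data backup method includes: obtaining a first backup data volume and a first task completion time of multiple business objects, where the first backup data volume and the first task completion time correspond to the previous backup tasks of the multiple business objects; obtaining the scheduling order of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time; and sending a scheduling indication to the client according to that scheduling order, where the scheduling indication includes the identifier of the business object corresponding to the backup task that is about to be scheduled. This application can take full account of how accurately scheduling matches the dynamic characteristics of the backup system and of each backup task, avoids manual intervention in the scheduling of multiple backup tasks, and improves the tasks' compliance with their SLA time windows.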

Description

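Data backup method and apparatus
This application claims priority to Chinese patent application No. 202211010615.3, filed with the Chinese Patent Office on August 23, 2022 and entitled "Data backup method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to data backup technology, and in particular to a data backup method and apparatus.
Background
A client periodically generates copies of the data of different business objects and transmits these copies to the back-end backup system within a predetermined time window; each such transmission is a backup task. Given many backup tasks waiting to be transmitted, the goal of backup orchestration is to design an efficient scheme for allocating and scheduling backup transmission resources (transmission time, bandwidth, and so on), so that limited backup resources are distributed among the pending tasks, as many backup tasks as possible complete within their predetermined time windows, and the backup bandwidth/throughput of the whole system is maximized.
In the related art, a backup orchestration scheme based on dynamic scheduling is used, implemented by dynamically setting task priorities and/or task start times, where the task priority is set by the customer and the task start time is calculated dynamically from the real-time state of the system. As a result, the scheduling effect depends heavily on the customer's manual experience and ad hoc operations, and the scheme may not match the dynamic characteristics of the backup system and of the individual backup tasks accurately enough. In addition, during the execution of backup tasks, preemptive interruption is needed to adapt to dynamic fluctuations after a task has started, but preemptive interruption incurs extra interruption and recovery overhead, which affects system throughput and the tasks' compliance with their service level agreement (SLA) time windows.
Summary
This application provides a data backup method and apparatus that take full account of how accurately scheduling matches the dynamic characteristics of the backup system and of each backup task, avoid manual intervention in the scheduling of multiple backup tasks, and improve the tasks' SLA time window compliance.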
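In a first aspect, this application provides a data backup method, including: obtaining a first backup data volume and a first task completion time of multiple business objects, where the first backup data volume and the first task completion time correspond to the previous backup tasks of the multiple business objects; obtaining the scheduling order of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time; and sending a scheduling indication to the client according to the scheduling order, where the scheduling indication includes the identifier of the business object corresponding to the backup task that is about to be scheduled.
In the embodiments of this application, a machine learning model predicts the scheduling order of the next backup tasks of multiple business objects from the backup data volume and task completion time of their previous backup tasks. This takes full account of how accurately scheduling matches the dynamic characteristics of the backup system and of each backup task, avoids manual intervention in the scheduling of multiple backup tasks, and improves the tasks' SLA time window compliance.
The multiple business objects of a client may include business objects of servers running various operating systems (for example, Windows, Unix, Linux, VMware), business objects of various databases (for example, Oracle, SQL, DB2), and business objects of other devices or services that need data backup; the embodiments of this application do not specifically limit this. It should be understood that one client may include multiple business objects. For example, if multiple VMs are created on a server, each VM is one business object; likewise, if a database provides database services for multiple users, each database service is one business object.
The first backup data volume and the first task completion time correspond to the previous backup tasks of the multiple business objects. The data backup needs of a business object are long-term and periodic. For example, a business object backs up production data from the client through the switching network to the back-end backup system at a certain backup frequency (for example, hourly, daily, weekly, or monthly), so each such data transmission can be treated as one backup task. The backup tasks of a business object are therefore started and executed periodically. On this basis, in the embodiments of this application, the backup task about to be executed in the next cycle is called the next backup task of the business object, and the backup task already executed in the preceding cycle is called the previous backup task of the business object.
After all backup data of a business object has been transmitted, that is, after the previous backup task of the business object has finished, the client may report the backup data volume and the job completion time (JCT) of that backup task to the backup system. In this way, after a period of time, for example at the end of each day or each month, the backup system can obtain the backup data volume and task completion time of the previous backup tasks of the multiple business objects (that is, the first backup data volume and the first task completion time).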
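In the embodiments of this application, the backup system may obtain the scheduling order of the next backup tasks of the multiple business objects in either of the following two ways:
1. Input the first backup data volume and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the multiple business objects.
2. Input the first backup data volume and the first task completion time into a second machine learning model to obtain a second backup data volume and a second task completion time of the multiple business objects, which correspond to the next backup tasks of the multiple business objects; then obtain the scheduling order of the next backup tasks according to the second backup data volume and the second task completion time.
After obtaining the second backup data volume and second task completion time of the multiple business objects, for any one of them (for example, a first business object), the backup system may calculate the ratio of the second backup data volume to the second task completion time corresponding to the first business object to obtain the ease of the next backup task of the first business object. That is, the backup system calculates the ease of the next backup task of each business object as the ratio of the second backup data volume to the second task completion time: the larger the ratio, the easier the next backup task; the smaller the ratio, the harder it is.
The backup system then obtains the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object.
The backup system may subtract the second task completion time and the current time t from the latest task end time MaxEndTime (reported to the backup system when the business object is registered or reconfigured) to obtain the remaining startup time of the next backup task.
Once the ease and the remaining startup time of the next backup task have been obtained for all of the business objects, the scheduling order of the next backup tasks of the multiple business objects is obtained from these two quantities.
For the first business object, after obtaining the ease and the remaining startup time of its next backup task, the backup system may calculate the ratio of that ease to that remaining startup time to obtain the scheduling threshold of the next backup task of the first business object. Using the same calculation, the backup system obtains the scheduling thresholds of the next backup tasks of all business objects. A larger scheduling threshold indicates that the next backup task is easy to complete and short on time, so it should be started as soon as possible; a smaller scheduling threshold indicates that the task is harder to complete and has more slack, so its start can be deferred. The scheduling thresholds of the next backup tasks are then sorted in descending order to obtain the scheduling order of the next backup tasks of the multiple business objects.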
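The second method can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the class `PredictedJob`, its field names, and the function names are hypothetical; the remaining startup time is read as MaxEndTime minus the predicted completion time minus the current time; tasks with a negative remaining startup time are skipped on the assumption that a separate cancellation check handles them; and a remaining startup time of exactly 0 is treated as maximally urgent.

```python
from dataclasses import dataclass


@dataclass
class PredictedJob:
    object_id: str          # business object identifier (hypothetical field name)
    data_volume: float      # predicted (second) backup data volume, e.g. in GB
    completion_time: float  # predicted (second) task completion time, in seconds
    max_end_time: float     # MaxEndTime, in seconds on the same clock as `now`


def ease(job):
    """Ease = predicted data volume / predicted completion time."""
    return job.data_volume / job.completion_time


def remaining_startup_time(job, now):
    """Assumed reading: MaxEndTime - predicted completion time - current time."""
    return job.max_end_time - job.completion_time - now


def scheduling_order(jobs, now):
    """Return object ids sorted by scheduling threshold, largest first."""
    scored = []
    for job in jobs:
        slack = remaining_startup_time(job, now)
        if slack < 0:
            continue  # assumption: left to the separate cancellation check
        threshold = float("inf") if slack == 0 else ease(job) / slack
        scored.append((threshold, job.object_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [object_id for _, object_id in scored]


jobs = [
    PredictedJob("VM1", data_volume=100.0, completion_time=50.0, max_end_time=1000.0),
    PredictedJob("VM2", data_volume=300.0, completion_time=100.0, max_end_time=400.0),
    PredictedJob("VM3", data_volume=80.0, completion_time=100.0, max_end_time=140.0),
]
print(scheduling_order(jobs, now=0.0))  # ['VM3', 'VM2', 'VM1']
```

In this made-up data set VM3 has the least slack relative to its ease, so it is scheduled first even though VM2 has the highest ease.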
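The two methods above use two machine learning models that differ in their outputs, which depends on how each model is trained. A machine learning model may also output other information to help obtain the scheduling order of the next backup tasks of the multiple business objects, so the embodiments of this application do not limit the output of the machine learning model. Whichever model is used, its training takes system throughput and the tasks' SLA time window compliance fully into account and, combined with the dynamic characteristics of backup tasks (for example, backup data volume, task completion time, earliest task start time, and latest task end time), trains for how accurately scheduling matches the dynamic characteristics of the backup system and of each backup task, so that the model's predictions come closer to the actual execution results.
Methods other than the two above may also be used in the embodiments of this application to obtain the scheduling order of the next backup tasks of the multiple business objects; this is not specifically limited either.
In a possible implementation, the backup system may first determine whether the next backup task of the first business object is to be canceled; when it is not canceled, the scheduling order of the next backup tasks of the multiple business objects is obtained according to the ease and the remaining startup time of those tasks.
In the embodiments of this application, the backup system may determine whether the remaining startup time of the next backup task of the first business object is less than 0. When it is less than 0 (indicating that the latest end time of that task has already passed), the backup system calculates the cancellation probability of the task (for example, it calculates 1-p, where p is a preset probability value; the larger 1-p is, the higher the cancellation probability and the more likely the task is to be canceled; the smaller 1-p is, the lower the cancellation probability and the less likely the task is to be canceled). When the cancellation probability is greater than a preset threshold, the backup system determines to cancel the next backup task of the first business object; when the cancellation probability is less than or equal to the preset threshold, it determines not to cancel the task. When the remaining startup time is greater than or equal to 0 (indicating that the latest end time of the task has not yet been reached), the backup system determines not to cancel the next backup task of the first business object.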
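The cancellation decision above reduces to a small predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and the cancellation probability is computed as 1-p, which the text gives only as one example.

```python
def should_cancel(remaining_startup_time, p, cancel_threshold):
    """Decide whether to cancel a next backup task.

    remaining_startup_time: time left before the task must start (may be negative)
    p: preset probability value; the cancellation probability is taken as 1 - p
    cancel_threshold: preset threshold compared against the cancellation probability
    """
    if remaining_startup_time >= 0:
        return False  # still startable in time: never cancel
    cancel_probability = 1.0 - p
    return cancel_probability > cancel_threshold


print(should_cancel(10.0, p=0.1, cancel_threshold=0.5))   # False
print(should_cancel(-5.0, p=0.1, cancel_threshold=0.5))   # True
print(should_cancel(-5.0, p=0.5, cancel_threshold=0.5))   # False
```

The third call shows the boundary case: a cancellation probability equal to the threshold does not cancel the task, matching the "less than or equal to" rule.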
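In this way, only the next backup tasks that are not canceled are scheduled, which avoids unnecessary scheduling of tasks that should be canceled and thereby improves the scheduling efficiency of the next backup tasks of the multiple business objects.
After obtaining the scheduling order of the next backup tasks of the multiple business objects, the backup system may, according to the number of parallel threads available in real time, take backup tasks from the head of the scheduling queue in scheduling order as the backup tasks about to be scheduled, and send the client a scheduling indication that includes the identifiers of those tasks. For example, if 2 parallel threads are currently available, 2 backup tasks can be scheduled now, so the backup system takes 2 backup tasks from the head of the scheduling queue (for example, Job Id 1 and 2) and sends the client a scheduling indication that includes Job Id=1 and Job Id=2.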
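The dispatch step can be illustrated with a double-ended queue. This is a sketch only: the function names and the dictionary layout of the scheduling indication are hypothetical, since the text does not define a message format.

```python
from collections import deque


def take_dispatch_batch(schedule_queue, available_threads):
    """Pop up to `available_threads` job ids from the head of the queue."""
    batch = []
    while schedule_queue and len(batch) < available_threads:
        batch.append(schedule_queue.popleft())
    return batch


def build_scheduling_indication(job_ids):
    """Wrap the selected job ids in a (hypothetical) indication message."""
    return {"job_ids": list(job_ids)}


queue = deque([1, 2, 3, 4])  # job ids already sorted in scheduling order
indication = build_scheduling_indication(take_dispatch_batch(queue, available_threads=2))
print(indication)   # {'job_ids': [1, 2]}
print(list(queue))  # [3, 4]
```

Jobs 3 and 4 stay at the head of the queue until more parallel threads become available.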
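Optionally, after completing the next backup tasks of the multiple business objects, the client again reports to the backup system the backup data volume and task completion time of those backup tasks. The client may send the backup data volume and task completion time each time it completes the backup task of one business object, or it may send them for multiple business objects in one batch after their backup tasks are complete; this is not specifically limited.
In a possible implementation, after the sending of a scheduling indication to the client according to the scheduling order of the next backup tasks of the multiple business objects, the method further includes: receiving the sending rate and receiving rate, in the previous cycle, of the backup task of a second business object sent by the client, where the backup task of the second business object is being executed; obtaining the rate limit of that backup task in the next cycle according to the sending rate and the receiving rate; and sending a rate limit indication to the client, where the rate limit indication includes the rate limit of the backup task of the second business object in the next cycle.
In the embodiments of this application, a machine learning model predicts the rate limits of the backup tasks of multiple business objects in the next cycle from their sending and receiving rates in the previous cycle. This allows port bandwidth to be used optimally on the basis of the tasks' real-time transmission characteristics, improves system throughput, avoids manual intervention in the scheduling of multiple backup tasks, and improves system transmission efficiency.
The backup task of the second business object is being executed; that is, in the embodiments of this application, a backup task in progress (one whose backup data the client is currently sending to the backup system) can be rate limited so that port bandwidth is used optimally on the basis of the tasks' real-time transmission characteristics, improving system throughput.
As described above, the backup tasks of a business object are executed periodically. A backup task in progress can be regarded as being in the current cycle (which may also be called the next cycle), and the cycle in which the backup task was last executed can be regarded as the previous cycle of that backup task.
The sending and receiving rates of a backup task usually vary dynamically within a cycle, so a rate reporting period may be set in the embodiments of this application. According to this reporting period, the client reports to the backup system the sending rate and receiving rate of the backup task of the second business object in the previous reporting period. These rates may be regarded as the average sending rate and average receiving rate over the previous reporting period, or alternatively as the weighted average sending rate and weighted average receiving rate over that period; this is not specifically limited.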
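The backup system may input the sending rate and receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next cycle, and then obtain the rate limit of that backup task in the next cycle according to this predicted receiving rate.
The backup system may obtain the preset bandwidth of a first port, where the first port is used to execute the backup task of the second business object; obtain the sum of the receiving rates, in the next cycle, of all backup tasks transmitted through the first port; and obtain the rate limit of the backup task of the second business object in the next cycle according to the preset bandwidth of the first port, the receiving rate of that backup task in the next cycle, and the sum of the receiving rates. Expressed as a formula:
$$\text{rate limit of backup task } i = \text{preset bandwidth of the first port} \times \frac{\text{predicted receiving rate of backup task } i}{\text{sum of the predicted receiving rates of all backup tasks transmitted through the first port}}$$
where the predicted receiving rate of backup task i is the receiving rate of the backup task of the second business object in the next cycle, and the sum of the predicted receiving rates of all backup tasks transmitted through the first port is the sum of the receiving rates, in the next cycle, of all backup tasks transmitted on the first port.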
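One plausible reading of the port-level computation is a proportional split of the port's preset bandwidth by predicted receiving rate. The sketch below assumes that reading; the function name, the dictionary-based interface, and the even split used when all predictions are zero are illustrative choices, not details taken from this application.

```python
def rate_limits(port_bandwidth, predicted_rx):
    """Split a port's preset bandwidth among its backup tasks.

    port_bandwidth: preset bandwidth of the port (any consistent unit)
    predicted_rx: mapping of job id -> predicted receiving rate in the next cycle
    """
    if not predicted_rx:
        return {}
    total = sum(predicted_rx.values())
    if total == 0:
        # Assumption: with no predicted traffic, split the port evenly.
        share = port_bandwidth / len(predicted_rx)
        return {job_id: share for job_id in predicted_rx}
    return {job_id: port_bandwidth * rate / total
            for job_id, rate in predicted_rx.items()}


limits = rate_limits(100.0, {"job_a": 30.0, "job_b": 10.0})
print(limits)  # {'job_a': 75.0, 'job_b': 25.0}
```

Under this split the limits always add up to the port's preset bandwidth, so tasks predicted to receive faster get a larger share without oversubscribing the port.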
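It should be noted that other methods may also be used in the embodiments of this application to calculate the rate limit of the backup task of the second business object in the next cycle; this is not specifically limited.
The backup system sends the rate limit of the backup task of the second business object in the next cycle to the client in the rate limit indication, so that the client can limit the sending rate of the backup data of that backup task based on the rate limit.
The client may limit the sending rate of the backup data of the second business object's backup task based on the rate limit, so that port bandwidth is used optimally on the basis of the tasks' real-time transmission characteristics, improving system throughput.
In a possible implementation, the method further includes: training to obtain a target machine learning model, where the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model; the first machine learning model is used to predict the scheduling order of the next backup tasks of business objects, the second machine learning model is used to predict the second backup data volume and second task completion time of business objects, and the third machine learning model is used to predict the receiving rate of the backup tasks of business objects in the next cycle.
Training data is the data used to train a machine learning model; it may differ depending on the structure, parameters, and function of the model.
In the embodiments of this application, the target machine learning model includes at least one of the first machine learning model, the second machine learning model, and the third machine learning model, where the first machine learning model is used to predict the scheduling order of the next backup tasks of business objects, the second machine learning model is used to predict the second backup data volume and second task completion time of business objects, and the third machine learning model is used to predict the receiving rate of the backup tasks of business objects in the next cycle.
Optionally, the backup system may obtain the historical backup data volume and historical task completion time of multiple business objects, which correspond to completed backup tasks of those business objects; obtain a preset machine learning model; input the historical backup data volume and historical task completion time into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the multiple business objects; and perform convergence training based on the predicted backup data volume and predicted task completion time to obtain the target machine learning model.
Optionally, the backup system may obtain the historical receiving rate and historical sending rate of completed backup tasks of multiple business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the multiple business objects; and perform convergence training based on the predicted receiving rate to obtain the target machine learning model.
The two training methods above differ mainly in their training data. Optionally, the preset machine learning models may also differ, including in structure and parameters; the embodiments of this application do not specifically limit this.
When training a machine learning model, the output of the model should be as close as possible to the value that is actually to be predicted. The predicted value of the current network is therefore compared with the desired target value, and the weight vectors of each layer of the model are updated according to the difference between the two (there is usually an initialization step before the first update, in which parameters are preconfigured for each layer of the model). For example, if the network's prediction is too high, the weight vectors are adjusted to make it predict lower, and the adjustment continues until the model can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured. This is the loss function or objective function, an important equation for measuring that difference. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the model becomes a process of reducing this loss as much as possible.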
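The idea of shrinking a loss by repeatedly adjusting weights can be shown with the smallest possible model. The sketch below fits a one-input linear model by gradient descent on the mean squared error. It is a generic illustration: the model, the loss, the learning rate, and the toy data are all assumptions and are not the models trained in this application.

```python
def mse(preds, targets):
    """Mean squared error: the loss that training tries to reduce."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def train_linear(xs, ys, lr=0.05, epochs=5000):
    """Fit y ~ w * x + b by gradient descent on the MSE loss."""
    w, b = 0.0, 0.0  # initialization before the first update
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # If predictions are too high the gradients are positive,
        # so the parameters are nudged down, and vice versa.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
initial_loss = mse([0.0 for _ in xs], ys)
w, b = train_linear(xs, ys)
final_loss = mse([w * x + b for x in xs], ys)
print(initial_loss, final_loss)
```

The final loss is far below the initial loss, and the learned parameters approach the values w = 2 and b = 1 that generated the data.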
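A machine learning model may also use the error back propagation (BP) algorithm to correct the parameters of the initial model during training so that the model's error loss becomes smaller and smaller. Specifically, passing the input signal forward to the output produces an error loss, and the parameters of the initial model are updated by propagating the error loss information backward so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, aimed at obtaining the optimal parameters of the machine learning model, such as the weight matrices.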
第二方面,本申请提供一种数据备份装置,包括:获取模块,用于获取多个业务对象的第一备份数据量和第一任务完成时间,所述第一备份数据量和所述第一任务完成时间与所述多个业务对象的上一次备份任务对应;调度模块,用于根据所述第一备份数据量和所述第一任务完成时间获取所述多个业务对象的下一次备份任务的调度顺序;发送模块,用于根据所述多个业务对象的下一次备份任务的调度顺序向客户端发送调度指示,所述调度指示包括即将调度的备份任务对应的业务对象的标识。
在一种可能的实现方式中,所述调度模块,具体用于将所述第一备份数据量和所述第一任务完成时间输入第一机器学习模型以得到所述多个业务对象的下一次备份任务的调度顺序。
在一种可能的实现方式中,所述调度模块,具体用于将所述第一备份数据量和所述第一任务完成时间输入第二机器学习模型以得到所述多个业务对象的第二备份数据量和第二任务完成时间,所述第二备份数据量和所述第二任务完成时间与所述多个业务对象的下一次备份任务对应;根据所述第二备份数据量和所述第二任务完成时间获取所述多个业务对象的下一次备份任务的调度顺序。
在一种可能的实现方式中,所述调度模块,具体用于计算第一业务对象对应的所述第二备份数据量和所述第二任务完成时间的比值以得到所述第一业务对象的下一次备份任务的容易度,所述第一业务对象是所述多个业务对象中的任意一个;根据所述第一业务对象对应的所述第二任务完成时间获取所述第一业务对象的下一次备份任务的剩余启动时间;当所述多个业务对象的下一次备份任务的容易度和所述多个业务对象的下一次备份任务的剩余启动时间均得到后,根据所述多个业务对象的下一次备份任务的容易度和所述多个业务对象的下一次备份任务的剩余启动时间获取所述多个业务对象的下一次备份任务的调度顺序。
在一种可能的实现方式中,所述调度模块,还用于判断所述第一业务对象的下一次备份任务是否取消;当所述第一业务对象的下一次备份任务不取消时,根据所述多个业务对象的下一次备份任务的容易度和所述多个业务对象的下一次备份任务的剩余启动时间获取所述多个业务对象的下一次备份任务的调度顺序。
在一种可能的实现方式中,所述调度模块,具体用于计算所述多个业务对象的下一次备份任务的容易度和所述多个业务对象的下一次备份任务的剩余启动时间的比值以得到所述多个业务对象的下一次备份任务的调度阈值;将所述多个业务对象的下一次备份任务的阈值按照从大到小的顺序排序以得到所述多个业务对象的下一次备份任务的调度顺序。
在一种可能的实现方式中,所述调度模块,具体用于判断所述第一业务对象的下一次备份任务的剩余启动时间是否小于0;当所述第一业务对象的下一次备份任务的剩余启动时间小于0时,计算所述第一业务对象的下一次备份任务的取消概率;当所述第一业务对象的下一次备份任务的取消概率大于预设阈值时,确定取消所述第一业务对象的下一次备 份任务。
在一种可能的实现方式中,所述调度模块,还用于当所述第一业务对象的下一次备份任务的剩余启动时间大于或等于0时,确定不取消所述第一业务对象的下一次备份任务。
在一种可能的实现方式中,所述调度模块,还用于当所述第一业务对象的下一次备份任务的取消概率小于或等于预设阈值时,确定不取消所述第一业务对象的下一次备份任务。
在一种可能的实现方式中,还包括:限速模块,用于接收所述客户端发送的第二业务对象的备份任务在上一个周期内的发送速率和接收速率,所述第二业务对象的备份任务正在执行中;根据所述发送速率和所述接收速率获取所述第二业务对象的备份任务在下一个周期内的限速速率;所述发送模块,还用于向所述客户端发送限速指示,所述限速指示包括所述第二业务对象的备份任务在下一个周期内的限速速率。
在一种可能的实现方式中,所述限速模块,具体用于将所述发送速率和所述接收速率输入第三机器学习模型以得到所述第二业务对象的备份任务在下一个周期内的接收速率;根据所述第二业务对象的备份任务在下一个周期内的接收速率获取所述第二业务对象的备份任务在下一个周期内的限速速率。
在一种可能的实现方式中,所述限速模块,具体用于获取第一端口的预设带宽,所述第一端口用于执行所述第二业务对象的备份任务;获取所述第一端口传输的所有备份任务在下一个周期内的接收速率之和;根据所述第一端口的预设带宽、所述第二业务对象的备份任务在下一个周期内的接收速率以及所述接收速率之和获取所述第二业务对象的备份任务在下一个周期内的限速速率。
在一种可能的实现方式中,还包括:训练模块,用于训练得到目标机器学习模型,所述目标机器学习模型包括第一机器学习模型、第二机器学习模型和第三机器学习模型中的至少之一,所述第一机器学习模型用于预测业务对象的下一次备份任务的调度顺序,所述第二机器学习模型用于预测业务对象的第二备份数据量和第二任务完成时间,所述第三机器学习模型用于预测业务对象的备份任务在下一个周期内的接收速率。
在一种可能的实现方式中,所述训练模块,具体用于获取所述多个业务对象的历史备份数据量和历史任务完成时间,所述历史备份数据量和所述历史任务完成时间和所述多个业务对象的已完成备份任务对应;获取预设的机器学习模型;将所述多个业务对象的历史备份数据量和历史任务完成时间输入所述预设的机器学习模型以得到所述多个业务对象的预测备份数据量和预测任务完成时间;基于所述预测备份数据量和所述预测任务完成时间进行收敛训练以得到所述目标机器学习模型。
在一种可能的实现方式中,所述训练模块,具体用于获取所述多个业务对象的已完成备份任务的历史接收速率和历史发送速率;获取预设的机器学习模型;将所述历史接收速率和历史发送速率输入所述预设的机器学习模型以得到所述多个业务对象的备份任务的预测接收速率;基于所述预测接收速率进行收敛训练以得到所述目标机器学习模型。
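In a third aspect, this application provides a backup system, including: one or more processors; and a memory configured to store one or more programs, where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any implementation of the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium including a computer program that, when executed on a computer, causes the computer to perform the method according to any implementation of the first aspect.
In a fifth aspect, this application provides a computer program product including computer program code that, when run on a computer, causes the computer to perform the method according to any implementation of the first aspect.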
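Brief Description of the Drawings
Figure 1 is an exemplary framework diagram of a backup system;
Figure 2 is an exemplary structural diagram of the system architecture of this application;
Figure 3 is a flowchart of process 300 of the data backup method according to an embodiment of this application;
Figure 4 is a flowchart of process 400 of the data backup method according to an embodiment of this application;
Figure 5 is a flowchart of process 500 of the data backup method according to an embodiment of this application;
Figure 6 is an exemplary schematic diagram of the client configuration and registration flow;
Figure 7 is an exemplary schematic diagram of the scheduling order calculation flow;
Figure 8 is an exemplary schematic diagram of the real-time scheduling interaction flow;
Figure 9 is an exemplary schematic diagram of the rate limit calculation flow;
Figure 10 is an exemplary schematic diagram of the machine learning model training flow;
Figure 11 is an exemplary structural diagram of a machine learning model;
Figure 12 is an exemplary structural schematic diagram of data backup apparatus 1200 according to an embodiment of this application.
Detailed Description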
为使本申请的目的、技术方案和优点更加清楚,下面将结合本申请中的附图,对本申请中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书实施例和权利要求书及附图中的术语“第一”、“第二”等仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元。方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。
客户端定期会对不同业务对象的数据生成副本,并将这些副本在预定的时间窗口内传给后台的备份系统,即为备份任务。对于众多待传输的备份任务,备份编排的目标是设计一套高效的备份传输资源(传输时间、带宽等)分配及调度方案,将有限的备份资源分配给众多待传输的备份任务,使得尽可能多的备份任务能在预定的时间窗内完成,并最大化 整个系统的备份带宽/吞吐。
随着信息化技术的飞速发展,信息系统在各种行业的关键业务对象中扮演着越来越重要的角色。在通讯、金融、医疗、电子商务、物流、政府等领域,如果信息系统的业务对象中断,会导致巨大经济损失、影响品牌形象,并可能导致重要数据丢失。因此,各行业普遍通过建设灾备中心来提高关键应用的业务对象连续性,在灾备中心定期的备份生产数据副本。据Gartner预测,各行业的数据量年增长将超过50%,而数据保护预算增长却低于10%。因此在有限的备份资源投入下,如何利用好给定的备份系统,在客户固定有限的可用备份时间窗内,应对增长越来越快的备份数据是二级存储产品的核心竞争力。
各行业的实际生产环境的备份数据复杂多变,多客户/业务对象端产生的不同备份业务对象流到达时间和读写速率高度动态,任务可允许的备份时间窗各也有差异。若不能对众多备份任务进行有效的备份编排,不同任务的业务对象流会在相同时空上产生资源冲突或不必要的资源空闲,导致有效带宽低,服务级别协议(Service Level Agreement,SLA)受损。如果只依赖经验丰富的运营人员对动态生产环境进行人工分析,其运营效率及资源利用率低,无法满足客户的海量业务对象备份需求。
图1为备份系统的示例性的框架图。如图1所示,以虚拟机(Virtual Machine,VM)备份为例,客户端(例如,客户端1、客户端N)部署了一系列的VM(例如,VM1~VM N)以承载不同的应用业务对象。这些VM业务对象会以一定的备份频率(例如,每小时、每天、每周、每月等)定期的将生产数据从客户端通过交换网络备份到后台的备份系统。每个VM业务对象分全量备份(备份所有数据)和增量备份(只备份相比上次备份新增的数据),每次备份任务要传输的备份数据量高度动态且事先不可知。每个备份任务都必须在规定的时间窗内进行备份(例如,只在作业停止后及下一次作业开始前的时间段)。由于每个备份任务的备份数据的存储介质和生产环境等差异,备份数据的客户端读出能力、经网络动态传输以及备份系统的落盘处理速度等都高度动态。备份系统需要对众多的备份任务,在满足各种时间窗口和资源约束条件下,进行高效的调度编排,以使尽可能多的备份任务在规定时间窗内完成备份,并最大化系统备份吞吐。
整个备份过程存在诸多动态不确定的影响因素,备份系统的备份任务编排挑战主要在于如何在诸多动态不确定的因素的影响下做出最佳的智能供需编排决策。动态不确定的因素包括:1)备份任务的总数据量事先不可知;2)客户端将备份数据从本地存储介质读出的速率动态波动(受本地读速率、中央处理器(Central Processing Unit,CPU)负载等影响);3)备份数据实时传输速率受链路带宽拥堵情况、后端接收及输入输出(Input Output,IO)落盘情况动态影响;4)备份任务不断动态的完成退出和新备份任务的进入,使得备份系统的负载和整个备份系统的实时可用剩余资源均动态波动。
现有的备份任务的调度编排策略包括静态编排和动态编排。静态编排主要通过设置固定的任务备份策略来控制备份任务的执行。动态编排则是根据业务对象的动态属性对调度进行动态的自适应调整。但是这些编排策略的调度效果很大程度受客户人工经验及随机操作的影响,可能会对备份系统及各个备份任务的动态特性的匹配精度不够,甚至会导致额外的中断及恢复开销,影响系统的吞吐及任务的SLA时间窗遵从。
为了解决上述技术问题,本申请提供了一种数据备份方法,可以最大化整个备份系统的吞吐,并尽可能调度更多的备份任务,使其在各自的SLA时间窗内完成备份数据的传 输。
为了便于理解,下面先对本申请所使用到的一些名词或术语进行解释说明,该名词或术语也作为发明内容的一部分。
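1. Neural network
A neural network (NN) is a machine learning model. A neural network may be composed of neural units, where a neural unit is an operation unit that takes $x_s$ and an intercept of 1 as inputs, and whose output may be:
$$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinearity into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a nonlinear function such as ReLU. A neural network is a network formed by connecting many such single neural units, that is, the output of one neural unit may be the input of another. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of that local receptive field, which may be a region composed of several neural units.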
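As a concrete illustration of the output of a single neural unit, the sketch below computes the weighted sum of the inputs plus the bias and passes it through ReLU. The function names and sample values are illustrative only.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)


def neuron_output(xs, ws, b, f=relu):
    """Compute f(sum_s W_s * x_s + b) for one neural unit."""
    return f(sum(w * x for w, x in zip(ws, xs)) + b)


print(neuron_output([1.0, 2.0], [0.5, 1.0], b=0.25))   # 2.75
print(neuron_output([1.0, 2.0], [0.5, -1.0], b=0.25))  # 0.0
```

In the second call the weighted sum is negative (-1.25), so ReLU clamps the output to 0, which is the nonlinearity the activation function introduces.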
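2. Multi-layer perceptron (MLP)
MLP is a simple deep neural network (DNN) in which adjacent layers are fully connected. It is also called a multi-layer neural network and can be understood as a neural network with many hidden layers, where "many" has no particular threshold. Divided by layer position, the layers inside a DNN fall into three categories: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. Layers are fully connected, that is, any neuron in layer i is connected to every neuron in layer i+1. Although a DNN looks complex, the work of each layer is not: it is simply the linear relationship $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha()$ is the activation function. Each layer just applies this simple operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in a DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$, where the superscript 3 is the layer in which the coefficient $W$ resides, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer. In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$. Note that the input layer has no $W$ parameters. In a deep neural network, more hidden layers allow the network to better capture complex real-world situations. In theory, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; its final goal is to obtain the weight matrices of all layers of the trained network (the weight matrices formed by the vectors W of many layers).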
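The per-layer operation and the index convention for the coefficients can be made concrete with plain Python lists. In this sketch `W[j][k]` is the coefficient from neuron k of the previous layer to neuron j of the current layer (0-based here, whereas the text counts from 1); the function names and numbers are illustrative.

```python
def relu(z):
    return max(0.0, z)


def dense_layer(W, x, b, alpha=relu):
    """Compute y = alpha(W x + b) for one fully connected layer.

    W[j][k] is the coefficient from input neuron k to output neuron j.
    """
    return [
        alpha(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
        for j in range(len(W))
    ]


# A tiny two-layer network: 2 inputs -> 2 hidden neurons -> 1 output.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, -2.0]
W2, b2 = [[2.0, 3.0]], [0.5]

hidden = dense_layer(W1, [2.0, 1.0], b1)
output = dense_layer(W2, hidden, b2)
print(hidden)  # [1.0, 0.0]
print(output)  # [2.5]
```

Stacking calls to `dense_layer` is all a forward pass through an MLP amounts to; training then adjusts the entries of each `W` and `b`.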
3、卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。卷积神经网络包含了一个由卷积层和池化层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。
卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。卷积层可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络进行正确的预测。当卷积神经网络有多个卷积层的时候,初始的卷积层往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络深度的加深,越往后的卷积层提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
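Because the number of training parameters often needs to be reduced, pooling layers are often introduced periodically after convolutional layers: one convolutional layer may be followed by one pooling layer, or several convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. A pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller image. The average pooling operator computes the average of the pixel values within a given range as the result of average pooling. The max pooling operator takes the pixel with the largest value within a given range as the result of max pooling. In addition, just as the size of the weight matrix in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The image output by the pooling layer may be smaller than the image input to it, and each pixel of the output image represents the average or maximum value of the corresponding sub-region of the input image.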
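Average and max pooling can be illustrated on a small matrix. The sketch below uses non-overlapping square windows (stride equal to the window size), which is one common configuration and an assumption here; the function names are illustrative.

```python
def pool2d(image, size, op):
    """Apply `op` to each non-overlapping size x size window of `image`."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(op(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(pool2d(image, 2, max))   # [[6, 8], [14, 16]]
print(pool2d(image, 2, mean))  # [[3.5, 5.5], [11.5, 13.5]]
```

Each output pixel summarizes one 2 x 2 sub-region of the input, so the 4 x 4 image shrinks to 2 x 2.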
在经过卷积层/池化层的处理后,卷积神经网络还不足以输出所需要的输出信息。因为 如前所述,卷积层/池化层只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络需要利用神经网络层来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层中可以包括多层隐含层,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等。
可选的,在神经网络层中的多层隐含层之后,还包括整个卷积神经网络的输出层,该输出层具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络的前向传播完成,反向传播就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络的损失,及卷积神经网络通过输出层输出的结果和理想结果之间的误差。
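4. Recurrent neural network
A recurrent neural network (RNN) is used to process sequence data. In a traditional neural network model, data flows from the input layer to the hidden layers and then to the output layer; adjacent layers are fully connected, while the nodes within each layer are not connected to one another. Although such ordinary neural networks have solved many problems, they remain powerless for many others. For example, predicting the next word of a sentence generally requires the preceding words, because the words in a sentence are not independent of one another. An RNN is called recurrent because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes earlier information and applies it to the computation of the current output: the nodes within the hidden layer are connected rather than unconnected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, an RNN can process sequence data of any length. An RNN is trained in the same way as a traditional CNN or DNN, using the error back propagation algorithm, with one difference: if the RNN is unrolled, its parameters, such as W, are shared across the unrolled steps, which is not the case in the traditional neural networks described above. Moreover, when gradient descent is used, the output of each step depends not only on the network at the current step but also on the network states of several previous steps. This learning algorithm is called back propagation through time (BPTT).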
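The way a hidden state carries memory from one step to the next can be shown with a scalar recurrence. The sketch below uses the common form h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) with parameters shared across all time steps; the scalar simplification, the tanh activation, and the names are assumptions for illustration.

```python
import math


def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Run a scalar RNN over the sequence `xs` and return every hidden state.

    The same parameters (w_x, w_h, b) are reused at every time step.
    """
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)  # current input plus previous state
        states.append(h)
    return states


with_memory = rnn_forward([1.0, 0.0], w_x=1.0, w_h=1.0, b=0.0)
without_memory = rnn_forward([1.0, 0.0], w_x=1.0, w_h=0.0, b=0.0)
print(with_memory)
print(without_memory)
```

The second input is 0 in both runs, yet the second hidden state is nonzero only when `w_h` is nonzero: the output at that step depends on what the network saw earlier.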
既然已经有了卷积神经网络,为什么还要循环神经网络?原因很简单,在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去。这里填空,人类应该都知道是填“云南”。因为人类会根据上下文的内容进行推断,但如何让机器做到这一步?RNN就应运而生了。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。
5、损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出 值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
6、反向传播算法
卷积神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的超分辨率模型中参数的大小,使得超分辨率模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的超分辨率模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的超分辨率模型的参数,例如权重矩阵。
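7. Generative adversarial network
A generative adversarial network (GAN) is a deep learning model. The model includes at least two modules: a generative model and a discriminative model, which learn by competing against each other to produce better output. Both may be neural networks, specifically deep neural networks or convolutional neural networks. The basic principle of a GAN is as follows, taking a GAN that generates pictures as an example. Suppose there are two networks, G (Generator) and D (Discriminator). G is a network that generates pictures: it receives random noise z and generates a picture from it, denoted G(z). D is a discriminative network that judges whether a picture is "real": its input x represents a picture, and its output D(x) represents the probability that x is a real picture. An output of 1 means the picture is 100% real, and an output of 0 means the picture cannot be real. During training, the goal of the generative network G is to generate pictures realistic enough to deceive D, while the goal of D is to distinguish the pictures generated by G from real pictures as well as it can. G and D thus form a dynamic "game", which is the "adversarial" part of "generative adversarial network". In the ideal outcome of this game, G can generate pictures G(z) good enough to pass for real, and D has difficulty determining whether a picture generated by G is real, that is, D(G(z)) = 0.5. The result is an excellent generative model G that can be used to generate pictures.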
图2为本申请的系统架构的示例性的结构图,如图2所示,该系统可以运用到各种不同场景的数据备份任务的调度编排应用中,包括VM、数据库、文件等备份场景,对此不做具体限定。
该系统分为前端(客户端)和后端(备份系统)两部分,前端客户端和后端备份系统之间通过网络连接,该网络连接包括但不限于以太网、互联网协议(Internet Protocol,IP)等直连或多跳组网连接,以及用传输控制协议(Transmission Control Protocol,TCP)、远端直接内存访问(Remote Directed Memory Access,RDMA)等数据传输形式。需要说明的是,本申请中的网络连接还可以采用其他连接方式或数据传输形式,对此不做具体限定。
前端客户端可以包括支持各种操作系统(例如,Windows、Unix、Linux、VMware等)的服务器,或者各类数据库(例如,Oracle、SQL、DB2等),除此之外,客户端还可以包括其他有数据备份需求的业务对象,本申请对此不做具体限定。
后端备份系统可以针对不同SLA约束下的众多备份任务进行统一调度编排,在既有的CPU和网络传输资源约束下,对各待传输的备份任务进行调度排序和传输速率控制,以优化系统吞吐和各备份任务的SLA时间窗遵从度。
需要说明的是,图2示例性的示出了本申请的系统架构,但该系统架构并不构成限定, 系统中前端客户端的数量、实现形式、实施功能等,以及后端备份系统的构建方式、部署策略等,均可以采用其他方案实现,对此不作具体限定。
在上述系统架构的基础上,下文将对本申请提供的数据备份方法进行说明。
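Figure 3 is a flowchart of process 300 of the data backup method according to an embodiment of this application. Process 300 may be executed jointly by the front-end client and the back-end backup system. Process 300 is described as a series of steps or operations; it should be understood that process 300 may be executed in various orders and/or concurrently and is not limited to the execution order shown in Figure 3. Process 300 includes the following steps:
Step 301. The client reports the first backup data volume and first task completion time of multiple business objects to the backup system.
The multiple business objects of a client may include business objects of servers running various operating systems (for example, Windows, Unix, Linux, VMware), business objects of various databases (for example, Oracle, SQL, DB2), and business objects of other devices or services that need data backup; the embodiments of this application do not specifically limit this. It should be understood that one client may include multiple business objects. For example, if multiple VMs are created on a server, each VM is one business object; likewise, if a database provides database services for multiple users, each database service is one business object.
The first backup data volume and the first task completion time correspond to the previous backup tasks of the multiple business objects. The data backup needs of a business object are long-term and periodic. For example, a business object backs up production data from the client through the switching network to the back-end backup system at a certain backup frequency (for example, hourly, daily, weekly, or monthly), so each such data transmission can be treated as one backup task. The backup tasks of a business object are therefore started and executed periodically. On this basis, in the embodiments of this application, the backup task about to be executed in the next cycle is called the next backup task of the business object, and the backup task already executed in the preceding cycle is called the previous backup task of the business object.
After all backup data of a business object has been transmitted, that is, after the previous backup task of the business object has finished, the client may report the backup data volume and the job completion time (JCT) of that backup task to the backup system. In this way, after a period of time, for example at the end of each day or each month, the backup system can obtain the backup data volume and task completion time of the previous backup tasks of the multiple business objects (that is, the first backup data volume and the first task completion time).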
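Step 302. The backup system obtains the scheduling order of the next backup tasks of the multiple business objects according to the first backup data volume and the first task completion time.
In the embodiments of this application, the backup system may obtain the scheduling order of the next backup tasks of the multiple business objects in either of the following two ways:
1. Input the first backup data volume and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the multiple business objects.
The first machine learning model may be trained in advance; its training process is described in the embodiments below. The input of the first machine learning model is the first backup data volume and first task completion time of the multiple business objects, and its output is the scheduling order of their next backup tasks. For example, if the business objects are VM1, VM2, and VM3, the first backup data volume and first task completion time of these three business objects are input into the first machine learning model, which predicts and outputs the scheduling order of their next backup tasks: VM3→VM2→VM1. Note that the scheduling order may be represented as an ordering of business object identifiers (for example, VM ID) or as an ordering of the indexes of the business objects' backup tasks (for example, Job ID); the embodiments of this application do not specifically limit how the scheduling order is represented.
2. Input the first backup data volume and the first task completion time into a second machine learning model to obtain a second backup data volume and a second task completion time of the multiple business objects, which correspond to the next backup tasks of the multiple business objects; then obtain the scheduling order of the next backup tasks according to the second backup data volume and the second task completion time.
The second machine learning model may likewise be trained in advance; its training process is also described in the embodiments below. Its input is the first backup data volume and first task completion time of the multiple business objects, and its output is their second backup data volume and second task completion time, that is, the backup data volume and task completion time of the next backup tasks as predicted by the second machine learning model. Because the next backup tasks have not yet been executed, the second backup data volume and second task completion time are predicted values rather than actual values.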
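After obtaining the second backup data volume and second task completion time of the multiple business objects, for any one of them (for example, a first business object), the backup system may calculate the ratio of the second backup data volume to the second task completion time corresponding to the first business object to obtain the ease of the next backup task of the first business object. That is, the backup system calculates the ease of the next backup task of each business object as the ratio of the second backup data volume to the second task completion time: the larger the ratio, the easier the next backup task; the smaller the ratio, the harder it is.
The backup system then obtains the remaining startup time of the next backup task of the first business object according to the second task completion time corresponding to the first business object.
The backup system may subtract the second task completion time and the current time t from the latest task end time MaxEndTime (reported to the backup system when the business object is registered or reconfigured) to obtain the remaining startup time of the next backup task.
Once the ease and the remaining startup time of the next backup task have been obtained for all of the business objects, the scheduling order of the next backup tasks of the multiple business objects is obtained from these two quantities.
For the first business object, after obtaining the ease and the remaining startup time of its next backup task, the backup system may calculate the ratio of that ease to that remaining startup time to obtain the scheduling threshold of the next backup task of the first business object. Using the same calculation, the backup system obtains the scheduling thresholds of the next backup tasks of all business objects. A larger scheduling threshold indicates that the next backup task is easy to complete and short on time, so it should be started as soon as possible; a smaller scheduling threshold indicates that the task is harder to complete and has more slack, so its start can be deferred. The scheduling thresholds of the next backup tasks are then sorted in descending order to obtain the scheduling order of the next backup tasks of the multiple business objects.
The two methods above use two machine learning models that differ in their outputs, which depends on how each model is trained. A machine learning model may also output other information to help obtain the scheduling order of the next backup tasks of the multiple business objects, so the embodiments of this application do not limit the output of the machine learning model. Whichever model is used, its training takes system throughput and the tasks' SLA time window compliance fully into account and, combined with the dynamic characteristics of backup tasks (for example, backup data volume, task completion time, earliest task start time, and latest task end time), trains for how accurately scheduling matches the dynamic characteristics of the backup system and of each backup task, so that the model's predictions come closer to the actual execution results.
Methods other than the two above may also be used in the embodiments of this application to obtain the scheduling order of the next backup tasks of the multiple business objects; this is not specifically limited either.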
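In a possible implementation, the backup system may first determine whether the next backup task of the first business object is to be canceled; when it is not canceled, the scheduling order of the next backup tasks of the multiple business objects is obtained according to the ease and the remaining startup time of those tasks.
In the embodiments of this application, the backup system may determine whether the remaining startup time of the next backup task of the first business object is less than 0. When it is less than 0 (indicating that the latest end time of that task has already passed), the backup system calculates the cancellation probability of the task (for example, it calculates 1-p, where p is a preset probability value; the larger 1-p is, the higher the cancellation probability and the more likely the task is to be canceled; the smaller 1-p is, the lower the cancellation probability and the less likely the task is to be canceled). When the cancellation probability is greater than a preset threshold, the backup system determines to cancel the next backup task of the first business object; when the cancellation probability is less than or equal to the preset threshold, it determines not to cancel the task. When the remaining startup time is greater than or equal to 0 (indicating that the latest end time of the task has not yet been reached), the backup system determines not to cancel the next backup task of the first business object.
In this way, only the next backup tasks that are not canceled are scheduled, which avoids unnecessary scheduling of tasks that should be canceled and thereby improves the scheduling efficiency of the next backup tasks of the multiple business objects.
Step 303. The backup system sends a scheduling indication to the client according to the scheduling order of the next backup tasks of the multiple business objects.
After obtaining the scheduling order of the next backup tasks of the multiple business objects, the backup system may, according to the number of parallel threads available in real time, take backup tasks from the head of the scheduling queue in scheduling order as the backup tasks about to be scheduled, and send the client a scheduling indication that includes the identifiers of those tasks. For example, if 2 parallel threads are currently available, 2 backup tasks can be scheduled now, so the backup system takes 2 backup tasks from the head of the scheduling queue (for example, Job Id 1 and 2) and sends the client a scheduling indication that includes Job Id=1 and Job Id=2.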
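Step 304. According to the scheduling indication, the client transmits to the backup system the backup data of the next backup task of the corresponding business object.
After receiving the scheduling indication from the backup system, the client may start the next backup task of the corresponding business object according to the business object identifier in the scheduling indication, that is, transmit to the backup system the backup data of the next backup task of the business object indicated by that identifier.
Optionally, after completing the next backup tasks of the multiple business objects, the client returns to step 301 and reports to the backup system the backup data volume and task completion time of those backup tasks. The client may send the backup data volume and task completion time each time it completes the backup task of one business object, or it may send them for multiple business objects in one batch after their backup tasks are complete; this is not specifically limited.
In the embodiments of this application, a machine learning model predicts the scheduling order of the next backup tasks of multiple business objects from the backup data volume and task completion time of their previous backup tasks. This takes full account of how accurately scheduling matches the dynamic characteristics of the backup system and of each backup task, avoids manual intervention in the scheduling of multiple backup tasks, and improves the tasks' SLA time window compliance.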
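Fig. 4 is a flowchart of process 400 of the data backup method according to an embodiment of this application. Process 400 may be executed jointly by the front-end client and the back-end backup system. Process 400 is described as a series of steps or operations; it should be understood that these may be executed in various orders and/or concurrently and are not limited to the execution order shown in Fig. 4. Process 400 includes the following steps:
Step 401: The client sends the backup system the sending rate and receiving rate of the backup task of a second business object in the previous period.
The backup task of the second business object is currently being executed. That is, in the embodiments of this application, a backup task that is in progress (the client is sending its backup data to the backup system) can be rate-limited so that port bandwidth is used as well as possible given the real-time transmission characteristics of the tasks, improving system throughput.
As described above, the backup task of a business object is executed periodically. A backup task in progress can be regarded as being in the current period (which may also be called the next period), and the period in which the backup task was last executed can be regarded as its previous period.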
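The sending rate and receiving rate of a backup task usually vary dynamically within a period. The embodiments of this application may therefore set a rate reporting period, and the client reports to the backup system, once per reporting period, the sending rate and receiving rate of the backup task of the second business object in the previous reporting period. These may be taken as the average sending rate and average receiving rate over the previous reporting period, or as the weighted average sending rate and weighted average receiving rate over that period; this is not specifically limited.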
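The two reporting options mentioned above, a plain average and a weighted average over the reporting period, can be sketched as follows. The function names and the idea of weighting samples by caller-supplied weights (for example, favouring recent samples) are assumptions; the text does not say how the weights are chosen.

```python
def mean_rate(samples: list[float]) -> float:
    """Plain average of the rate samples collected in one reporting period."""
    if not samples:
        raise ValueError("at least one rate sample is required")
    return sum(samples) / len(samples)


def weighted_mean_rate(samples: list[float], weights: list[float]) -> float:
    """Weighted average of rate samples; weights are an assumption here."""
    if len(samples) != len(weights) or not samples:
        raise ValueError("samples and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(samples, weights)) / total_weight
```

The same helpers apply to both the sending rate (TxRate) and the receiving rate (RxRate) samples.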
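Step 402: The backup system obtains the rate limit of the backup task of the second business object in the next period from the sending rate and the receiving rate.
The backup system may input the sending rate and the receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next period, and then obtain the rate limit of that task in the next period from this predicted receiving rate.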
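The backup system may obtain the preset bandwidth of a first port, where the first port is used to execute the backup task of the second business object; obtain the sum of the receiving rates in the next period of all backup tasks transmitted on the first port; and obtain the rate limit of the backup task of the second business object in the next period from the preset bandwidth of the first port, the receiving rate of that task in the next period, and the sum of the receiving rates. Expressed as a formula:
$$\text{RateLimit}_i = B_{\text{port}} \times \frac{\widehat{Rx}_i}{\sum_{j} \widehat{Rx}_j}$$
Here, $\text{RateLimit}_i$ is the rate limit of backup task i, $B_{\text{port}}$ is the preset bandwidth of the first port, $\widehat{Rx}_i$ (the predicted receiving rate of backup task i) is the receiving rate of the backup task of the second business object in the next period, and $\sum_{j} \widehat{Rx}_j$ (the sum of the predicted receiving rates of all backup tasks transmitted on the first port) is the sum of the receiving rates in the next period of all backup tasks transmitted on the first port.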
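One way to read the three quantities named above (port bandwidth, the task's predicted receiving rate, and the sum of predicted receiving rates on the port) is a proportional share of the port bandwidth. The sketch below assumes that reading; the function name, the dictionary input, and the equal split used when every prediction is zero are assumptions for illustration only.

```python
def rate_limits(port_bandwidth: float, predicted_rx: dict[int, float]) -> dict[int, float]:
    """Split a port's preset bandwidth across its running backup tasks.

    predicted_rx maps job id -> predicted receiving rate for the next period.
    Each task receives bandwidth in proportion to its predicted rate.
    """
    if not predicted_rx:
        return {}
    total = sum(predicted_rx.values())
    if total <= 0:
        # Assumption: with no usable prediction, fall back to an equal split.
        share = port_bandwidth / len(predicted_rx)
        return {job_id: share for job_id in predicted_rx}
    return {
        job_id: port_bandwidth * rx / total
        for job_id, rx in predicted_rx.items()
    }
```

Under this reading the limits on one port always add up to the preset bandwidth, so the port is neither oversubscribed nor left idle by the limiter.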
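It should be noted that the embodiments of this application may also use other methods to calculate the rate limit of the backup task of the second business object in the next period; this is not specifically limited.
Step 403: The backup system sends the client a rate-limiting indication that includes the rate limit of the backup task of the second business object in the next period.
The backup system sends this rate limit to the client in the rate-limiting indication, so that the client can limit the sending rate at which it transmits the backup data of the backup task of the second business object.
Step 404: According to the rate-limiting indication, the client controls the sending rate of the backup task of the second business object in the next period.
The client may limit the sending rate of the backup data of the backup task of the second business object based on the rate limit, so that port bandwidth is used as well as possible given the real-time transmission characteristics of the tasks, improving system throughput.
In the embodiments of this application, a machine learning model predicts the rate limits of the backup tasks of the plurality of business objects in the next period from their sending rates and receiving rates in the previous period. This lets port bandwidth be used as well as possible given the real-time transmission characteristics of the tasks, improves system throughput, avoids manual intervention in the scheduling of multiple backup tasks, and improves system transmission efficiency.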
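Fig. 5 is a flowchart of process 500 of the data backup method according to an embodiment of this application. Process 500 may be executed by the back-end backup system. Process 500 is described as a series of steps or operations; it should be understood that these may be executed in various orders and/or concurrently and are not limited to the execution order shown in Fig. 5. Process 500 includes the following steps:
Step 501: Obtain training data.
Training data is the data used to train a machine learning model. It may differ depending on the structure, parameters, and function of the model.
Step 502: Train on the training data to obtain a target machine learning model.
In the embodiments of this application, the target machine learning model includes at least one of a first machine learning model, a second machine learning model, and a third machine learning model. The first machine learning model is used to predict the scheduling order of the next backup tasks of business objects, the second machine learning model is used to predict the second backup data volume and second task completion time of business objects, and the third machine learning model is used to predict the receiving rate of a business object's backup task in the next period.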
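Optionally, the backup system may obtain the historical backup data volume and historical task completion time of the plurality of business objects, which correspond to their completed backup tasks; obtain a preset machine learning model; input the historical backup data volume and historical task completion time into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the plurality of business objects; and train to convergence based on the predicted backup data volume and predicted task completion time to obtain the target machine learning model.
Optionally, the backup system may obtain the historical receiving rate and historical sending rate of the completed backup tasks of the plurality of business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the plurality of business objects; and train to convergence based on the predicted receiving rate to obtain the target machine learning model.
The two training methods above differ mainly in their training data. Optionally, the preset machine learning models may also differ, including in structure and parameters; the embodiments of this application do not specifically limit this.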
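When training a machine learning model, the goal is for the model's output to be as close as possible to the value that is actually to be predicted. The predicted value of the current network is therefore compared with the desired target value, and the weight vectors of each layer of the model are updated according to the difference between the two (before the first update there is usually an initialization step in which parameters are preconfigured for each layer of the model). For example, if the network's predicted value is too high, the weight vectors are adjusted to make it predict lower, and the adjustment continues until the deep machine learning model can predict the desired target value or a value very close to it. This requires defining in advance how the difference between the predicted value and the target value is measured, which is the role of the loss function or objective function: important equations used to measure that difference. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the machine learning model becomes a process of reducing this loss as much as possible.
The machine learning model may also use the error back propagation (BP) algorithm to correct the values of the parameters of the initial model during training, so that the model's reconstruction error loss becomes smaller and smaller. Specifically, passing the input signal forward to the output produces an error loss, and the parameters of the initial model are updated by propagating the error loss information backward so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, aimed at obtaining the optimal parameters of the machine learning model, such as the weight matrices.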
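The loss-driven update described above can be shown on a deliberately tiny model. The sketch below fits a one-variable linear predictor with mean squared error and plain gradient descent; it is a stand-in for illustration, not the multi-layer model of the application, and the function name, learning rate, and epoch count are assumptions.

```python
def train_linear(xs: list[float], ys: list[float],
                 lr: float = 0.05, epochs: int = 2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error.

    Returns the learned (w, b) and the loss recorded at every epoch.
    """
    w, b = 0.0, 0.0  # initialization step before the first update
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Forward pass: predictions and their errors against the targets.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Backward pass: gradients of the loss with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Update step: move the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses
```

If the prediction is too high the error is positive, the gradient is positive, and the update lowers the parameters, which is the "predict lower" adjustment the text describes.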
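The following uses several specific embodiments to describe the technical solutions of the above method embodiments in detail. It should be noted that the embodiments below merely illustrate possible implementations of the technical solutions of this application and do not limit them.
According to the functions it implements, the backup system can be further divided into three modules: a scheduling and sorting module, a rate-limiting decision module, and a machine learning model training module. The machine learning model training module periodically trains and updates the machine learning model based on historical data and pushes the most recently trained model or model parameters to the scheduling and sorting module and the rate-limiting decision module. The scheduling and sorting module takes the execution results of the previous backup tasks as input, invokes the machine learning model, outputs the scheduling order of the backup tasks to be scheduled, and instructs the client to execute the corresponding backup tasks in that order. The rate-limiting decision module, for each started backup task, takes the sending rate and receiving rate fed back in real time as input, invokes the machine learning model, outputs a dynamic rate limit for the task, and sends the rate limit to the client.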
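Fig. 6 is an exemplary schematic diagram of the client configuration and registration procedure. As shown in Fig. 6, the procedure has two steps: ① at system initialization (for example, when business objects change, the configuration changes, or the system powers on), the client sends the model update frequency UpdateFreq, which configures the model training period of the machine learning model training module; ② the client sends the basic configuration information of each business object (for example, a VM) to the scheduling and sorting module, including the business object identifier (VM Id), the earliest task start time (MinStartTime), the latest task end time (MaxEndTime), and the backup task scheduling period (JobFreq).
This application only requires the client to configure the backup task scheduling period JobFreq, the earliest task start time MinStartTime (for example, 8:00 PM, after working hours), and the latest task end time MaxEndTime (for example, 8:00 AM the next day, before work starts). The scheduling and sorting module then schedules backup tasks flexibly within the scheduling period according to how each backup task actually executes, and optimizes the overall scheduling strategy so that as many backup tasks as possible complete within the maximum allowed time window [MinStartTime, MaxEndTime].
Optionally, the basic configuration information in step ② may omit the earliest task start time MinStartTime and the latest task end time MaxEndTime. In that case the maximum allowed time window is not constrained for the backup task identified by the VM Id, and the scheduling and sorting module can choose the most suitable time slot within the current period (for example, within a day, a week, or 12 hours) to schedule the backup task.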
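The two registration messages can be modelled as small immutable records. The field names below follow the identifiers in the text (UpdateFreq, VM Id, MinStartTime, MaxEndTime, JobFreq), but the class names, the use of seconds as the unit, and the validation rule are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTrainingConfig:
    """Step 1: sent once at initialization to the training module."""
    update_freq_s: float  # UpdateFreq, model training period in seconds


@dataclass(frozen=True)
class BusinessObjectRegistration:
    """Step 2: sent per business object to the scheduling and sorting module."""
    vm_id: str                              # VM Id
    job_freq_s: float                       # JobFreq, scheduling period in seconds
    min_start_time: Optional[float] = None  # MinStartTime (optional)
    max_end_time: Optional[float] = None    # MaxEndTime (optional)

    def __post_init__(self):
        # Assumption: when both bounds are given, the window must be non-empty.
        if self.has_time_window() and self.min_start_time >= self.max_end_time:
            raise ValueError("MinStartTime must be earlier than MaxEndTime")

    def has_time_window(self) -> bool:
        return self.min_start_time is not None and self.max_end_time is not None
```

Leaving both bounds unset corresponds to the optional case in which the scheduler is free to pick any slot within the period.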
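Fig. 7 is an exemplary schematic diagram of the scheduling and sorting calculation procedure. As shown in Fig. 7, for a backup task that has not been started, the scheduling and sorting module first obtains its MinStartTime and checks whether MinStartTime < t holds, where t is the current time (for example, 9:00 PM). If the inequality holds, the earliest start time set for the backup task is earlier than the current time and the task may be started. If it does not hold, the earliest start time is later than the current time, that is, the task's start time has not yet arrived and the task may not be started.
When MinStartTime < t holds, the scheduling and sorting module uses the trained machine learning model to predict the backup data volume (volume) and task completion time (predicted JCT) of the task's next backup. It calculates volume / predicted JCT as the easiness of the next backup, and then calculates MaxEndTime − predicted JCT − t as the remaining start time of the next backup, which can serve as the urgency of the next backup.
It then checks whether MaxEndTime − predicted JCT − t < 0 holds. If the inequality does not hold, the remaining start time of the next backup is non-negative: there is still time before the next backup must start, that is, the current time plus the predicted task completion time has not yet reached the task's latest end time, and the backup task can be scheduled. If the inequality holds, the remaining start time is negative: the next backup has already missed its start time, that is, the current time plus the predicted task completion time is already past the task's latest end time. In this case the next backup of the task is cancelled with probability 1 − p. The probability p can be preconfigured at system initialization. The larger p is, the less likely the next backup is to be cancelled; the smaller p is, the more likely it is to be cancelled.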
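As long as the backup task is not cancelled, the ratio of the easiness of its next backup to the urgency of its next backup is calculated, that is:
$$\frac{\text{volume} / \text{predicted JCT}}{\text{MaxEndTime} - \text{predicted JCT} - t}$$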
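After this ratio has been obtained for all backup tasks to be scheduled, the backup tasks are sorted in descending order of the ratio to obtain the scheduling queue for their next backups. For example, at the same easiness, a larger urgency value (the task is not very urgent) gives a smaller ratio and a later position in the queue, while a smaller urgency value (the task is more urgent) gives a larger ratio and an earlier position. Likewise, at the same urgency, a larger easiness value (the task is easier to complete) gives a larger ratio and an earlier position, while a smaller easiness value (the task is harder to complete) gives a smaller ratio and a later position.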
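The whole procedure of Fig. 7 (eligibility check, prediction, probabilistic cancellation, ratio, descending sort) can be put together as below. This is a sketch under several assumptions: the dictionary keys are invented, the predictions are passed in rather than produced by a model, the random source is injected so the behaviour can be reproduced, and an overdue task that survives cancellation has its urgency clamped to a small positive `eps` so that it sorts to the front. The text does not say how such a task is ranked.

```python
import random


def build_queue(jobs: list[dict], now: float, p: float,
                rng: random.Random, eps: float = 1e-9) -> list[str]:
    """Return job ids ordered for scheduling, highest ratio first.

    Each job dict carries: id, min_start_time, max_end_time,
    volume (predicted) and predicted_jct.
    """
    scored = []
    for job in jobs:
        if not job["min_start_time"] < now:
            continue  # earliest start time not reached yet
        easiness = job["volume"] / job["predicted_jct"]
        urgency = job["max_end_time"] - job["predicted_jct"] - now
        if urgency < 0:
            if rng.random() < 1.0 - p:
                continue  # cancelled with probability 1 - p
            urgency = eps  # assumption: kept overdue jobs go to the front
        scored.append((easiness / max(urgency, eps), job["id"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [job_id for _, job_id in scored]
```

Setting `p` to 0 cancels every overdue task and setting it to 1 keeps every one, which makes the two extremes easy to check.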
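Fig. 8 is an exemplary schematic diagram of the real-time scheduling interaction procedure. As shown in Fig. 8, the scheduling and sorting module uses the method shown in Fig. 7 to obtain the scheduling order of the next backups of the backup tasks to be scheduled.
On this basis, the scheduling and sorting module, according to the number of parallel threads available in real time, takes backup tasks from the head of the scheduling queue in scheduling order as the tasks about to be scheduled, and sends the client a scheduling indication that includes the identifiers (Job Id) of those tasks (step ①), so that the client starts transmitting their backup data. For example, if 2 parallel threads are available in real time, 2 backup tasks can currently be scheduled, so the scheduling and sorting module takes 2 backup tasks from the head of the scheduling queue (for example, Job Id 1 and 2) and sends the client a scheduling indication containing Job Id=1 and Job Id=2.
After a backup task is started, the scheduling and sorting module sends the rate-limiting decision module a rate-limiting start indication, which includes the identifier (Job Id) of the started backup task and the time at which the scheduling and sorting module actually started it (createtime) (step ②).
After receiving the rate-limiting start indication, the rate-limiting decision module obtains the rate limit of the started backup task in the next period and sends the client a rate-limiting indication that includes the task's identifier (Job Id) and the rate limit (Ratelimit) (step ③), so that the client limits its sending rate at the source accordingly.
When the client completes a backup task, it may send task completion measurement feedback to the machine learning model training module. The feedback includes the identifier (Job Id) of the completed backup task, the total data volume backed up in its most recent backup (volume), and the task completion time of its most recent backup (JCT) (step ④). The machine learning model training module can use this feedback to update and retrain the machine learning model.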
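Fig. 9 is an exemplary schematic diagram of the rate limit calculation procedure. As shown in Fig. 9, for a started backup task, the client periodically (for example, every 10 ms, every 1 s, every minute, or every 5 minutes; this is not limited) sends actual sending and receiving rate measurement feedback to the rate-limiting decision module. The feedback includes the identifier (Job Id) of the started backup task, a timestamp, the receiving rate (RxRate), and the sending rate (TxRate) (step ①). RxRate may be the average receiving rate in the previous period, and TxRate may be the average sending rate in the previous period.
The rate-limiting decision module invokes the machine learning model with RxRate and TxRate as input and outputs the predicted receiving rate of the started task in the next period.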
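For the started backup task above, the rate-limiting decision module calculates the task's rate limit in the next period using the following formula:
$$\text{RateLimit}_i = B_{\text{port}} \times \frac{\widehat{Rx}_i}{\sum_{j} \widehat{Rx}_j}$$
Here, the first port is the transmission port of backup task i, $B_{\text{port}}$ is the preset bandwidth of the first port, $\widehat{Rx}_i$ is the predicted receiving rate of backup task i, and the sum runs over all backup tasks transmitted on the first port.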
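After obtaining the rate limit of backup task i, the rate-limiting decision module sends the client a rate-limiting indication that includes the identifier (Job Id) of the started backup task and the rate limit (Ratelimit) (step ②), so that the client limits its sending rate at the source accordingly.
Fig. 10 is an exemplary schematic diagram of the machine learning model training procedure. As shown in Fig. 10, the machine learning model training module may periodically (for example, every week, every month, or every quarter; this is not limited) update the parameters of the machine learning model. These parameters may include the structure of the model and the parameters of one or more of its layers.
At the end of the previous period, the machine learning model training module sends a model parameter update indication to the scheduling and sorting module and to the rate-limiting decision module. The indication includes the updated model parameters (ModelParas) (step ①), so that both modules can update their local machine learning models accordingly.
After the next period begins, for a started backup task, the client periodically (for example, every 10 ms, every 1 s, every minute, or every 5 minutes; this is not limited) sends actual sending and receiving rate measurement feedback to the rate-limiting decision module. The feedback includes the identifier (Job Id) of the started backup task, a timestamp, the receiving rate (RxRate), and the sending rate (TxRate) (step ②). RxRate may be the average receiving rate in the previous period, and TxRate may be the average sending rate in the previous period. The machine learning model training module can use this feedback to update and retrain the machine learning model.
When the client completes a backup task, it may send task completion measurement feedback to the machine learning model training module. The feedback includes the identifier (Job Id) of the completed backup task, the total data volume backed up in its most recent backup (volume), and the task completion time of its most recent backup (JCT) (step ③). The machine learning model training module can use this feedback to update and retrain the machine learning model.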
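When training the machine learning model, the machine learning model training module may train on the model structure shown in Fig. 11. As shown in Fig. 11, the machine learning model (for example, a Flow Neural Network (FlowNN), a convolutional network, or a deep network) is a multi-layer neural network model consisting mainly of an input embedding layer L1, a path aggregation layer (PathAggregator) L2, and an induction layer L3. L1 maps each input feature to a high-dimensional space, L2 aggregates the per-timestep information output by L1 and recurses along the time dimension, and L3 predicts the real-time receiving rate of a task from the outputs of L1 and L2. It should be noted that Fig. 11 is taken from X. Cheng et al., "Physics constrained flow neural network for short-timescale predictions in data communications networks", ArXiv, https://arxiv.org/pdf/2112.12321.pdf.
Optionally, the machine learning model training module inputs the backup data volume (volume) and task completion time (JCT) of the backup tasks historically executed by each business object into L1. After L2 aggregates all the input information, the output of L2 is mapped directly through a fully connected network to the predicted backup data volume and predicted task completion time of the plurality of business objects. Only L1 and L2 of the machine learning model take part in this process. The training module compares these predictions with the actual backup data volume and actual task completion time of the current backup tasks of the plurality of business objects and trains the model to convergence on the loss between the two, obtaining the updated model parameters.
Optionally, the machine learning model training module inputs the receiving rate (RxRate) and sending rate (TxRate) of each business object in the previous period into L1. After computation through L1, L2, and L3, the outputs of L2 and L3 are mapped directly through a fully connected network to the predicted receiving rate of the backup tasks of the plurality of business objects in the next period. The training module compares this predicted receiving rate with the actual receiving rate of the current backup tasks of the plurality of business objects and trains the model to convergence on the loss between the two, obtaining the updated model parameters.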
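It should be noted that, besides the two model training methods above, this application may also train the machine learning model on other collectable task transmission characteristics and system state features as input; this is not specifically limited.
At the end of the next period, the machine learning model training module again sends a model parameter update indication to the scheduling and sorting module and to the rate-limiting decision module. The indication includes the updated model parameters (ModelParas) (step ④), so that both modules can update their local machine learning models accordingly.
It should be noted that within one training period of the machine learning model, the client may perform steps ② and ③ above multiple times, so the input data used by the machine learning model training module may be the data carried in multiple pieces of client feedback received during one training period.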
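Fig. 12 is an exemplary schematic structural diagram of a data backup apparatus 1200 according to an embodiment of this application. As shown in Fig. 12, the data backup apparatus 1200 of this embodiment may be applied to a backup system. The data backup apparatus 1200 may include an obtaining module 1201, a scheduling module 1202, a sending module 1203, a rate-limiting module 1204, and a training module 1205, where:
The obtaining module 1201 is configured to obtain a first backup data volume and a first task completion time of a plurality of business objects, the first backup data volume and the first task completion time corresponding to the previous backup tasks of the plurality of business objects. The scheduling module 1202 is configured to obtain the scheduling order of the next backup tasks of the plurality of business objects according to the first backup data volume and the first task completion time. The sending module 1203 is configured to send a scheduling indication to a client according to the scheduling order of the next backup tasks of the plurality of business objects, the scheduling indication including the identifier of the business object corresponding to the backup task about to be scheduled.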
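In a possible implementation, the scheduling module 1202 is specifically configured to input the first backup data volume and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the plurality of business objects.
In a possible implementation, the scheduling module 1202 is specifically configured to input the first backup data volume and the first task completion time into a second machine learning model to obtain a second backup data volume and a second task completion time of the plurality of business objects, the second backup data volume and the second task completion time corresponding to the next backup tasks of the plurality of business objects; and obtain the scheduling order of the next backup tasks of the plurality of business objects according to the second backup data volume and the second task completion time.
In a possible implementation, the scheduling module 1202 is specifically configured to calculate the ratio of the second backup data volume to the second task completion time corresponding to a first business object to obtain the easiness of the next backup task of the first business object, the first business object being any one of the plurality of business objects; obtain the remaining start time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, after the easiness and the remaining start times of the next backup tasks of the plurality of business objects have all been obtained, obtain the scheduling order of the next backup tasks of the plurality of business objects according to that easiness and those remaining start times.
In a possible implementation, the scheduling module 1202 is further configured to determine whether the next backup task of the first business object is cancelled; and, when the next backup task of the first business object is not cancelled, obtain the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects.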
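In a possible implementation, the scheduling module 1202 is specifically configured to calculate the ratio of the easiness of the next backup task of each of the plurality of business objects to the remaining start time of that task to obtain the scheduling thresholds of the next backup tasks of the plurality of business objects; and sort the scheduling thresholds in descending order to obtain the scheduling order of the next backup tasks of the plurality of business objects.
In a possible implementation, the scheduling module 1202 is specifically configured to determine whether the remaining start time of the next backup task of the first business object is less than 0; when it is less than 0, calculate the cancellation probability of the next backup task of the first business object; and, when the cancellation probability is greater than a preset threshold, determine to cancel the next backup task of the first business object.
In a possible implementation, the scheduling module 1202 is further configured to determine not to cancel the next backup task of the first business object when the remaining start time of that task is greater than or equal to 0.
In a possible implementation, the scheduling module 1202 is further configured to determine not to cancel the next backup task of the first business object when the cancellation probability of that task is less than or equal to the preset threshold.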
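In a possible implementation, the rate-limiting module 1204 is configured to receive, from the client, the sending rate and receiving rate of the backup task of a second business object in the previous period, the backup task of the second business object being in progress; and obtain the rate limit of the backup task of the second business object in the next period according to the sending rate and the receiving rate. The sending module 1203 is further configured to send the client a rate-limiting indication that includes the rate limit of the backup task of the second business object in the next period.
In a possible implementation, the rate-limiting module 1204 is specifically configured to input the sending rate and the receiving rate into a third machine learning model to obtain the receiving rate of the backup task of the second business object in the next period; and obtain the rate limit of that task in the next period according to this receiving rate.
In a possible implementation, the rate-limiting module 1204 is specifically configured to obtain the preset bandwidth of a first port, the first port being used to execute the backup task of the second business object; obtain the sum of the receiving rates in the next period of all backup tasks transmitted on the first port; and obtain the rate limit of the backup task of the second business object in the next period according to the preset bandwidth of the first port, the receiving rate of that task in the next period, and the sum of the receiving rates.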
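In a possible implementation, the training module 1205 is configured to train a target machine learning model, the target machine learning model including at least one of a first machine learning model, a second machine learning model, and a third machine learning model, where the first machine learning model is used to predict the scheduling order of the next backup tasks of business objects, the second machine learning model is used to predict the second backup data volume and second task completion time of business objects, and the third machine learning model is used to predict the receiving rate of a business object's backup task in the next period.
In a possible implementation, the training module 1205 is specifically configured to obtain the historical backup data volume and historical task completion time of the plurality of business objects, which correspond to the completed backup tasks of the plurality of business objects; obtain a preset machine learning model; input the historical backup data volume and historical task completion time into the preset machine learning model to obtain the predicted backup data volume and predicted task completion time of the plurality of business objects; and train to convergence based on the predicted backup data volume and predicted task completion time to obtain the target machine learning model.
In a possible implementation, the training module 1205 is specifically configured to obtain the historical receiving rate and historical sending rate of the completed backup tasks of the plurality of business objects; obtain a preset machine learning model; input the historical receiving rate and historical sending rate into the preset machine learning model to obtain the predicted receiving rate of the backup tasks of the plurality of business objects; and train to convergence based on the predicted receiving rate to obtain the target machine learning model.
The apparatus of this embodiment may be used to execute the technical solution of the method embodiment shown in either Fig. 3 or Fig. 4. Its implementation principle and technical effects are similar and are not repeated here.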
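In implementation, the steps of the above method embodiments may be completed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware encoding processor, or by a combination of hardware and software modules in an encoding processor. The software module may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
The memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.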
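A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation, such as combining multiple units or components, integrating them into another system, or omitting or not executing some features. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.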
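If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific implementations of this application, but the scope of protection of this application is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.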

Claims (33)

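  1. A data backup method, characterized by comprising:
    obtaining a first backup data volume and a first task completion time of a plurality of business objects, the first backup data volume and the first task completion time corresponding to the previous backup tasks of the plurality of business objects;
    obtaining a scheduling order of the next backup tasks of the plurality of business objects according to the first backup data volume and the first task completion time;
    sending a scheduling indication to a client according to the scheduling order of the next backup tasks of the plurality of business objects, the scheduling indication comprising an identifier of the business object corresponding to the backup task about to be scheduled.
  2. The method according to claim 1, characterized in that the obtaining a scheduling order of the next backup tasks of the plurality of business objects according to the first backup data volume and the first task completion time comprises:
    inputting the first backup data volume and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the plurality of business objects.
  3. The method according to claim 1 or 2, characterized in that the obtaining a scheduling order of the next backup tasks of the plurality of business objects according to the first backup data volume and the first task completion time comprises:
    inputting the first backup data volume and the first task completion time into a second machine learning model to obtain a second backup data volume and a second task completion time of the plurality of business objects, the second backup data volume and the second task completion time corresponding to the next backup tasks of the plurality of business objects;
    obtaining the scheduling order of the next backup tasks of the plurality of business objects according to the second backup data volume and the second task completion time.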
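  4. The method according to claim 3, characterized in that the obtaining the scheduling order of the next backup tasks of the plurality of business objects according to the second backup data volume and the second task completion time comprises:
    calculating a ratio of the second backup data volume to the second task completion time corresponding to a first business object to obtain an easiness of the next backup task of the first business object, the first business object being any one of the plurality of business objects;
    obtaining a remaining start time of the next backup task of the first business object according to the second task completion time corresponding to the first business object;
    after the easiness of the next backup tasks of the plurality of business objects and the remaining start times of the next backup tasks of the plurality of business objects have all been obtained, obtaining the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects.
  5. The method according to claim 4, characterized in that, before the obtaining the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects, the method further comprises:
    determining whether the next backup task of the first business object is cancelled;
    and the obtaining the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects comprises:
    when the next backup task of the first business object is not cancelled, obtaining the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects.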
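  6. The method according to claim 4 or 5, characterized in that the obtaining the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects comprises:
    calculating ratios of the easiness of the next backup tasks of the plurality of business objects to the remaining start times of the next backup tasks of the plurality of business objects to obtain scheduling thresholds of the next backup tasks of the plurality of business objects;
    sorting the scheduling thresholds of the next backup tasks of the plurality of business objects in descending order to obtain the scheduling order of the next backup tasks of the plurality of business objects.
  7. The method according to claim 5, characterized in that the determining whether the next backup task of the first business object is cancelled comprises:
    determining whether the remaining start time of the next backup task of the first business object is less than 0;
    when the remaining start time of the next backup task of the first business object is less than 0, calculating a cancellation probability of the next backup task of the first business object;
    when the cancellation probability of the next backup task of the first business object is greater than a preset threshold, determining to cancel the next backup task of the first business object.
  8. The method according to claim 7, characterized in that, after the determining whether the remaining start time of the next backup task of the first business object is less than 0, the method further comprises:
    when the remaining start time of the next backup task of the first business object is greater than or equal to 0, determining not to cancel the next backup task of the first business object.
  9. The method according to claim 7, characterized in that, after the calculating a cancellation probability of the next backup task of the first business object, the method further comprises:
    when the cancellation probability of the next backup task of the first business object is less than or equal to the preset threshold, determining not to cancel the next backup task of the first business object.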
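  10. The method according to any one of claims 1 to 9, characterized in that, after the sending a scheduling indication to a client according to the scheduling order of the next backup tasks of the plurality of business objects, the method further comprises:
    receiving, from the client, a sending rate and a receiving rate of a backup task of a second business object in a previous period, the backup task of the second business object being in progress;
    obtaining a rate limit of the backup task of the second business object in a next period according to the sending rate and the receiving rate;
    sending a rate-limiting indication to the client, the rate-limiting indication comprising the rate limit of the backup task of the second business object in the next period.
  11. The method according to claim 10, characterized in that the obtaining a rate limit of the backup task of the second business object in a next period according to the sending rate and the receiving rate comprises:
    inputting the sending rate and the receiving rate into a third machine learning model to obtain a receiving rate of the backup task of the second business object in the next period;
    obtaining the rate limit of the backup task of the second business object in the next period according to the receiving rate of the backup task of the second business object in the next period.
  12. The method according to claim 11, characterized in that the obtaining the rate limit of the backup task of the second business object in the next period according to the receiving rate of the backup task of the second business object in the next period comprises:
    obtaining a preset bandwidth of a first port, the first port being used to execute the backup task of the second business object;
    obtaining a sum of the receiving rates in the next period of all backup tasks transmitted on the first port;
    obtaining the rate limit of the backup task of the second business object in the next period according to the preset bandwidth of the first port, the receiving rate of the backup task of the second business object in the next period, and the sum of the receiving rates.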
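  13. The method according to any one of claims 1 to 12, characterized by further comprising:
    training to obtain a target machine learning model, the target machine learning model comprising at least one of a first machine learning model, a second machine learning model, and a third machine learning model, wherein the first machine learning model is used to predict the scheduling order of the next backup tasks of business objects, the second machine learning model is used to predict the second backup data volume and second task completion time of business objects, and the third machine learning model is used to predict the receiving rate of a business object's backup task in the next period.
  14. The method according to claim 13, characterized in that the training to obtain a target machine learning model comprises:
    obtaining a historical backup data volume and a historical task completion time of the plurality of business objects, the historical backup data volume and the historical task completion time corresponding to completed backup tasks of the plurality of business objects;
    obtaining a preset machine learning model;
    inputting the historical backup data volume and the historical task completion time of the plurality of business objects into the preset machine learning model to obtain a predicted backup data volume and a predicted task completion time of the plurality of business objects;
    training to convergence based on the predicted backup data volume and the predicted task completion time to obtain the target machine learning model.
  15. The method according to claim 13 or 14, characterized in that the training to obtain a target machine learning model comprises:
    obtaining a historical receiving rate and a historical sending rate of completed backup tasks of the plurality of business objects;
    obtaining a preset machine learning model;
    inputting the historical receiving rate and the historical sending rate into the preset machine learning model to obtain a predicted receiving rate of the backup tasks of the plurality of business objects;
    training to convergence based on the predicted receiving rate to obtain the target machine learning model.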
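  16. A data backup apparatus, characterized by comprising:
    an obtaining module, configured to obtain a first backup data volume and a first task completion time of a plurality of business objects, the first backup data volume and the first task completion time corresponding to the previous backup tasks of the plurality of business objects;
    a scheduling module, configured to obtain a scheduling order of the next backup tasks of the plurality of business objects according to the first backup data volume and the first task completion time;
    a sending module, configured to send a scheduling indication to a client according to the scheduling order of the next backup tasks of the plurality of business objects, the scheduling indication comprising an identifier of the business object corresponding to the backup task about to be scheduled.
  17. The apparatus according to claim 16, characterized in that the scheduling module is specifically configured to input the first backup data volume and the first task completion time into a first machine learning model to obtain the scheduling order of the next backup tasks of the plurality of business objects.
  18. The apparatus according to claim 16 or 17, characterized in that the scheduling module is specifically configured to input the first backup data volume and the first task completion time into a second machine learning model to obtain a second backup data volume and a second task completion time of the plurality of business objects, the second backup data volume and the second task completion time corresponding to the next backup tasks of the plurality of business objects; and obtain the scheduling order of the next backup tasks of the plurality of business objects according to the second backup data volume and the second task completion time.
  19. The apparatus according to claim 18, characterized in that the scheduling module is specifically configured to calculate a ratio of the second backup data volume to the second task completion time corresponding to a first business object to obtain an easiness of the next backup task of the first business object, the first business object being any one of the plurality of business objects; obtain a remaining start time of the next backup task of the first business object according to the second task completion time corresponding to the first business object; and, after the easiness and the remaining start times of the next backup tasks of the plurality of business objects have all been obtained, obtain the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects.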
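  20. The apparatus according to claim 19, characterized in that the scheduling module is further configured to determine whether the next backup task of the first business object is cancelled; and, when the next backup task of the first business object is not cancelled, obtain the scheduling order of the next backup tasks of the plurality of business objects according to the easiness and the remaining start times of the next backup tasks of the plurality of business objects.
  21. The apparatus according to claim 19 or 20, characterized in that the scheduling module is specifically configured to calculate ratios of the easiness of the next backup tasks of the plurality of business objects to the remaining start times of the next backup tasks of the plurality of business objects to obtain scheduling thresholds of the next backup tasks of the plurality of business objects; and sort the scheduling thresholds in descending order to obtain the scheduling order of the next backup tasks of the plurality of business objects.
  22. The apparatus according to claim 20, characterized in that the scheduling module is specifically configured to determine whether the remaining start time of the next backup task of the first business object is less than 0; when the remaining start time is less than 0, calculate a cancellation probability of the next backup task of the first business object; and, when the cancellation probability is greater than a preset threshold, determine to cancel the next backup task of the first business object.
  23. The apparatus according to claim 22, characterized in that the scheduling module is further configured to determine not to cancel the next backup task of the first business object when the remaining start time of the next backup task of the first business object is greater than or equal to 0.
  24. The apparatus according to claim 22, characterized in that the scheduling module is further configured to determine not to cancel the next backup task of the first business object when the cancellation probability of the next backup task of the first business object is less than or equal to the preset threshold.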
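  25. The apparatus according to any one of claims 16 to 24, characterized by further comprising:
    a rate-limiting module, configured to receive, from the client, a sending rate and a receiving rate of a backup task of a second business object in a previous period, the backup task of the second business object being in progress; and obtain a rate limit of the backup task of the second business object in a next period according to the sending rate and the receiving rate;
    the sending module being further configured to send a rate-limiting indication to the client, the rate-limiting indication comprising the rate limit of the backup task of the second business object in the next period.
  26. The apparatus according to claim 25, characterized in that the rate-limiting module is specifically configured to input the sending rate and the receiving rate into a third machine learning model to obtain a receiving rate of the backup task of the second business object in the next period; and obtain the rate limit of the backup task of the second business object in the next period according to the receiving rate of the backup task of the second business object in the next period.
  27. The apparatus according to claim 26, characterized in that the rate-limiting module is specifically configured to obtain a preset bandwidth of a first port, the first port being used to execute the backup task of the second business object; obtain a sum of the receiving rates in the next period of all backup tasks transmitted on the first port; and obtain the rate limit of the backup task of the second business object in the next period according to the preset bandwidth of the first port, the receiving rate of the backup task of the second business object in the next period, and the sum of the receiving rates.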
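  28. The apparatus according to any one of claims 16 to 27, characterized by further comprising:
    a training module, configured to train to obtain a target machine learning model, the target machine learning model comprising at least one of a first machine learning model, a second machine learning model, and a third machine learning model, wherein the first machine learning model is used to predict the scheduling order of the next backup tasks of business objects, the second machine learning model is used to predict the second backup data volume and second task completion time of business objects, and the third machine learning model is used to predict the receiving rate of a business object's backup task in the next period.
  29. The apparatus according to claim 28, characterized in that the training module is specifically configured to obtain a historical backup data volume and a historical task completion time of the plurality of business objects, the historical backup data volume and the historical task completion time corresponding to completed backup tasks of the plurality of business objects; obtain a preset machine learning model; input the historical backup data volume and the historical task completion time into the preset machine learning model to obtain a predicted backup data volume and a predicted task completion time of the plurality of business objects; and train to convergence based on the predicted backup data volume and the predicted task completion time to obtain the target machine learning model.
  30. The apparatus according to claim 28 or 29, characterized in that the training module is specifically configured to obtain a historical receiving rate and a historical sending rate of completed backup tasks of the plurality of business objects; obtain a preset machine learning model; input the historical receiving rate and the historical sending rate into the preset machine learning model to obtain a predicted receiving rate of the backup tasks of the plurality of business objects; and train to convergence based on the predicted receiving rate to obtain the target machine learning model.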
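  31. A backup system, characterized by comprising:
    one or more processors;
    a memory, configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 15.
  32. A computer-readable storage medium, characterized by comprising a computer program which, when executed on a computer, causes the computer to perform the method according to any one of claims 1 to 15.
  33. A computer program product, characterized in that the computer program product comprises computer program code which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 15.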
PCT/CN2023/100112 2022-08-23 2023-06-14 数据备份方法和装置 Ceased WO2024041119A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP23856211.0A EP4564173A4 (en) 2022-08-23 2023-06-14 DATA BACKUP APPARATUS AND METHOD
US19/060,320 US20250208953A1 (en) 2022-08-23 2025-02-21 Data backup method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211010615.3A CN117687834A (zh) 2022-08-23 2022-08-23 数据备份方法和装置
CN202211010615.3 2022-08-23

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US19/060,320 Continuation US20250208953A1 (en) 2022-08-23 2025-02-21 Data backup method and apparatus

Publications (1)

Publication Number Publication Date
WO2024041119A1 true WO2024041119A1 (zh) 2024-02-29

Family

ID=90012362

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100112 Ceased WO2024041119A1 (zh) 2022-08-23 2023-06-14 数据备份方法和装置

Country Status (4)

Country Link
US (1) US20250208953A1 (zh)
EP (1) EP4564173A4 (zh)
CN (1) CN117687834A (zh)
WO (1) WO2024041119A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119226047A (zh) * 2024-12-02 2024-12-31 成都云祺科技有限公司 基于人工智能的备份数据预测方法、系统及报告生成方法
WO2025228104A1 (zh) * 2024-04-28 2025-11-06 华为技术有限公司 一种备份任务分配方法、装置以及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302647A (zh) * 2015-11-06 2016-02-03 南京信息工程大学 一种MapReduce中备份任务推测执行策略的优化方案
WO2018076889A1 (zh) * 2016-10-25 2018-05-03 广东欧珀移动通信有限公司 数据备份的方法、装置、系统、存储介质及服务器
CN112685224A (zh) * 2019-10-17 2021-04-20 伊姆西Ip控股有限责任公司 任务管理的方法、设备和计算机程序产品
CN113076224A (zh) * 2021-05-07 2021-07-06 中国工商银行股份有限公司 数据备份方法、数据备份系统、电子设备及可读存储介质
CN114860160A (zh) * 2022-04-15 2022-08-05 北京科杰科技有限公司 一种针对Hadoop数据平台的扩容资源预测方法及系统

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566287B2 (en) * 2010-01-29 2013-10-22 Hewlett-Packard Development Company, L.P. Method and apparatus for scheduling data backups
US8924667B2 (en) * 2011-10-03 2014-12-30 Hewlett-Packard Development Company, L.P. Backup storage management
US8914663B2 (en) * 2012-03-28 2014-12-16 Hewlett-Packard Development Company, L.P. Rescheduling failed backup jobs
US11061780B1 (en) * 2019-10-08 2021-07-13 EMC IP Holding Company LLC Applying machine-learning to optimize the operational efficiency of data backup systems
CN112685170B (zh) * 2019-10-18 2023-12-08 伊姆西Ip控股有限责任公司 备份策略的动态优化
CN112988497B (zh) * 2019-12-13 2024-05-31 伊姆西Ip控股有限责任公司 管理备份系统的方法、电子设备和计算机程序产品
US11604676B2 (en) * 2020-06-23 2023-03-14 EMC IP Holding Company LLC Predictive scheduled backup system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4564173A4

Also Published As

Publication number Publication date
CN117687834A (zh) 2024-03-12
EP4564173A1 (en) 2025-06-04
US20250208953A1 (en) 2025-06-26
EP4564173A4 (en) 2025-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856211

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023856211

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023856211

Country of ref document: EP

Effective date: 20250225

NENP Non-entry into the national phase

Ref country code: DE

WWP Wipo information: published in national office

Ref document number: 2023856211

Country of ref document: EP