WO2022227719A1 - Data backup method, system, and related device - Google Patents
- Publication number
- WO2022227719A1 (PCT/CN2022/072427)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cluster
- data
- control device
- standby
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1461—Backup scheduling policy
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/84—Using snapshots, i.e. a logical point-in-time copy of the data
Definitions
- the present application relates to the technical field of big data, and in particular, to a data backup method, system and related equipment.
- the big data platform implements disaster recovery for user business data through a data backup system, which includes a primary cluster and a backup cluster.
- the main cluster can use components to process user's business data, such as using components to encapsulate and store the user's business data, etc. Different components can process different types of business data of the same user.
- the main cluster can use component 1 to store the user's audio, video, pictures and other business data, and use component 2 to store the user's business data in tabular form.
- the active cluster can periodically back up the business data processed by each component to the standby site, so that after the active cluster fails, the standby cluster can continue to provide business services to users based on the backed up business data. Therefore, how to prevent the business data backed up to the standby cluster from affecting the quality of the business service provided after the disaster recovery switchover has become an urgent problem to be solved at present.
- the embodiment of the present application provides a data backup method that performs data backup with business as the granularity, so as to avoid business service errors caused by inconsistent data backup and to ensure that the business data backed up to the standby cluster does not affect the quality of the business services provided.
- the present application also provides corresponding data backup systems, control devices, computing devices, computer-readable storage media, and computer program products.
- an embodiment of the present application provides a data backup method, which is applied to a data backup system including a master cluster, a backup cluster, and a control device.
- the control device controls the master cluster or the backup cluster, according to a first data backup policy, to back up multiple data sets related to the first service in the primary cluster at the first moment to the standby cluster, wherein the first data backup policy includes information of the multiple data sets related to the first service and the first moment.
- data backup can be implemented between the primary cluster and the standby cluster with business as the granularity, so that the multiple data sets backed up to the standby cluster related to the first business can be consistent in the time dimension.
- the standby cluster can restore the business operation based on business data from the same time period, so as to prevent the data backup system from providing erroneous business services because the business data backed up to the standby cluster is inconsistent in the time dimension. Therefore, the reliability with which the data backup system stores business data for users can be improved, and the quality of business services can be improved.
- the specific method may be: The control device sends a first instruction to the master cluster to instruct the master cluster to send data corresponding to the snapshots of the multiple data sets related to the first service at the first moment to the backup cluster. Or, the control device sends a second instruction to the standby cluster to instruct the standby cluster to copy data corresponding to the snapshots of multiple data sets related to the first service in the primary cluster at the first moment from the primary cluster. In this way, the control device can control the active cluster or the standby cluster to implement the data backup process according to the snapshot by sending an instruction to the active cluster or to the standby cluster.
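The two control paths described above (a first instruction sent to the master cluster, or a second instruction sent to the standby cluster) can be sketched as follows. This is only an illustrative sketch: the names `BackupPolicy`, `Instruction`, `build_instruction`, and the `push` flag are hypothetical and do not come from the application, which does not specify any message format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy record: one service, its data sets, one moment."""
    service: str
    datasets: Tuple[str, ...]
    moment: datetime


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction sent by the control device to one cluster."""
    target: str   # "primary" or "standby"
    action: str   # "send_snapshot_data" or "copy_snapshot_data"
    datasets: Tuple[str, ...]
    moment: datetime


def build_instruction(policy: BackupPolicy, push: bool) -> Instruction:
    """Build the first instruction (primary pushes) or the second (standby pulls)."""
    if push:
        # First instruction: the primary sends the snapshot data to the standby.
        return Instruction("primary", "send_snapshot_data", policy.datasets, policy.moment)
    # Second instruction: the standby copies the snapshot data from the primary.
    return Instruction("standby", "copy_snapshot_data", policy.datasets, policy.moment)
```

Either way, the instruction carries the same data set list and the same moment, which is what keeps the backed-up data sets of one service aligned in time.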
- the control device may first send a third instruction to the master cluster, where the third instruction includes the information of the multiple data sets related to the first service and the first moment, and is used to instruct the master cluster to acquire snapshots of the multiple data sets related to the first service at the first moment.
- the active cluster or standby cluster can subsequently back up the multiple data sets related to the first service in the active cluster to the standby cluster according to the snapshot corresponding to the first moment, so that the multiple data sets backed up on the standby cluster are the multiple data sets related to the first service in the main cluster at the first moment.
- the control device may also send a fourth instruction to the primary cluster, where the fourth instruction is used to instruct the primary cluster to synchronize user data to the standby cluster; or, the control device may obtain the user data stored on the primary cluster and on the standby cluster, and adjust the user data stored in the standby cluster according to the user data stored in the primary cluster, so that the user data stored in the primary cluster and the standby cluster are consistent.
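The second option above, in which the control device compares the user data of both clusters and adjusts one side, can be sketched as a simple reconciliation. This is a minimal sketch under the assumption that user data is a mapping from user ID to a record; the functions `plan_user_data_sync` and `apply_plan` are hypothetical names, not part of the application.

```python
def plan_user_data_sync(primary_users, standby_users):
    """Compute what must change on the standby so it matches the primary.

    Both arguments map a user ID to a record (for example, permissions
    and a tenant ID). Returns (records to write, user IDs to delete).
    """
    upsert = {
        uid: record
        for uid, record in primary_users.items()
        if standby_users.get(uid) != record
    }
    delete = sorted(uid for uid in standby_users if uid not in primary_users)
    return upsert, delete


def apply_plan(standby_users, upsert, delete):
    """Return a new standby mapping with the plan applied."""
    result = dict(standby_users)
    result.update(upsert)
    for uid in delete:
        result.pop(uid, None)
    return result
```

After the plan is applied, the standby holds the same user IDs, permissions, and tenant IDs as the primary, so no manual configuration is needed at takeover.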
- in this way, when the standby cluster takes over the services on the primary cluster, it can provide users with the corresponding business services based on the user data backed up to the standby cluster, thus eliminating the need for operation and maintenance personnel to manually configure user data on the standby cluster. Not only can the operation and maintenance cost be reduced, but the recovery time objective of the data backup system can also be effectively reduced.
- the above-mentioned user data may be, for example, at least one of a user ID, a user permission, and a tenant ID, or other data related to the user.
- the control device may configure not only a first data backup policy for the first service, but also a second data backup policy for the second service, where the second data backup policy includes information of multiple data sets related to the second service and the second time; then, the control device controls the active cluster or the standby cluster, according to the second data backup policy, to back up the multiple data sets related to the second service in the active cluster at the second time to the standby cluster.
- the second service and the first service belong to different services, and may specifically be different services belonging to the same user, or may be different services belonging to different users, or the like. In this way, the data backup system can implement data backup according to business granularity for multiple different services, thereby realizing high-quality services supporting multiple services.
- the plurality of data sets related to the first service include data sets processed or stored by the first component in the main cluster and data sets processed or stored by the second component in the main cluster.
- the first component and the second component may encapsulate data into different formats, or may differ in their data processing performance, and the like. In this way, different data sets belonging to the same service that are processed or stored by different components in the primary cluster at the first moment can be backed up to the standby cluster.
- the multiple data sets related to the second service that are backed up to the standby cluster in the data backup system may include data sets processed or stored by the first component in the primary cluster, data sets processed or stored by the second component in the primary cluster, data sets processed or stored by a third component in the primary cluster, and so on. The components that process or store data sets may differ between services.
- the control device includes an active client and a standby client, wherein the active client is used to detect the first status information of the active cluster, and the standby client is used to detect the second status information of the standby cluster. The control device can also obtain the first status information detected by the active client and the second status information detected by the standby client; when the first status information indicates that the primary cluster has the standby identity or has failed, and the second status information indicates that the standby cluster has the master identity, the control device determines that the standby client is the client accessed by the application. In this way, the control device can automatically switch the client used to access the cluster when the active and standby identities of the two clusters are reversed, thereby eliminating the need for manual switching by operation and maintenance personnel.
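The client-selection rule described above can be sketched as a small decision function. This is an illustrative sketch only: the `ClusterState` enum, the `select_client` function, and the string labels `"active"` and `"standby"` are assumptions introduced here, and the application does not prescribe how status information is encoded.

```python
from enum import Enum


class ClusterState(Enum):
    """Hypothetical encoding of the status reported by each client."""
    PRIMARY = "primary"
    STANDBY = "standby"
    FAILED = "failed"


def select_client(first_status, second_status, current="active"):
    """Pick the client the application should use.

    first_status:  state of the active cluster, as seen by the active client.
    second_status: state of the standby cluster, as seen by the standby client.
    Switch to the standby client only when the active cluster is demoted or
    failed AND the standby cluster already holds the primary identity.
    """
    demoted_or_failed = first_status in (ClusterState.STANDBY, ClusterState.FAILED)
    if demoted_or_failed and second_status is ClusterState.PRIMARY:
        return "standby"
    return current
```

Requiring both conditions avoids routing the application to a standby cluster that has not yet been promoted.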
- the first status information acquired by the control device may indicate that the primary cluster is the primary identity
- the second status information acquired by the control device may indicate that the standby cluster is the secondary identity
- the control device may also prompt the user with information about the failure of the primary cluster, so that the user determines, based on the prompt, that the primary cluster is faulty; the control device may then respond to the user's identity adjustment operation for the standby cluster by adjusting the identity of the standby cluster from the standby identity to the primary identity.
- the control device is deployed in isolation from the main cluster.
- the control device and the standby cluster may be deployed on the standby site, while the main cluster is deployed on the primary site. Because of this isolated deployment, the control device does not fail when the main cluster fails, and can be used for failover.
- the same clock source is set in the control device, the master cluster and the backup cluster, so that the time at which the control device controls the master cluster or the backup cluster to perform data backup is consistent with the time at which the master cluster or the backup cluster actually performs the backup. This avoids data backup errors caused by inconsistent clock sources and improves the time consistency of data backup.
- the active cluster and/or the standby cluster include clusters constructed based on the Hadoop architecture.
- the present application provides a data backup method, which is applied to a data backup system, where the data backup system includes a master cluster, a backup cluster, and a control device.
- the master cluster acquires an instruction issued by the control device, wherein the instruction includes information of multiple data sets related to the first service and the first moment, so that the master cluster, according to the instruction, backs up the multiple data sets related to the first service in the master cluster at the first moment to the standby cluster.
- data backup can be implemented between the main cluster and the standby cluster with business as the granularity, so that the multiple data sets backed up to the standby cluster related to the first business can be consistent in the time dimension.
- when the master cluster backs up the multiple data sets related to the first service in the master cluster at the first moment to the standby cluster according to the instruction, it may specifically obtain, based on the information of the multiple data sets related to the first service and the first moment, a snapshot of the multiple data sets related to the first service in the main cluster at the first moment, and then send the data corresponding to the snapshot (that is, the multiple data sets related to the first service) to the standby cluster, so that the multiple data sets related to the first service in the primary cluster at the first moment are backed up to the standby cluster.
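The snapshot-then-send behaviour of the master cluster can be sketched with an in-memory stand-in. This is a minimal sketch, not the application's implementation: `PrimaryCluster`, `take_snapshot`, and `send_snapshot_data` are hypothetical names, data sets are plain Python lists, and the snapshot is simply a deep copy taken when the method is called (the moment is used only as a label).

```python
import copy
from datetime import datetime


class PrimaryCluster:
    """Toy primary cluster holding named data sets in memory."""

    def __init__(self, datasets):
        self.datasets = datasets   # data set name -> list of records
        self.snapshots = {}        # moment -> frozen copy of selected data sets

    def take_snapshot(self, names, moment):
        # Freeze all data sets of one service together, under one moment.
        self.snapshots[moment] = {
            name: copy.deepcopy(self.datasets[name]) for name in names
        }

    def send_snapshot_data(self, moment, standby_store):
        # Send exactly what the snapshot recorded, not the live data.
        standby_store.update(copy.deepcopy(self.snapshots[moment]))


primary = PrimaryCluster(
    {"directory 3": ["bill-1"], "Table 3": ["customer-1"], "Table 1": ["other"]}
)
first_moment = datetime(2022, 1, 1, 13, 0, 0)
primary.take_snapshot(["directory 3", "Table 3"], first_moment)
primary.datasets["directory 3"].append("bill-2")  # written after the snapshot
standby = {}
primary.send_snapshot_data(first_moment, standby)
```

Because the data is sent from the snapshot, the write made after the snapshot does not reach the standby, and data sets outside the service (here `Table 1`) are not backed up by this policy.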
- the primary cluster may also back up user data to the secondary cluster.
- in this way, when the standby cluster takes over the services on the primary cluster, it can provide users with the corresponding business services based on the user data backed up to the standby cluster, thus eliminating the need for operation and maintenance personnel to manually configure user data on the standby cluster. Not only can the operation and maintenance cost be reduced, but the recovery time objective of the data backup system can also be effectively reduced.
- the above-mentioned user data may be, for example, at least one of a user ID, a user permission, and a tenant ID, or other data related to the user.
- the active cluster and/or the standby cluster include clusters constructed based on the Hadoop architecture.
- the present application provides a control device, the control device is located in a data backup system, the data backup system further includes a master cluster and a backup cluster, and the control device includes: a control module for controlling the master cluster or the standby cluster, according to a first data backup policy, to back up multiple data sets related to the first service in the primary cluster at the first moment to the standby cluster, wherein the first data backup policy includes information about the multiple data sets related to the first service and the first moment.
- the control module is specifically configured to: send a first instruction to the master cluster, instructing the master cluster to send data corresponding to snapshots of the multiple data sets related to the first service at the first moment to the backup cluster; or send a second instruction to the standby cluster, instructing the standby cluster to copy, from the primary cluster, the data corresponding to the snapshots of the multiple data sets related to the first service in the primary cluster at the first moment.
- the control device further includes a configuration module configured to configure a second data backup policy for the second service, where the second data backup policy includes information on multiple data sets related to the second service and a second time; the control module is further configured to control the main cluster or the standby cluster, according to the second data backup policy, to back up the multiple data sets related to the second service in the main cluster at the second time to the standby cluster.
- the control device further includes a prompting module and an adjusting module; the prompting module is used to prompt the user with information about the failure of the primary cluster; the adjusting module is used to respond to the user's identity adjustment operation for the standby cluster by adjusting the identity of the standby cluster from the standby identity to the master identity.
- the present application provides a primary cluster, where the primary cluster is located in a data backup system, the data backup system further includes a backup cluster and a control device, and the primary cluster includes: a communication module for acquiring an instruction issued by the control device, wherein the instruction includes the information of multiple data sets related to the first service and the first moment; and a backup module for backing up, according to the instruction, the multiple data sets related to the first service in the primary cluster at the first moment to the backup cluster.
- the present application provides a data backup system, where the data backup system includes a control device, a master cluster, and a backup cluster.
- the control device is configured to execute the data backup method as in the first aspect or any implementation manner of the first aspect
- the master cluster is configured to execute the data backup method as in the second aspect or any implementation manner of the second aspect
- the standby cluster is used to obtain and store the data set backed up from the primary cluster.
- the present application provides a master cluster, wherein the master cluster includes at least one processor and at least one memory, and the at least one processor executes instructions stored in the at least one memory, so that the master cluster executes an instruction stored in the at least one memory.
- the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions run on at least one computing device, the at least one computing device executes the data backup method described in the above-mentioned second aspect or any implementation manner of the second aspect.
- the present application provides a computer program product containing instructions, which, when running on a computing device, enables the computing device to execute the data backup method described in the first aspect or any implementation manner of the first aspect .
- the implementations provided in the above aspects of the present application may be further combined to provide more implementation manners.
- FIG. 1 is a schematic structural diagram of a data backup system 100
- FIG. 7 is a schematic diagram of a policy configuration interface provided by an embodiment of the present application.
- FIG. 10 is a schematic flowchart of another data backup method provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a control device provided by an embodiment of the application.
- FIG. 12 is a schematic structural diagram of a primary cluster provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of a hardware structure of a control device provided by an embodiment of the application.
- FIG. 14 is a schematic diagram of a hardware structure of a master cluster according to an embodiment of the present application.
- the data backup system 100 includes a master cluster 101 and a backup cluster 102 .
- the active cluster 101 and the standby cluster 102 may be implemented by at least one device (such as a server, a virtual machine, a container, a storage device, etc.).
- both the primary cluster 101 and the secondary cluster 102 may be implemented by a cluster including multiple servers (e.g., a cluster constructed based on the Hadoop architecture).
- the active cluster 101 and the standby cluster 102 may also be implemented by a single device, respectively.
- a user can access the main cluster 101 through an application server (specifically, a client in the application server), and the access includes reading and writing data related to one or more services of the user.
- the data related to a service is simply referred to as service data below.
- the master cluster 101 may periodically back up the service data saved by the user on the master cluster 101 to the backup cluster 102 . In this way, when the main cluster 101 fails, the backup cluster 102 can continue to provide corresponding business services for the user 104 by using the backed up business data, so as to improve the reliability of the data backup system 100 storing business data for the user 104 .
- the master cluster 101 may include a Hadoop Distributed File System (HDFS) component 1011 and a Hive component 1012, as shown in FIG. 2.
- the HDFS component is used to store files, and each file is stored as a series of data blocks, which can provide high-throughput data access and is suitable for large-scale data sets;
- the Hive component is used to extract, transform, and load data, and to store and query data in a cluster built on the Hadoop architecture.
- the main cluster 101 may use multiple components to store the service data of each user in different formats.
- the HDFS component 1011 is used to encapsulate the user's audio, video, pictures and other business data into a file format and store it in the main cluster 101, and the business data in the file format is stored in a corresponding directory. For example, the business data belonging to user 106 that the main cluster 101 stores through the HDFS component 1011 is located under directory 3 (the business data under directory 1 and directory 2 may be business data of other users).
- the Hive component 1012 is used to encapsulate the user's business data into structured data, and store it in the main cluster 101 in a tabular form. And the service data in Table 2 can be the service data of other users).
- the standby cluster 102 may include an HDFS component 1021 and a Hive component 1022, and the functions of the components on the standby cluster 102 are similar to those on the main cluster 101, which will not be repeated here.
- replication task 1 and replication task 2 are started respectively, and these replication tasks are executed by creating corresponding processes, wherein replication task 1 is used to back up the data related to the HDFS component 1011 (including directory 1, directory 2, and directory 3) to the storage area corresponding to the HDFS component 1021, and replication task 2 is used to back up the data related to the Hive component 1012 (including Table 1, Table 2, and Table 3) to the storage area corresponding to the Hive component 1022.
- the above-mentioned component-based data backup can usually only keep the business data on a single component consistent between the main cluster 101 and the standby cluster 102, while the data backed up to the multiple components of the standby cluster 102 may not be consistent in the time dimension.
- the replication task 1 and the replication task 2 may be at different times.
- data related to a component may be, for example, data packaged or processed by the component, or data stored by the component.
- if the main cluster 101 saves new service data belonging to the user 104 through the HDFS component 1011 and the Hive component 1012 (for example, the main cluster 101 modifies some of the saved service data), the business data backed up to the HDFS component 1021 lacks the new data from the time period 13:00:00 to 13:01:00.
- after the primary cluster 101 fails, when the user 104 accesses the business data backed up in the standby cluster 102 through the client 103, the latest and correct business data can be obtained by accessing the Hive component 1022, whereas some of the service data obtained by accessing the HDFS component 1021 is not the latest.
- the data backup system 100 may need to access the detailed billing data in directory 3 and the customer list in Table 3 and, based on the time dimension, compare and verify the detailed billing data in directory 3 against the customer list in Table 3 to determine the corresponding bill for each customer.
- if the customer list and the bill detail data belong to different time periods, errors may occur: some bills that customers have already settled may be considered unsettled, or some bills in the bill detail data may not belong to any customer in the customer list. This reduces the reliability of the business data stored by the data backup system 100, that is, it reduces the quality of the data storage service provided by the data backup system 100.
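The billing mismatch described above can be reproduced with a small write log. The sketch below is purely illustrative: the log layout, the record fields (`settled`, `open_bills`), the timestamps, and the helper names `state_at` and `open_bill_mismatches` are all assumptions chosen to mirror the directory 3 / Table 3 example, not data from the application.

```python
from datetime import datetime


def state_at(log, dataset, moment):
    """Replay writes to one data set up to and including `moment`."""
    state = {}
    for ts, ds, key, value in sorted(log, key=lambda entry: entry[0]):
        if ds == dataset and ts <= moment:
            state[key] = value
    return state


def open_bill_mismatches(bills, customers):
    """Customers whose recorded open-bill count disagrees with the bill details."""
    counted = {}
    for bill in bills.values():
        if not bill["settled"]:
            counted[bill["customer"]] = counted.get(bill["customer"], 0) + 1
    return sorted(
        cid for cid, rec in customers.items()
        if rec["open_bills"] != counted.get(cid, 0)
    )


T_OLD = datetime(2022, 1, 1, 12, 59, 0)
T_NEW = datetime(2022, 1, 1, 13, 0, 30)   # written between the two replication tasks
LOG = [
    (T_OLD, "directory 3", "b1", {"customer": "c1", "settled": False}),
    (T_OLD, "Table 3", "c1", {"open_bills": 1}),
    (T_NEW, "directory 3", "b1", {"customer": "c1", "settled": True}),
    (T_NEW, "Table 3", "c1", {"open_bills": 0}),
]
AT_1300 = datetime(2022, 1, 1, 13, 0, 0)
AT_1301 = datetime(2022, 1, 1, 13, 1, 0)

# Component-based backup: HDFS data as of 13:00:00, Hive data as of 13:01:00.
per_component = open_bill_mismatches(
    state_at(LOG, "directory 3", AT_1300), state_at(LOG, "Table 3", AT_1301)
)
# Service-granularity backup: both data sets as of the same moment.
per_service = open_bill_mismatches(
    state_at(LOG, "directory 3", AT_1300), state_at(LOG, "Table 3", AT_1300)
)
```

In this toy log, `per_component` reports customer `c1` as inconsistent (the settled bill still looks unsettled), while `per_service` reports nothing, because both data sets reflect the same moment.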
- the active cluster 101 and the standby cluster 102 may also be implemented by a single device, respectively.
- the main cluster 101 is used to provide data storage and processing for the application 1 and the application 2 deployed in the application server 105 , and to read and write data for the application server 105 .
- the application server 105 includes two applications as an example. In actual application, the application server 105 may include any number of applications.
- the backup cluster 102 serves as a backup cluster of the primary cluster 101 and is used for backing up data in the primary cluster 101. After a disaster tolerance switchover occurs, the backup cluster 102 supports the data read and write of the application server 105.
- FIG. 3 shows that the primary control device 103 and the backup cluster 102 are jointly deployed at the backup site, and that the backup control device 104 and the primary cluster 101 are jointly deployed at the primary site. The primary site and the backup site can be two independent regions, two computer rooms, or device clusters under two different local area networks; the primary site and the backup site usually also have independent ventilation, fire protection, water, and electricity control systems.
- the deployment manner of the master control device 103 and the standby control device 104 shown in FIG. 3 is only an example.
- FIG. 3 takes the application server 105 deployed outside the data backup system 300 as an example for illustration. In other possible implementations, the application server 105 may also be deployed in the data backup system 300. At this time, when the main control device 103.
- the data backup system 300 may be deployed in a cloud environment, for example, the data backup system 300 may be constructed based on multiple regions of the cloud environment. Alternatively, the data backup system 300 may also be deployed in an edge environment and constructed through multiple computer rooms in the edge environment.
- the cloud environment in this application refers to a collection of resources that a cloud service provider sets up in multiple regions (e.g., the East China region and the North China region) to provide services (such as data storage services) to tenants (the above-mentioned user 106 is also a tenant).
- a cloud environment usually includes a large number of resources, and can provide basic resource services and/or software application services for tenants in various regions.
- component 1, component 2, and component 3 may be any components on the main cluster 101 for encapsulating and storing non-streaming data. Also, there may be differences in the components used to process or store data for different services. For example, when processing or storing data of business 1, the components used may include only component 1; when processing or storing data of business 2, the components used may include component 1 and component 2; when processing or storing data of business 3, the components used may include component 1, component 2, and component 3.
- the standby cluster 102 may also include one or more components. As shown in FIG. 5, the standby cluster 102 includes component 4, component 5, component 6, and the like. Similar to the main cluster 101, component 4, component 5, and component 6 on the standby cluster 102 can be used to process or store service data of one or more users, and different data of the same service can be encapsulated into the same or different data formats. There may also be differences in the components used to store data of different services. The components on the standby cluster 102 may serve as backups for the components on the primary cluster 101.
- component 4 on the standby cluster 102 can be used as the backup of component 1 on the primary cluster 101
- component 5 can be used as the backup of component 2
- component 6 can be used as the backup of component 3.
- the backup cluster 102 periodically backs up the service data on the primary cluster 101 before the primary cluster 101 fails.
- the service data related to each component on the master cluster 101 may be backed up to the backup components on the backup cluster 102 .
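The relationship between services, the components that hold their data, and the backup components on the standby cluster can be written down as two lookup tables. This is only a sketch using the example assignments given in the text (business 1 to 3, component 1 to 6); the dictionary names and the `backup_routes` helper are hypothetical.

```python
# Which primary-cluster components hold data for each service (example from the text).
SERVICE_COMPONENTS = {
    "business 1": ["component 1"],
    "business 2": ["component 1", "component 2"],
    "business 3": ["component 1", "component 2", "component 3"],
}

# Which standby-cluster component backs up each primary-cluster component.
BACKUP_COMPONENT = {
    "component 1": "component 4",
    "component 2": "component 5",
    "component 3": "component 6",
}


def backup_routes(service):
    """List (source component, backup component) pairs for one service."""
    return [(src, BACKUP_COMPONENT[src]) for src in SERVICE_COMPONENTS[service]]
```

For example, `backup_routes("business 2")` yields the pairs for component 1 and component 2, so a service-granularity backup of business 2 touches both component 4 and component 5 on the standby cluster.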
- the cloud environment or edge environment where the master cluster 101 and the backup cluster 102 are located includes many computing devices (such as servers, etc.), so that multiple sets of master clusters and multiple sets of backup clusters can be constructed in the cloud environment or edge environment. Therefore, the master cluster 101 and the backup cluster 102 may be paired in advance in a cloud environment or an edge environment to construct the data backup system 300 shown in FIG. 5 .
- the cloud environment or edge environment may present the cluster pairing interface as shown in FIG. 6 to the administrator. The cluster pairing interface may present not only the identifiers of multiple clusters (cluster 1, cluster 2, cluster 3, etc. shown in FIG. 6), but also the relevant information of each cluster, such as the location information and resource specification information of the cluster as shown in FIG. 6, so that the administrator can select, from the multiple clusters presented in the cluster pairing interface, the corresponding clusters to be used as the primary cluster 101 and the backup cluster 102 to construct the data backup system 300.
- the cloud environment or the edge environment may also construct the data backup system 300 according to the user's pairing operation on the master cluster 101 and the backup cluster 102 , which is not limited in this embodiment.
- data communication can be performed between the master cluster 101 and the backup cluster 102.
- the backup cluster 102 can periodically back up the service data on the master cluster 101 to the backup cluster 102.
- a communication authentication process may be completed in advance, so that the master cluster 101 and the backup cluster 102 trust each other.
- the main cluster 101 and the standby cluster 102 may perform communication authentication based on a trusted third-party authentication protocol designed for Transmission Control Protocol/Internet Protocol (TCP/IP) networks.
- the third-party authentication protocol may be, for example, the Kerberos protocol or the like.
- the communication authentication process may be automatically performed between the master cluster 101 and the backup cluster 102, and the communication authentication between the two parties may also be completed under the intervention of a user or an administrator. In this way, the security and reliability of data communication between the master cluster 101 and the backup cluster 102 can be improved.
- the main control device 103 can run program software for configuring data backup policies and controlling the active and standby clusters.
- the program software can configure the first data backup policy for the first service of user 1.
- the first data backup policy includes the information of multiple data sets related to the first service in the main cluster 101 and the first time.
- the multiple data sets related to the first service may specifically include data sets processed or stored by component 1 and data sets processed or stored by component 2 .
- the multiple data sets related to the first service may also be referred to as the first protection group.
- the information of the first protection group may specifically be the identifier of the file directory in the component, such as the name of the directory 3 in FIG.
- the service data backed up to the standby cluster 102 is a plurality of data sets related to the first service that have been stored in the primary cluster 101 at the first moment.
- the main control device 103 may configure the first data backup policy for the first service based on the operation of the user 1 .
- the main control device 103 may present the policy configuration interface shown in FIG. 7 to the user 1, and prompt information ("Please specify the backup service and backup time" as shown in FIG. 7) may be presented in the policy configuration interface, so that the user 1 is prompted to input the identifier of the first service to be configured (such as the name of the first service, etc.) and the time when the data of the first service is backed up.
- the main control device 103 may prompt whether to configure the first protection group for the first service according to the identifier of the first service input by the user 1 on the control policy configuration interface.
- the names of one or more data sets related to the first service stored on the main cluster 101 are further presented, such as the name of data set 1 related to component 1 and the names of data set 2 and data set 3 related to component 2 as shown in FIG. 7, and user 1 is prompted to configure the first protection group for the first service, that is, user 1 is prompted to configure one or more data sets related to the first service that need to be backed up to the standby cluster 102.
- the main control device 103 may determine that the protection group of the first service includes data set 1 , data set 2 and data set 3 based on user 1's selection operation or input operation on data set 1 to data set 3 .
- the main control device 103 can also obtain the first time, entered by the user 1 in the policy configuration interface, at which the data of the first service is to be backed up, so that the main control device 103 generates, based on the first time and the information of the above-mentioned first protection group, the first data backup policy corresponding to the first service. In practical applications, the backup cluster 102 can periodically back up the data of the first service on the main cluster 101.
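The policy described above pairs a protection group with a backup moment. A minimal Python sketch of such a record follows; the class names, field names, and the example date are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataSetRef:
    """One data set, identified by the component that holds it (hypothetical type)."""
    component: str   # e.g. "component 1"
    identifier: str  # e.g. a directory or table name


@dataclass(frozen=True)
class BackupPolicy:
    """A data backup policy: protection group plus the first time (hypothetical type)."""
    service: str
    protection_group: tuple
    first_time: datetime


def build_policy(service, selected, first_time):
    """Build a policy from the data sets the user selected and the entered time."""
    if not selected:
        raise ValueError("the protection group must contain at least one data set")
    return BackupPolicy(service, tuple(selected), first_time)


# Example values only; the date is an assumption made for illustration.
policy = build_policy(
    "first service",
    [
        DataSetRef("component 1", "data set 1"),
        DataSetRef("component 2", "data set 2"),
        DataSetRef("component 2", "data set 3"),
    ],
    datetime(2022, 1, 1, 13, 0, 0),
)
```

Making the record immutable is one possible design choice, so that an instruction generated from the policy cannot drift from what the user configured.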
- the master control device 103 may control the master cluster 101 or the backup cluster 102 according to the first data backup policy, so that the multiple data sets related to the first service that are stored in the master cluster 101 at the first moment are backed up to the standby cluster 102.
- the master control device 103 may generate a third instruction based on the data backup policy, and send the third instruction to the master cluster 101 to instruct the master cluster 101 to obtain snapshots, at the first moment, of the multiple data sets related to the first service.
- the third instruction may include information of the first protection group and a first time, and the first time is later than the time when the main control device 103 sends the third instruction to the main cluster 101 .
- the main cluster 101 parses out the information of the first protection group and the first time in the third instruction, and determines, according to the information of the first protection group, the multiple data sets related to the first service that are processed or stored by component 1 and component 2 in the main cluster 101.
- the master cluster 101 can use the backup management device 1011 on it to create process 1 and process 2, and start the snapshot task 1 and the snapshot task 2 including the first moment.
- process 1 is responsible for executing snapshot task 1, which is specifically used to access component 1 and, at the first moment, take a snapshot of the data set related to the first service that is processed or stored by component 1, to obtain the first snapshot;
- process 2 is responsible for executing snapshot task 2, which is specifically used to access component 2 and, at the first moment, take a snapshot of the data set related to the first service that is processed or stored by component 2, to obtain the second snapshot.
- the backup management apparatus 1011 may also use at least one executor on the main cluster 101 to perform the above snapshot task, wherein each executor may be implemented by, for example, an execution thread.
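The snapshot tasks described above can be sketched in Python as follows, assuming each component is modelled as an in-memory dictionary and each executor as a thread. The function names, the dictionary layout, and the use of a deep copy as the "snapshot" are illustrative assumptions, not the mechanism of the application.

```python
import copy
import threading
import time


def snapshot_task(component_data, data_set_ids, first_moment, results, key):
    """Wait until first_moment (a time.monotonic() value), then copy the named data sets."""
    delay = first_moment - time.monotonic()
    if delay > 0:
        time.sleep(delay)
    results[key] = {ds: copy.deepcopy(component_data[ds]) for ds in data_set_ids}


def take_snapshots(components, protection_group, first_moment):
    """Start one snapshot task per component and wait for all of them to finish."""
    results = {}
    threads = []
    for name, data_set_ids in protection_group.items():
        thread = threading.Thread(
            target=snapshot_task,
            args=(components[name], data_set_ids, first_moment, results, name),
        )
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    return results


components = {
    "component 1": {"data set 1": ["a", "b"], "unrelated": ["x"]},
    "component 2": {"data set 2": ["c"], "data set 3": ["d"]},
}
group = {"component 1": ["data set 1"], "component 2": ["data set 2", "data set 3"]}
snapshots = take_snapshots(components, group, time.monotonic() + 0.05)

# A write that arrives after the first moment does not change the snapshot.
components["component 1"]["data set 1"].append("late write")
```

Only the data sets named in the protection group are captured; data that the same component holds for other services is left out.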
- the standby cluster 102 may use one executor to execute multiple replication tasks in sequence; or, the standby cluster 102 may use multiple executors to execute multiple replication tasks in parallel to improve data backup efficiency.
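The choice between one executor running replication tasks in sequence and several executors running them in parallel can be sketched with a thread pool. This is a hedged illustration: the task tuple, the dictionary used as the standby store, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def replicate(task):
    """Copy the snapshotted contents of one data set into the standby store."""
    name, snapshot_data, standby_store = task
    standby_store[name] = list(snapshot_data)
    return name


def run_replication(snapshots, standby_store, executors=1):
    """Run one replication task per data set; executors=1 means strictly in sequence."""
    tasks = [(name, data, standby_store) for name, data in snapshots.items()]
    with ThreadPoolExecutor(max_workers=executors) as pool:
        # pool.map yields results in the order the tasks were submitted.
        return list(pool.map(replicate, tasks))


snapshots = {"data set 1": ["a", "b"], "data set 2": ["c"]}
standby_sequential = {}
standby_parallel = {}
done_sequential = run_replication(snapshots, standby_sequential, executors=1)
done_parallel = run_replication(snapshots, standby_parallel, executors=2)
```

Both modes produce the same backed-up content; the number of executors only affects how many replication tasks are in flight at once.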
- the backup management device 1011 on the primary cluster 101 and the backup management device 1021 on the backup cluster 102 may be applications running on the corresponding servers.
- the backup management apparatus 1011 and the backup management apparatus 1021 may be hardware for running application programs, for example, any one of a processor core, a processor, and a server that is configured independently. The specific implementation of the backup management apparatus is not limited in this embodiment.
- the above takes the case where the backup cluster 102 actively backs up service data as an example for illustrative description. In practical applications, the primary cluster 101 may also actively back up the data set related to the first service to the backup cluster 102.
- the master control device 103 may send a first instruction to the master cluster 101 to instruct the master cluster 101 to send the data corresponding to the snapshots, taken at the first moment, of the multiple data sets related to the first service to the standby cluster 102.
- the first instruction may include the above-mentioned indication information of the first snapshot and the second snapshot.
- the master cluster 101 may determine the first snapshot and the second snapshot on the master cluster 101 according to the first instruction.
- the master cluster 101 can transmit the data set related to the first service processed or stored by the component 1 to the standby cluster 102 through one or more executors according to the first snapshot; and through the one or more executors, According to the second snapshot, the data set related to the first service processed or stored by the component 2 is transmitted to the standby cluster 102 .
- the specific implementation process of the master cluster 101 performing data backup according to the snapshot is similar to the specific implementation process of the backup cluster 102 performing the data backup process according to the snapshot, which can be understood by referring to the above-mentioned description, and will not be repeated here.
- the standby cluster 102 may also perform snapshots of the backed up multiple data sets, and store the obtained snapshots.
- the snapshot obtained after the backup cluster 102 takes a snapshot of the backed up service data is generally consistent with the snapshot obtained by the master cluster 101 taking a snapshot of the data of the first service at the first moment. In this way, the backup cluster 102 can use the snapshot at a future moment to determine the data of the first service stored by the data backup system 300 at the first moment.
- the main cluster 101 and the main control device 103 may be constructed by different computing devices, so that the clock sources between the main cluster 101 and the main control device 103 may not be unified.
- the first moment indicated by the master control device 103 in the third instruction may not be the same as the moment when the master cluster 101 actually performs the snapshot operation. For example, assuming that the clock on the master control device 103 is 5 seconds faster than the clock on the master cluster 101, and the first time indicated by the master control device 103 in the third instruction is 13:00:00, then the master cluster 101 performs the snapshot when its own clock reads 13:00:00, at which point the clock on the master control device 103 already reads 13:00:05.
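The effect of unsynchronized clocks can be worked through with a few lines of Python. The function name and the calendar date are assumptions made for illustration; only the 5-second offset and the 13:00:00 first time come from the example in the text.

```python
from datetime import datetime, timedelta


def control_clock_at_snapshot(first_time, control_ahead_by):
    """Reading of the control device's clock when the cluster's clock reaches first_time."""
    return first_time + control_ahead_by


first_time = datetime(2022, 1, 1, 13, 0, 0)  # the date is an arbitrary example
reading = control_clock_at_snapshot(first_time, timedelta(seconds=5))
print(reading.time())  # 13:00:05
```

From the control device's point of view, the snapshot is therefore taken 5 seconds later than the moment it indicated.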
- the same clock source can be set between the master cluster 101 and the master control device 103 .
- the master control device 103 may synchronize the clock with the master cluster 101 through the Network Time Protocol (NTP), so that the master control device 103 and the master cluster 101 have the same clock source.
- main cluster 101 and the main control device 103 may also implement clock synchronization in other manners, which are not limited in this embodiment. Further, the master cluster 101 and the master control device 103 may also perform clock synchronization with the standby cluster 102 , so as to realize clock unification in the data backup system 300 .
- the main cluster 101 can not only store the data of the first service, but also store the data of other services, and the data of different services are usually different, and different services may come from the same user or different users.
- taking the case where the main cluster 101 stores the data of the first service and the second service at the same time as an example, the data of the first service may be the service data in directory 1 and the service data in table 1 as shown in FIG. 8, and the data of the second service may be the service data in directory 2, the service data in table 2 and the service data in table 3 as shown in FIG. 8.
- the main control device 103 can configure a second data backup strategy for the second business in addition to the first data backup strategy for the first business.
- the second data backup strategy includes the information of multiple data sets related to the second service in the main cluster 101 and the second time.
- multiple data sets related to the second service may also be referred to as a second protection group, and the second protection group includes the data set related to the second service processed or stored by component 1 in the main cluster 101, the data set related to the second service processed or stored by component 2, and the data set related to the second service processed or stored by component 3 (assuming that the main cluster 101 uses component 1, component 2, and component 3 to process or store the data of the second service).
- the master control device 103 can control the master cluster 101 or the backup cluster 102, based on the second data backup policy, to back up the second protection group (i.e., multiple data sets related to the second service) in the master cluster 101 at the second moment to the standby cluster 102.
- the specific implementation process in which the main control device 103 configures the second data backup strategy and backs up multiple data sets related to the second service on the main cluster 101 to the standby cluster 102 according to the second data backup strategy is similar to the process described above for the main control device 103 and the first data backup strategy.
- the service data on the master cluster 101 can be backed up to the backup cluster 102 periodically.
- the business data backed up to the standby cluster 102 each time may be all business data belonging to the first business on the main cluster 101 .
- the business data backed up to the standby cluster 102 may be the data stored in the main cluster 101 related to the first business at the first moment.
- the business data backed up to the backup cluster 102 may be incremental data on the master cluster 101 from the first time to the third time.
- the third time is the time when the data of the first service is backed up for the second time.
- the backup management device 1011 may notify the master control device 103 of it, and the master control device 103 instructs the backup management device 1021 to perform the second round of business data backup. Similar to the business data backup process in the first round, the backup management apparatus 1021 may start new copy task 3 and copy task 4 for the third snapshot and the fourth snapshot. Then, the standby cluster 102 can use at least one executor to execute the replication task 3; specifically, the incremental data that component 1 on the main cluster 101 processed or stored in the time period from the first time to the third time can be determined according to the first snapshot and the third snapshot.
- in each subsequent backup of business data, only incremental data needs to be transmitted between the main cluster 101 and the standby cluster 102, and there is no need to transmit all the business data related to the first business on the main cluster 101 to the standby cluster 102. Therefore, the amount of service data transmitted between the primary cluster 101 and the backup cluster 102 can be effectively reduced, which can improve the backup efficiency and reduce the resource consumption required for backing up the service data.
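Determining incremental data from two snapshots can be sketched as a comparison of two mappings. Here each snapshot is modelled as a dictionary from a file path to a version marker; this representation, the paths, and the function name are assumptions for illustration, not the snapshot format used by the application.

```python
_MISSING = object()  # sentinel for paths that are absent from the earlier snapshot


def incremental(earlier, later):
    """Return the entries that are new or changed in `later` relative to `earlier`."""
    return {
        path: version
        for path, version in later.items()
        if earlier.get(path, _MISSING) != version
    }


# Hypothetical snapshots of component 1 at the first time and at the third time.
first_snapshot = {"/dir1/a": "v1", "/dir1/b": "v1"}
third_snapshot = {"/dir1/a": "v1", "/dir1/b": "v2", "/dir1/c": "v1"}

delta = incremental(first_snapshot, third_snapshot)
print(delta)  # {'/dir1/b': 'v2', '/dir1/c': 'v1'}
```

Only the two entries in `delta` would need to be transmitted in the second round; the unchanged entry `/dir1/a` is skipped.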
- the above is an introduction to the data backup process between the primary cluster 101 and the backup cluster 102 , and the following describes the disaster recovery switching process in the data backup system 300 when the primary cluster 101 fails.
- the standby cluster 102 can take over the services currently running on the main cluster 101, and continue to provide users with read and write services of business data by using the pre-backed up business data, so as to ensure the reliability of the data backup system 300 for storing the user's business data.
- the application server 105 may automatically adjust the client accessing the cluster from the primary client 1051 to the standby client 1052 after the primary cluster 101 fails.
- the distributed application coordination service 2 may feed back second status information to the standby client 1052, where the second status information is used to indicate the primary identity or the standby identity, or to indicate whether the cluster fails.
- the arbitration module 1053 can obtain the first state information from the master client 1051 and obtain the second state information from the standby client 1052, respectively.
- the arbitration module 1053 may determine that the client of the application server 105 accessing the cluster is switched to the standby client 1052.
- the application server 105 can automatically switch the cluster it accesses from the main cluster 101 to the standby cluster 102, without manual intervention to switch the cluster accessed by the application server 105, thereby improving the flexibility of the data backup system 300 and reducing manual operation and maintenance costs.
- the above takes the case where the application server 105 and the main control device 103 are independently deployed, and the main client 1051, the standby client 1052 and the arbitration module 1053 are deployed in the application server 105, as an example for illustrative description.
- the application server 105 may be deployed in an integrated manner with the main control device 103, that is, the functions of the application server 105 and the main control device 103 are implemented by one device, which may be referred to as a control device or as an application server.
- that is, the control device or the application server can be integrated with the main client 1051, the standby client 1052 and the arbitration module 1053 as shown in FIG.
- the control device or the application server performs the access performed by the application server 105.
- the functions of the master client 1051, the backup client 1052 and the arbitration module 1053 in the application server 105 can also be implemented by the control device 103, that is, the control device 103 performs the above-mentioned automatic switching between the active and standby clients. Since the active and standby clients are in the control device 103, the data read and write requests generated by the application server 105 are sent to the control device 103, and the current active client (or standby client) in the control device 103 reads and writes the data in the primary cluster (or standby cluster).
- if both the first state information and the second state information indicate that the respective clusters are the master cluster (for example, before the master cluster 101 that has recovered from a failure is managed by the master control device 103, the distributed application coordination service 1 on the master cluster 101 indicates to the master client 1051 that this cluster is the master cluster), the application server 105 still uses the current access policy to access the cluster, that is, the currently accessed cluster may not be switched.
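The decision made by the arbitration module can be sketched as a small function over the two pieces of state information. Only the rule that the current client is kept when both clusters report the primary identity comes from the text; the other rules, the enum, and the client labels are assumptions added to make the sketch complete.

```python
from enum import Enum


class State(Enum):
    """State reported through a client (hypothetical encoding)."""
    PRIMARY = "primary"
    STANDBY = "standby"
    FAILED = "failed"


def choose_client(current, first_state, second_state):
    """Pick the client to use; `current` is "primary_client" or "standby_client"."""
    if first_state is State.PRIMARY and second_state is State.PRIMARY:
        return current  # from the text: both claim primary, so do not switch
    if second_state is State.PRIMARY:
        return "standby_client"  # assumption: follow the cluster reporting primary
    if first_state is State.PRIMARY:
        return "primary_client"  # assumption: same rule in the other direction
    return current  # assumption: no cluster reports primary, keep the current client


after_failure = choose_client("primary_client", State.FAILED, State.PRIMARY)
after_recovery = choose_client("standby_client", State.PRIMARY, State.PRIMARY)
```

In this sketch `after_failure` is the standby client, and `after_recovery` stays on the standby client because both clusters report the primary identity.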
- when the standby cluster 102 takes over the services run by the primary cluster 101 before the failure, it requires not only the data related to these services, but also the data of the users, such as user names and user permissions.
- the operation and maintenance personnel can configure corresponding user data on the standby cluster 102, so that the standby cluster 102 provides corresponding data read and write services based on the user data.
- the configured user data may include, for example, at least one of data such as a user ID, a user authority, and an ID of a tenant to which the user belongs.
- the backup cluster 102 can not only back up the service data on the primary cluster 101 to the backup cluster 102, but also back up the user data on the primary cluster 101 to the backup cluster 102.
- the master control device 103 may send a fourth instruction to the master cluster 101 to instruct the master cluster 101 to synchronize user data to the standby cluster 102 according to the fourth instruction.
- the master control device 103 may acquire the user data stored in the master cluster 101 and the backup cluster 102, and adjust the user data stored in the backup cluster 102 according to the user data stored in the master cluster 101, so that the user data stored in the master cluster 101 and the backup cluster 102 is consistent.
- the above takes the failure of the main cluster 101 as an example for illustrative description.
- the application server 105 can automatically switch the accessed cluster from the main cluster 101 to the standby cluster 102, and the specific implementation process of the cluster switching is similar to the above-mentioned implementation process, which can be understood with reference to the above descriptions and will not be repeated here.
- the main control device 103 may control and implement the backup of the service data on the main cluster 101 to the standby cluster 102 .
- the backup cluster 102 can take over the services on the primary cluster 101.
- the backup control device 104 can control and implement the backup of the service data on the backup cluster 102 to the primary cluster 101 (recovered from a failure).
- the primary control device 103 can synchronize its own related configuration information to the backup control device 104 in advance, so that when the primary cluster 101 fails, the backup control device 104 can control and implement the corresponding business data backup process, thus eliminating the need for operation and maintenance personnel to repeat the configuration manually.
- the configuration information in the master control device 103 can be configured in the master control device 103 by an administrator during the device deployment process, so that the master control device 103 can control the data backup between the master cluster 101 and the backup cluster 102 according to the configured information.
- the configuration information synchronized by the master control device 103 may include relevant information of the data backup system 300, such as the pairing relationship between the master cluster 101 and the backup cluster 102, the resources included in the data backup system 300, and the time point of the business data currently backed up by the data backup system 300.
- the user 106 can set a protection group corresponding to the first service according to the data sets generated when the data of the first service is stored on the main cluster 101, and the information of the protection group indicates the multiple data sets on the main cluster 101 that are related to the first service, so that when the data of the first service is backed up subsequently, the multiple data sets indicated by the protection group are backed up to the standby cluster 102.
- for the specific implementation process of creating a protection group for the first service by the user 106, reference may be made to the description of the relevant parts of the foregoing embodiments, which will not be repeated here.
- the main control device 103 sends a third instruction to the backup management apparatus 1011 on the main cluster 101 before time T 0 according to the configured data backup policy, where the third instruction includes time T 0 and the information of the protection group.
- the master control device 103 generates and sends the third instruction to the master cluster 101 , so that the master cluster 101 can perform snapshot processing on the service data at the upcoming time T 0 .
- structured data stored by Hive components is stored in the corresponding HDFS directory in a file format.
- Process 3 can be responsible for accessing the SparkSQL component, and can obtain the metadata of the business data stored by the SparkSQL component from the database through a data extraction command at time T0, so as to take a snapshot of the HDFS directory that the metadata indicates is the actual storage location of the data of the first business.
- the structured data stored by the SparkSQL component is also saved in the corresponding HDFS directory through the file format.
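Because the structured data of the Hive and SparkSQL components is ultimately kept in HDFS directories, the snapshot targets can be resolved through metadata. The following is a minimal sketch in which the metadata database is modelled as a dictionary; the function name, the key layout, and all paths are hypothetical.

```python
def directories_to_snapshot(protection_group, metadata):
    """Resolve each (component, data set) pair to the HDFS directory that holds it."""
    directories = []
    for component, data_set in protection_group:
        if component == "HDFS":
            directory = data_set  # an HDFS data set is already a directory
        else:
            directory = metadata[(component, data_set)]  # look up the storage location
        if directory not in directories:
            directories.append(directory)
    return directories


# Hypothetical metadata: table name -> directory where the table files are stored.
metadata = {
    ("Hive", "table 1"): "/warehouse/hive/table1",
    ("SparkSQL", "table 2"): "/warehouse/spark/table2",
}
group = [("HDFS", "/data/dir1"), ("Hive", "table 1"), ("SparkSQL", "table 2")]
targets = directories_to_snapshot(group, metadata)
```

Each directory in `targets` would then be snapshotted at time T0 by the process responsible for the corresponding component.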
- the master control device 103 issues a second instruction to the backup management device 1021 on the backup cluster 102 to instruct the backup cluster 102 to copy the data corresponding to the snapshots, taken at the first moment, of the multiple data sets related to the first service in the master cluster 101.
- after the backup management device 1011 on the main cluster 101 completes the snapshots of the business data stored by the HDFS component, the Hive component and the SparkSQL component by using multiple processes, it can return a notification of the success of the snapshot to the main control device 103, so that the main control device 103, after determining that the snapshot has ended, instructs the standby cluster 102 to back up the service data on the primary cluster 101 to the standby cluster 102 by issuing a second instruction.
- the backup management apparatus 1021 starts a plurality of copy tasks according to the information of the protection group of the first service, and each copy task is used to implement the backup of a data set related to the first service stored in one component.
- the backup management apparatus 1021 executes the multiple replication tasks through at least one executor, and backs up the data set related to the first service stored by each component to the standby cluster 102 according to the snapshot at time T 0 on the primary cluster 101 .
- the executor 1 is used to perform the replication task 1, and obtains, by accessing the main cluster 101, the snapshot at time T0 of the data set (such as the HDFS directory) related to the first service stored by the HDFS component, so that the data set related to the first service stored by the HDFS component can be backed up, according to the snapshot, to the storage area corresponding to the HDFS component on the standby cluster 102.
- the metadata of the business data on the standby cluster 102 can be saved to the database on the standby cluster 102, so that the data of the first service can subsequently be queried on the standby cluster 102 according to the metadata in the database.
- the backup management apparatus 1021 takes a snapshot of the data of the first service backed up to the standby cluster 102 through at least one executor.
- the backup management apparatus 1021 may also use the executor to take a snapshot of the backed up data of the first service.
- the snapshot on the standby cluster 102 is consistent with the service data of the primary cluster 101 at time T 0 .
- the above takes the case where the backup cluster 102 actively backs up business data from the main cluster 101 as an example for illustrative description. In practical applications, the main cluster 101 may also actively back up the business data to the backup cluster 102.
- the master control device 103 may send a first instruction to the master cluster 101 to instruct the master cluster 101 to back up the data set related to the first service stored by the HDFS component, the Hive component and the SparkSQL component to the standby cluster 102.
- the master cluster 101 may back up the data of the first service to the backup cluster 102 through the corresponding executor according to the snapshot at time T 0 .
- the master control device 103 may send a fourth instruction to the master cluster 101, so as to instruct the master cluster 101 to synchronize user data to the standby cluster 102 based on the fourth instruction.
- the master control device 103 may acquire the user data stored in the master cluster 101 and the backup cluster 102, and adjust the user data stored in the backup cluster 102 according to the user data stored in the master cluster 101, so that the user data stored in the master cluster 101 and the backup cluster 102 is consistent.
- this embodiment may further include the following step S910:
- S910: The standby cluster 102 backs up the user data on the active cluster 101 to the standby cluster 102; or, the active cluster 101 actively backs up the user data to the standby cluster 102; or, the user data stored in the standby cluster 102 is adjusted according to the user data stored in the active cluster 101.
- the user data on the master cluster 101 may include, for example, at least one of the identifiers of the users (including the user 106 ) created on the master cluster 101 , the identifiers of the tenants, and the permissions applied for the users.
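The third option in S910, adjusting the user data on the standby cluster so that it matches the active cluster, can be sketched as a reconciliation over two dictionaries keyed by user identifier. The record layout and function names are illustrative assumptions; deleting users that exist only on the standby cluster is also an assumption about what "consistent" implies.

```python
def plan_user_sync(primary_users, standby_users):
    """Return (records to create or update, user ids to delete) for the standby cluster."""
    to_upsert = {
        user_id: record
        for user_id, record in primary_users.items()
        if standby_users.get(user_id) != record
    }
    to_delete = sorted(set(standby_users) - set(primary_users))
    return to_upsert, to_delete


def apply_user_sync(standby_users, to_upsert, to_delete):
    """Apply the plan so that the standby user data matches the primary user data."""
    for user_id in to_delete:
        del standby_users[user_id]
    standby_users.update(to_upsert)


# Hypothetical user records: tenant identifier and permissions per user identifier.
primary = {
    "user106": {"tenant": "tenant-a", "permissions": ["read", "write"]},
    "user107": {"tenant": "tenant-a", "permissions": ["read"]},
}
standby = {
    "user106": {"tenant": "tenant-a", "permissions": ["read"]},
    "stale-user": {"tenant": "tenant-b", "permissions": ["read"]},
}
upserts, deletions = plan_user_sync(primary, standby)
apply_user_sync(standby, upserts, deletions)
```

Separating the plan from its application makes it possible to review what would change on the standby cluster before anything is modified.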
- the backup cluster 102 can take over the services currently running on the primary cluster 101, and use the pre-backed up service data to continue to provide users with the read and write services of the business data, so as to ensure the reliability of the data backup system 300 for storing the business data of the user.
- the user 106 can access the main cluster 101 or the standby cluster 102 through the application server 105. Specifically, the user can access the main cluster 101 through the main client 1051 on the application server 105 before the main cluster 101 fails. After the main cluster 101 fails, the application server 105 can automatically switch the clients accessing the cluster, so that the user can access the standby cluster 102 through the standby client 1052 on the application server 105 .
- the above takes the backup of the service data on the master cluster 101 at time T0 as an example for illustration.
- data backup may be performed between the master cluster 101 and the backup cluster 102 periodically.
- when the user 106 configures the starting backup time as time T0 on the policy configuration interface, the user also configures the backup cycle between the primary cluster 101 and the backup cluster 102, so that the second data backup process is performed one backup cycle after the first data backup is performed. Therefore, the time T0 in the above embodiment is also the starting time of the periodic backup.
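The relationship between the starting time T0, the backup cycle, and later backup moments such as T1 can be sketched as a generator. The function name, the 24-hour cycle, and the calendar date are example assumptions; the application does not specify a cycle length.

```python
from datetime import datetime, timedelta
from itertools import islice


def backup_moments(t0, period):
    """Yield T0, T0 + period, T0 + 2 * period, ... without end."""
    if period <= timedelta(0):
        raise ValueError("the backup cycle must be positive")
    k = 0
    while True:
        yield t0 + k * period
        k += 1


t0 = datetime(2022, 1, 1, 13, 0, 0)  # example starting time
first_three = list(islice(backup_moments(t0, timedelta(hours=24)), 3))
```

Here `first_three[0]` is T0 and `first_three[1]` plays the role of T1, the moment of the second round of backup.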
- the backup cluster 102 can back up all the service data on the master cluster 101 to the backup cluster 102 each time according to the similar process described in the above embodiment.
- the backup cluster 102 may only back up the incremental data on the primary cluster 101 to the backup cluster 102 during the second and subsequent backup processes.
- the second round of data backup between the main cluster 101 and the standby cluster 102 is used as an example for description, wherein the service data backed up in the second round is the business data newly added on the main cluster 101 during the time period from T0 to T1 (hereinafter referred to as incremental data).
- referring to FIG. 10, a schematic flowchart of another data backup method in an embodiment of the present application is shown, and the method may specifically include:
- before time T1, the main control device 103 sends a fifth instruction to the backup management apparatus 1011 on the main cluster 101, where the fifth instruction includes time T1 and the protection group information of the first service.
- after receiving the fifth instruction, the backup management apparatus 1011 creates multiple processes (or uses the multiple processes already created during the first round of data backup), and uses the multiple processes to respectively access the HDFS component, the Hive component and the SparkSQL component corresponding to the data sets indicated by the information of the protection group.
- the backup management apparatus 1011 uses a plurality of processes to take a snapshot of the data set related to the first service stored by these components at time T1.
- For example, process 1 created by the backup management apparatus 1011 may be responsible for accessing the HDFS component and, at time T1, taking a snapshot of the data set that the HDFS component stores under the HDFS directory related to the first service.
- The data of the first service stored by the HDFS component is kept in the HDFS directory in file format.
- Process 2 may be responsible for accessing the Hive component. At time T1, it obtains from the database, through a data extraction command, the metadata of the data set related to the first service that is stored by the Hive component, and takes a snapshot of the HDFS directory that the metadata indicates as the actual storage location of that data set. The structured data stored by the Hive component is saved in the corresponding HDFS directory in file format.
- Process 3 may be responsible for accessing the SparkSQL component. At time T1, it obtains from the database, through a data extraction command, the metadata of the data set related to the first service that is stored by the SparkSQL component, and takes a snapshot of the HDFS directory that the metadata indicates as the actual storage location of that data set. The structured data stored by the SparkSQL component is likewise saved in the corresponding HDFS directory in file format.
- The main control device 103 issues a sixth instruction to the backup management apparatus 1021 on the standby cluster 102, instructing the standby cluster 102 to back up to itself the data sets of the first service on the main cluster 101 at time T1.
- The backup management apparatus 1021 starts multiple replication tasks according to the protection group information of the first service, where each replication task backs up the data set related to the first service that is stored in one component.
- The backup management apparatus 1021 executes the multiple replication tasks through at least one executor and, according to the snapshot at time T0 and the snapshot at time T1 on the main cluster 101, backs up the incremental data of the first service stored by each component to the standby cluster 102.
- For example, executor 1 executes replication task 1. By accessing the main cluster 101, it obtains the snapshot of the HDFS component for the first service at time T0 and the corresponding snapshot at time T1, determines from the two snapshots the incremental data of the first service stored by the HDFS component during the period T0 to T1, and backs up the incremental data to the storage area corresponding to the HDFS component on the standby cluster 102.
- Executor 2 executes replication task 2. According to the snapshots of the Hive component for the first service at time T0 and time T1, it determines the incremental data of the first service stored by the Hive component during the period T0 to T1 and backs up the incremental data to the storage area corresponding to the Hive component on the standby cluster 102.
- Executor 3 executes replication task 3. According to the snapshots of the SparkSQL component for the first service at time T0 and time T1, it determines the incremental data of the first service stored by the SparkSQL component during the period T0 to T1 and backs up the incremental data to the storage area corresponding to the SparkSQL component on the standby cluster 102.
- The metadata of the incremental data backed up to the standby cluster 102 may be saved to the database on the standby cluster 102, so that the data of the first service can later be queried on the standby cluster 102 according to the metadata in the database.
- The service data backed up to the standby cluster 102 thus consists of the data of the first service on the main cluster 101 at time T0 plus the service data newly added by the first service during the period T0 to T1, which together constitute the service data on the main cluster 101 at time T1.
- The backup management apparatus 1021 takes a snapshot, through at least one executor, of the incremental data of the first service backed up to the standby cluster 102.
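The incremental backup steps above can be illustrated with a small sketch. It assumes, purely for illustration, that a snapshot can be modelled as a mapping from file path to content digest; the names `Snapshot`, `Increment`, `snapshot_diff`, and `apply_increment` are hypothetical. The handling of deleted files is also an assumption, since the text only discusses newly added data.

```python
from typing import Dict, NamedTuple, Set

# Hypothetical snapshot model: file path -> content digest.
Snapshot = Dict[str, str]


class Increment(NamedTuple):
    upserts: Dict[str, str]  # files created or modified between T0 and T1
    deletes: Set[str]        # files present at T0 but gone at T1 (assumption)


def snapshot_diff(snap_t0: Snapshot, snap_t1: Snapshot) -> Increment:
    """Compute what changed between the T0 snapshot and the T1 snapshot."""
    upserts = {p: d for p, d in snap_t1.items() if snap_t0.get(p) != d}
    deletes = set(snap_t0) - set(snap_t1)
    return Increment(upserts, deletes)


def apply_increment(standby: Snapshot, inc: Increment) -> Snapshot:
    """Apply an increment to the standby copy and return the resulting state."""
    result = {p: d for p, d in standby.items() if p not in inc.deletes}
    result.update(inc.upserts)
    return result
```

Applying the increment to a standby copy that already holds the T0 state yields the T1 state, which mirrors the statement that the T0 data plus the data added during T0 to T1 equals the data on the main cluster 101 at time T1.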
- Referring to FIG. 11, a schematic structural diagram of a control device provided by an embodiment of the present application is shown.
- The control device 1100 shown in FIG. 11 can be used to implement the data backup method executed by the main control device 103 in the above embodiments.
- The control device 1100 shown in FIG. 11 is located in a data backup system, such as the data backup system 300 shown in FIG. 5 above. The data backup system further includes a main cluster and a standby cluster, and the control device 1100 includes:
- a control module 1101, configured to control, according to a first data backup policy, the main cluster or the standby cluster to back up to the standby cluster multiple data sets in the main cluster that are related to a first service as of a first moment, where the first data backup policy includes information about the multiple data sets related to the first service and the first moment.
- In a possible implementation, the control module 1101 is specifically configured to:
- send a first instruction to the main cluster, instructing the main cluster to send to the standby cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service; or send a second instruction to the standby cluster, instructing the standby cluster to copy from the main cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets in the main cluster related to the first service.
- In a possible implementation, the control device 1100 further includes:
- a communication module 1102, configured to send a third instruction to the main cluster before the control device sends the first instruction to the main cluster or sends the second instruction to the standby cluster, where the third instruction includes the information about the multiple data sets related to the first service and the first moment, and instructs the main cluster to obtain snapshots, at the first moment, of the multiple data sets related to the first service.
- In a possible implementation, the control device 1100 further includes:
- a communication module 1102, configured to send a fourth instruction to the main cluster, where the fourth instruction instructs the main cluster to synchronize user data to the standby cluster; or
- the control module 1101 is further configured to acquire the user data stored in the main cluster and in the standby cluster, and to adjust the user data stored in the standby cluster according to the user data stored in the main cluster.
- In a possible implementation, the control device 1100 further includes a configuration module 1103, configured to configure the first data backup policy for the first service according to the information about the multiple data sets related to the first service and the first moment that are input by the user.
- In a possible implementation, the control device 1100 further includes a configuration module 1103, configured to configure a second data backup policy for a second service, where the second data backup policy includes information about multiple data sets related to the second service and a second moment;
- the control module 1101 is further configured to control, according to the second data backup policy, the main cluster or the standby cluster to back up to the standby cluster the multiple data sets in the main cluster that are related to the second service as of the second moment.
- In a possible implementation, the multiple data sets related to the first service include a data set processed or stored by a first component in the main cluster and a data set processed or stored by a second component in the main cluster.
- In a possible implementation, the control device 1100 includes a main client and a standby client, where the main client is used to detect first state information of the main cluster and the standby client is used to detect second state information of the standby cluster. The control device 1100 further includes:
- a communication module 1102, configured to acquire the first state information detected by the main client and the second state information detected by the standby client;
- a determining module 1104, configured to determine the standby client as the client through which applications access the cluster when the first state information indicates that the main cluster has the standby identity or has failed, and the second state information indicates that the standby cluster has the main identity.
- In a possible implementation, the control device 1100 further includes a prompting module 1105 and an adjusting module 1106;
- the prompting module 1105 is configured to prompt the user with information about the failure of the main cluster;
- the adjusting module 1106 is configured to adjust the identity of the standby cluster from the standby identity to the main identity in response to the user's identity adjustment operation for the standby cluster.
- In a possible implementation, the control device 1100 is deployed in isolation from the main cluster.
- In a possible implementation, the control device 1100, the main cluster, and the standby cluster are configured with the same clock source.
- In a possible implementation, the main cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.
- The control device 1100 according to this embodiment may correspondingly execute the methods described in the embodiments of the present application, and the above and other operations and/or functions of the modules of the control device 1100 respectively implement the corresponding procedures performed by the main control device 103 in the above embodiments. For brevity, details are not repeated here.
- Referring to FIG. 12, a schematic structural diagram of a main cluster provided by an embodiment of the present application is shown.
- The main cluster 1200 shown in FIG. 12 can be used to implement the data backup method performed by the main cluster 101 in the above embodiments.
- The main cluster 1200 shown in FIG. 12 is located in a data backup system, such as the data backup system 300 shown in the foregoing figures. The data backup system further includes a standby cluster and a control device, and the main cluster 1200 includes:
- a communication module 1201, configured to acquire an instruction issued by the control device, where the instruction includes information about multiple data sets related to the first service and the first moment;
- a backup module 1202, configured to back up to the standby cluster, according to the instruction, the multiple data sets in the main cluster that are related to the first service as of the first moment.
- In a possible implementation, the backup module 1202 is specifically configured to:
- obtain, according to the information about the multiple data sets related to the first service and the first moment, snapshots at the first moment of the multiple data sets in the main cluster related to the first service; and send the data corresponding to the snapshots to the standby cluster according to the snapshots.
- In a possible implementation, the backup module 1202 is further configured to synchronize user data to the standby cluster.
- In a possible implementation, the main cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.
- The main cluster 1200 according to this embodiment may correspondingly execute the methods described in the embodiments of the present application, and the above and other operations and/or functions of the modules of the main cluster 1200 respectively implement the corresponding procedures performed by the main cluster 101 in the above embodiments. For brevity, details are not repeated here.
- Figure 13 provides a control device. As shown in FIG. 13 , the control device 1300 may specifically be used to implement the functions of the control device 1100 shown in FIG. 11 .
- The control device 1300 includes a bus 1301, a processor 1302, and a memory 1303. The processor 1302 and the memory 1303 communicate through the bus 1301.
- The bus 1301 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
- The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 13, but this does not mean that there is only one bus or one type of bus.
- The processor 1302 may be any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), and a neural network processing unit (NPU).
- The memory 1303 may include volatile memory, such as random access memory (RAM).
- The memory 1303 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
- the memory 1303 stores executable program codes, and the processor 1302 executes the executable program codes to execute the data backup method executed by the main control device 103 in the foregoing embodiments.
- Figure 14 provides a master cluster. As shown in FIG. 14 , the master cluster 1400 may be specifically used to implement the functions of the master cluster 1200 shown in FIG. 12 .
- The main cluster 1400 includes at least one processor and at least one memory, where the at least one processor and the at least one memory may be located in one or more computing devices. In this embodiment, the case in which the at least one processor and the at least one memory are located in multiple computing devices is used as an example. Each computing device may include a bus 1401, a processor 1402, and a memory 1403. The processor 1402 and the memory 1403 communicate through the bus 1401.
- the bus 1401 may be a PCI bus, an EISA bus, or the like.
- the bus can be divided into address bus, data bus, control bus and so on. For ease of presentation, only one thick line is shown in FIG. 14, but it does not mean that there is only one bus or one type of bus.
- the processor 1402 may be any one or more of processors such as CPU, GPU, MP, DSP, and NPU.
- Memory 1403 may include volatile memory, such as RAM.
- the memory 1403 may also include non-volatile memory such as ROM, flash memory, HDD or SSD.
- The memory 1403 in each computing device may store executable program code. After the processor 1402 in each computing device executes the executable program code, the main cluster 1400 performs the data backup method performed by the main cluster 101 in the foregoing embodiments.
- Embodiments of the present application also provide a computer-readable storage medium.
- The computer-readable storage medium may be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media.
- The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), semiconductor media (e.g., solid state drives), and the like.
- the computer-readable storage medium includes instructions, the instructions instructing the computing device to execute the data backup method executed by the main control device 103 or the main cluster 101 described above.
- the embodiments of the present application also provide a computer program product.
- the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computing device, all or part of the processes or functions described in the embodiments of the present application are generated.
- The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
- The computer program product can be a software installation package, which can be downloaded and executed on a computing device when any of the aforementioned data backup methods needs to be used.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Hardware Redundancy (AREA)
Abstract
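The present application provides a data backup method applied to a data backup system that includes a main cluster, a standby cluster, and a control device. According to a first data backup policy, the control device controls the main cluster or the standby cluster to back up to the standby cluster multiple data sets in the main cluster that are related to a first service as of a first moment, where the first data backup policy includes information about the multiple data sets related to the first service and the first moment. In this way, data is backed up between the main cluster and the standby cluster at service granularity, so that the multiple data sets related to the first service that are backed up to the standby cluster are consistent in the time dimension. When the main cluster fails, the standby cluster can resume the service based on service data from the same time period, which improves the quality of the service provided by the data backup system. The present application further provides a corresponding data backup system and related devices.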
Description
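The present application relates to the field of big data technologies, and in particular to a data backup method, a data backup system, and related devices.
With the development of big data technologies, more and more users (such as enterprises) are migrating service data to big data platforms for storage. Accordingly, users attach increasing importance to the disaster recovery capability of big data platforms; that is, they expect that the service data they store on a big data platform will not be lost when a disaster such as a device failure occurs.
Currently, a big data platform provides disaster recovery for user service data through a data backup system that includes a main cluster and a standby cluster. The main cluster can use components to process a user's service data, for example to encapsulate and store it, and different components can process different types of service data of the same user. For example, the main cluster may use component 1 to store a user's audio, video, and picture data, and use component 2 to store the user's tabular service data. Typically, the main cluster periodically backs up the service data processed by each component to the standby cluster, so that the standby cluster can continue to provide services to the user based on the backed-up service data after the main cluster fails. How to prevent, as far as possible, the service data backed up to the standby cluster from degrading service quality after a disaster recovery switchover has therefore become an urgent problem.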
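Summary of the Invention
In view of this, embodiments of the present application provide a data backup method that backs up data at service granularity, which avoids service errors caused by inconsistent backups and ensures that the service data backed up to the standby cluster does not degrade service quality. The present application further provides a corresponding data backup system, control device, computing device, computer-readable storage medium, and computer program product.
In a first aspect, an embodiment of the present application provides a data backup method applied to a data backup system that includes a main cluster, a standby cluster, and a control device. In a specific implementation, the control device controls, according to a first data backup policy, the main cluster or the standby cluster to back up to the standby cluster multiple data sets in the main cluster that are related to a first service as of a first moment, where the first data backup policy includes information about the multiple data sets related to the first service and the first moment.
In this way, data is backed up between the main cluster and the standby cluster at service granularity, so that the multiple data sets related to the first service that are backed up to the standby cluster are consistent in the time dimension. When the main cluster fails, the standby cluster can resume the service based on service data from the same time period. This avoids service errors caused by time-inconsistent service data on the standby cluster, which in turn improves the trustworthiness of the service data that the data backup system stores for users and improves service quality.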
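In a possible implementation, when the control device controls, according to the first data backup policy, the main cluster or the standby cluster to back up to the standby cluster the multiple data sets in the main cluster that are related to the first service as of the first moment, the control device may send a first instruction to the main cluster, instructing the main cluster to send to the standby cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service. Alternatively, the control device may send a second instruction to the standby cluster, instructing the standby cluster to copy from the main cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets in the main cluster related to the first service. In this way, by sending an instruction to the main cluster or to the standby cluster, the control device controls that cluster to carry out the data backup based on snapshots.
In a possible implementation, before sending the first instruction to the main cluster or the second instruction to the standby cluster, the control device may first send a third instruction to the main cluster. The third instruction includes the information about the multiple data sets related to the first service and the first moment, and instructs the main cluster to obtain snapshots, at the first moment, of the multiple data sets related to the first service. The main cluster or the standby cluster can then use the snapshots corresponding to the first moment to back up the multiple data sets related to the first service from the main cluster to the standby cluster, and every data set backed up to the standby cluster reflects the state of the main cluster at the first moment. Taking snapshots of the data sets at the first moment captures and backs up the data as of that moment more accurately and avoids inconsistent backup times caused by communication delay.
In a possible implementation, the control device may further send a fourth instruction to the main cluster, where the fourth instruction instructs the main cluster to synchronize user data to the standby cluster. Alternatively, the control device may acquire the user data stored in the main cluster and in the standby cluster, and adjust the user data stored in the standby cluster according to the user data stored in the main cluster, so that the user data stored in the two clusters is consistent. When the standby cluster takes over the services of the main cluster, it can then provide the corresponding services based on the user data backed up to it, without operation and maintenance personnel having to configure user data on the standby cluster manually. This not only reduces operation and maintenance cost but also effectively shortens the recovery time objective of the data backup system.
For example, the user data may be at least one of a user identifier, a user permission, and a tenant identifier, or may be other user-related data.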
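The adjustment of standby user data described above can be sketched as a reconciliation plan. This is only an illustration under stated assumptions: user data is modelled as a mapping from user identifier to an attribute dictionary, and the names `UserData` and `plan_user_sync` are hypothetical rather than taken from the application.

```python
from typing import Dict, List, Tuple

# Hypothetical model: user ID -> attributes such as permission and tenant ID.
UserData = Dict[str, dict]


def plan_user_sync(main: UserData, standby: UserData) -> List[Tuple[str, str]]:
    """Return (action, user_id) steps that make the standby match the main cluster."""
    plan: List[Tuple[str, str]] = []
    for uid in sorted(main):
        if uid not in standby:
            plan.append(("create", uid))   # user missing on the standby cluster
        elif standby[uid] != main[uid]:
            plan.append(("update", uid))   # attributes differ between clusters
    for uid in sorted(set(standby) - set(main)):
        plan.append(("delete", uid))       # user no longer exists on the main cluster
    return plan
```

Treating the main cluster as the source of truth keeps the plan one-directional, which matches the description that the standby user data is adjusted according to the main cluster.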
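In a possible implementation, the control device may configure not only the first data backup policy for the first service but also a second data backup policy for a second service, where the second data backup policy includes information about multiple data sets related to the second service and a second moment. The control device then controls, according to the second data backup policy, the main cluster or the standby cluster to back up to the standby cluster the multiple data sets in the main cluster that are related to the second service as of the second moment. The second service and the first service are different services; they may be different services of the same user or different services of different users. In this way, the data backup system can back up data at service granularity for multiple different services and thereby support high-quality service for all of them.
In a possible implementation, the multiple data sets related to the first service include a data set processed or stored by a first component in the main cluster and a data set processed or stored by a second component in the main cluster. For example, the first component and the second component may encapsulate data into different formats, or may differ in data processing performance. In this way, different data sets that belong to the same service but are processed or stored by different components in the main cluster can be backed up to the standby cluster as of the first moment.
In addition, the multiple data sets related to the second service that are backed up to the standby cluster may include a data set processed or stored by the first component in the main cluster, a data set processed or stored by the second component in the main cluster, and a data set processed or stored by a third component in the main cluster. The components that process or store the data sets of different services may differ.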
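In a possible implementation, the control device includes a main client and a standby client, where the main client is used to detect first state information of the main cluster and the standby client is used to detect second state information of the standby cluster. The control device may acquire the first state information detected by the main client and the second state information detected by the standby client. When the first state information indicates that the main cluster has the standby identity or has failed (for example, because of a fault), and the second state information indicates that the standby cluster has the main identity, the control device determines the standby client as the client through which applications access the cluster. In this way, when the main and standby identities of the two clusters are reversed, the control device automatically switches the client used to access the cluster, and no manual switchover by operation and maintenance personnel is needed.
For example, before the main cluster fails, the first state information acquired by the control device may indicate that the main cluster has the main identity, and the second state information acquired by the control device may indicate that the standby cluster has the standby identity.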
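The client selection rule described above can be written as a short decision function. This is a minimal sketch; the enumeration `ClusterState`, the function `select_client`, and the string labels for the two clients are illustrative assumptions, not names from the application.

```python
from enum import Enum


class ClusterState(Enum):
    MAIN = "main"        # cluster currently holds the main identity
    STANDBY = "standby"  # cluster currently holds the standby identity
    FAILED = "failed"    # cluster has failed or is unreachable


def select_client(first_state: ClusterState, second_state: ClusterState,
                  current: str = "main_client") -> str:
    """Pick the client through which applications access a cluster.

    first_state is reported by the main client for the main cluster;
    second_state is reported by the standby client for the standby cluster.
    """
    main_lost_role = first_state in (ClusterState.STANDBY, ClusterState.FAILED)
    if main_lost_role and second_state is ClusterState.MAIN:
        return "standby_client"
    # Otherwise keep the current client, e.g. while the standby cluster
    # has not yet been promoted to the main identity.
    return current
```

Note that a failed main cluster alone does not trigger the switch in this sketch: both conditions in the text must hold, so the client changes only after the standby cluster has taken the main identity.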
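In a possible implementation, the control device may further prompt the user with information about the failure of the main cluster, so that the user can confirm the failure based on the prompt. The control device can then adjust the identity of the standby cluster from the standby identity to the main identity in response to the user's identity adjustment operation for the standby cluster. Reversing the identities of the main cluster and the standby cluster through a manual operation helps avoid, as far as possible, an abnormal switchover of the main and standby identities caused by a program error in the data backup system.
In a possible implementation, the control device is deployed in isolation from the main cluster. For example, the control device may be deployed together with the standby cluster at the standby site, while the main cluster is deployed at the main site. Because of the isolated deployment, the control device is not affected when the main cluster fails and remains available to carry out the failover.
In a possible implementation, the control device, the main cluster, and the standby cluster are configured with the same clock source. The moment at which the control device directs the main cluster or the standby cluster to perform the data backup is therefore consistent with the moment at which that cluster actually performs it, which avoids backup errors caused by unsynchronized clock sources and improves the time consistency of the data backup.
In a possible implementation, the main cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.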
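In a second aspect, the present application provides a data backup method applied to a data backup system that includes a main cluster, a standby cluster, and a control device. In a specific implementation, the main cluster acquires an instruction issued by the control device, where the instruction includes information about multiple data sets related to a first service and a first moment. According to the instruction, the main cluster backs up to the standby cluster the multiple data sets in the main cluster that are related to the first service as of the first moment. In this way, data is backed up between the main cluster and the standby cluster at service granularity, so that the multiple data sets related to the first service that are backed up to the standby cluster are consistent in the time dimension.
In a possible implementation, when the main cluster backs up, according to the instruction, the multiple data sets related to the first service to the standby cluster, the main cluster may obtain, according to the information about the multiple data sets related to the first service and the first moment, snapshots at the first moment of those data sets in the main cluster. The main cluster then sends the data corresponding to the snapshots (that is, the multiple data sets related to the first service) to the standby cluster, thereby backing up to the standby cluster the multiple data sets in the main cluster that are related to the first service as of the first moment.
In a possible implementation, the main cluster may further back up user data to the standby cluster. When the standby cluster takes over the services of the main cluster, it can then provide the corresponding services based on the user data backed up to it, without operation and maintenance personnel having to configure user data on the standby cluster manually. This not only reduces operation and maintenance cost but also effectively shortens the recovery time objective of the data backup system.
For example, the user data may be at least one of a user identifier, a user permission, and a tenant identifier, or may be other user-related data.
In a possible implementation, the main cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.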
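In a third aspect, the present application provides a control device located in a data backup system that further includes a main cluster and a standby cluster. The control device includes a control module, configured to control, according to a first data backup policy, the main cluster or the standby cluster to back up to the standby cluster multiple data sets in the main cluster that are related to a first service as of a first moment, where the first data backup policy includes information about the multiple data sets related to the first service and the first moment.
In a possible implementation, the control module is specifically configured to: send a first instruction to the main cluster, instructing the main cluster to send to the standby cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service; or send a second instruction to the standby cluster, instructing the standby cluster to copy from the main cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets in the main cluster related to the first service.
In a possible implementation, the control device further includes a communication module, configured to send a third instruction to the main cluster before the control device sends the first instruction to the main cluster or sends the second instruction to the standby cluster, where the third instruction includes the information about the multiple data sets related to the first service and the first moment, and instructs the main cluster to obtain snapshots, at the first moment, of the multiple data sets related to the first service.
In a possible implementation, the control device further includes a communication module, configured to send a fourth instruction to the main cluster, where the fourth instruction instructs the main cluster to synchronize user data to the standby cluster; or the control module is further configured to acquire the user data stored in the main cluster and in the standby cluster, and to adjust the user data stored in the standby cluster according to the user data stored in the main cluster.
In a possible implementation, the control device further includes a configuration module, configured to configure the first data backup policy for the first service according to the information about the multiple data sets related to the first service and the first moment that are input by the user.
In a possible implementation, the control device further includes a configuration module, configured to configure a second data backup policy for a second service, where the second data backup policy includes information about multiple data sets related to the second service and a second moment; the control module is further configured to control, according to the second data backup policy, the main cluster or the standby cluster to back up to the standby cluster the multiple data sets in the main cluster that are related to the second service as of the second moment.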
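In a possible implementation, the multiple data sets related to the first service include a data set processed or stored by a first component in the main cluster and a data set processed or stored by a second component in the main cluster.
In a possible implementation, the control device includes a main client and a standby client, where the main client is used to detect first state information of the main cluster and the standby client is used to detect second state information of the standby cluster. The control device further includes: a communication module, configured to acquire the first state information detected by the main client and the second state information detected by the standby client; and a determining module, configured to determine the standby client as the client through which applications access the cluster when the first state information indicates that the main cluster has the standby identity or has failed, and the second state information indicates that the standby cluster has the main identity.
In a possible implementation, the control device further includes a prompting module and an adjusting module. The prompting module is configured to prompt the user with information about the failure of the main cluster. The adjusting module is configured to adjust the identity of the standby cluster from the standby identity to the main identity in response to the user's identity adjustment operation for the standby cluster.
In a possible implementation, the control device is deployed in isolation from the main cluster.
In a possible implementation, the control device, the main cluster, and the standby cluster are configured with the same clock source.
In a possible implementation, the main cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.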
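In a fourth aspect, the present application provides a main cluster located in a data backup system that further includes a standby cluster and a control device. The main cluster includes: a communication module, configured to acquire an instruction issued by the control device, where the instruction includes information about multiple data sets related to a first service and a first moment; and a backup module, configured to back up to the standby cluster, according to the instruction, the multiple data sets in the main cluster that are related to the first service as of the first moment.
In a possible implementation, the backup module is specifically configured to: obtain, according to the information about the multiple data sets related to the first service and the first moment, snapshots at the first moment of the multiple data sets in the main cluster related to the first service; and send the data corresponding to the snapshots to the standby cluster according to the snapshots.
In a possible implementation, the backup module is further configured to synchronize user data to the standby cluster.
In a possible implementation, the main cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.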
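In a fifth aspect, the present application provides a data backup system that includes a control device, a main cluster, and a standby cluster. The control device is configured to perform the data backup method according to the first aspect or any implementation of the first aspect, the main cluster is configured to perform the data backup method according to the second aspect or any implementation of the second aspect, and the standby cluster is configured to acquire and store the data sets backed up from the main cluster.
In a sixth aspect, the present application provides a control device that includes a processor and a memory. The processor is configured to execute instructions stored in the memory, so that the control device performs the data backup method according to the first aspect or any implementation of the first aspect.
In a seventh aspect, the present application provides a main cluster that includes at least one processor and at least one memory. The at least one processor executes instructions stored in the at least one memory, so that the main cluster performs the data backup method according to the second aspect or any implementation of the second aspect.
In an eighth aspect, the present application provides a computer-readable storage medium storing instructions that, when run on a computing device, cause the computing device to perform the data backup method according to the first aspect or any implementation of the first aspect.
In a ninth aspect, the present application provides a computer-readable storage medium storing instructions that, when run on at least one computing device, cause the at least one computing device to perform the data backup method according to the second aspect or any implementation of the second aspect.
In a tenth aspect, the present application provides a computer program product containing instructions that, when run on a computing device, cause the computing device to perform the data backup method according to the first aspect or any implementation of the first aspect.
In an eleventh aspect, the present application provides a computer program product containing instructions that, when run on at least one computing device, cause the at least one computing device to perform the data backup method according to the second aspect or any implementation of the second aspect.
On the basis of the implementations provided in the above aspects, the present application may further combine them to provide additional implementations.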
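To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings used in describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments recorded in the present application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic architectural diagram of a data backup system 100;
FIG. 2 is a schematic diagram of backing up service data in the data backup system 100;
FIG. 3 is a schematic architectural diagram of a data backup system 300 provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of backing up service data in the data backup system 300 provided by an embodiment of the present application;
FIG. 5 is a schematic architectural diagram of a data backup system 300 provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a cluster pairing interface provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a policy configuration interface provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of backing up data sets of different services in the data backup system 300 provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of a data backup method provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of another data backup method provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a control device provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a main cluster provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of the hardware structure of a control device provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of the hardware structure of a main cluster provided by an embodiment of the present application.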
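The solutions in the embodiments provided by the present application are described below with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of the present application.
Refer to FIG. 1, which is a schematic architectural diagram of an exemplary data backup system. As shown in FIG. 1, the data backup system 100 includes a main cluster 101 and a standby cluster 102. The main cluster 101 and the standby cluster 102 may each be implemented by at least one device (such as a server, a virtual machine, a container, or a storage device). For example, both may be implemented by a cluster of multiple servers (such as a cluster built on the Hadoop architecture). In some scenarios, the main cluster 101 and the standby cluster 102 may each be implemented by a single device.
In a practical application scenario, a user can access the main cluster 101 through an application server (specifically, through a client in the application server), where the access includes reading and writing data related to one or more services of the user. For brevity, data related to a service is hereinafter referred to as service data. The main cluster 101 can periodically back up the service data that the user stores on the main cluster 101 to the standby cluster 102. When the main cluster 101 fails, the standby cluster 102 can then use the backed-up service data to continue providing the corresponding services to the user 104, which improves the reliability with which the data backup system 100 stores service data for the user 104.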
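For example, assuming that the main cluster 101 and the standby cluster 102 are built on the Hadoop architecture, the main cluster 101 may include a Hadoop distributed file system (HDFS) component 1011 and a Hive component 1012, as shown in FIG. 2. The HDFS component stores files, keeping each file as a series of data blocks; it provides high-throughput data access and is suited to large data sets. The Hive component extracts, transforms, and loads data so that data can be stored and queried in a cluster built on the Hadoop architecture. Specifically, when storing the service data of multiple users, the main cluster 101 can use multiple components to store each user's service data in different formats. The HDFS component 1011 encapsulates a user's audio, video, picture, and similar service data into file format and stores it on the main cluster 101, with the files kept under corresponding directories. For example, the service data of the user 106 that the main cluster 101 stores through the HDFS component 1011 is located under directory 3 (the service data under directory 1 and directory 2 may belong to other users). The Hive component 1012 encapsulates a user's service data into structured data and stores it on the main cluster 101 in table form. For example, the service data of the user 106 that the main cluster 101 stores through the Hive component 1012 is table 3 (the service data in table 1 and table 2 may belong to other users). Correspondingly, the standby cluster 102 may include an HDFS component 1021 and a Hive component 1022, whose roles are similar to those of the components on the main cluster 101 and are not described again here. In the related art, when the main cluster 101 backs up its service data to the standby cluster 102, replication task 1 and replication task 2 are started separately and executed by corresponding processes. Replication task 1 backs up the data related to the HDFS component 1011 (directory 1, directory 2, and directory 3) to the storage area corresponding to the HDFS component 1021, and replication task 2 backs up the data related to the Hive component 1012 (table 1, table 2, and table 3) to the storage area corresponding to the Hive component 1022.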
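However, the component-granularity backup described above usually only keeps the service data of each individual component consistent between the main cluster 101 and the standby cluster 102; the data backed up to different components on the standby cluster 102 may not be consistent in the time dimension. Specifically, because the process of backing up the service data related to the HDFS component 1011 and the process of backing up the service data related to the Hive component 1012 are independent of each other, replication task 1 and replication task 2 may start at different moments. As a result, the service data finally backed up to the storage area of the HDFS component 1021 and the service data backed up to the storage area of the Hive component 1022 may be service data of the user 104 that the main cluster 101 stored at different times. After a disaster recovery switchover, the standby cluster 102 would then serve the user from service data of different times, which may cause errors in the services it provides. In this embodiment, the data related to a component (such as the HDFS component 1011 or the Hive component 1022 above) may be, for example, data that the component has processed (for example, encapsulated) or data that the component stores.
For example, suppose replication task 1 executes at 13:00:00 and replication task 2 executes at 13:01:00. The service data finally backed up to the storage area of the HDFS component 1021 on the standby cluster 102 is then the service data of multiple users (including the user 104) before 13:00:00, while the service data backed up to the storage area of the Hive component 1022 is the service data of those users before 13:01:00. If, between 13:00:00 and 13:01:00, the main cluster 101 saves new service data of the user 104 through the HDFS component 1011 and the Hive component 1012 (for example, it modifies part of the stored service data), the service data backed up to the HDFS component 1021 lacks the new data from that interval. When the main cluster 101 fails and the user 104 accesses the backed-up service data on the standby cluster 102 through the client 103, accessing the Hive component 1022 returns the latest, correct service data, whereas part of the service data obtained from the HDFS component 1021 may be wrong (the old data from before the modification), which may cause errors when the data backup system 100 provides data storage and other services to the user 104. For instance, when the data backup system 100 stores the data of a bill settlement service, the user 104 may need to access the bill details under directory 3 and the customer list in table 3, and cross-check them in the time dimension to determine each customer's bills. If the customer list and the bill details belong to different time periods, bills that some customers have already settled may be treated as unsettled, or some bills in the bill details may not belong to any customer in the customer list. Such errors reduce the trustworthiness of the service data stored by the data backup system 100, that is, they lower the quality of the data storage service it provides.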
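The timing problem in this example can be reproduced with a toy replay of writes. The sketch below is an illustration only: the write-log format, the function `state_at`, and the sample keys and values in `EXAMPLE_LOG` are invented for the demonstration, with times given in seconds relative to 13:00:00.

```python
from typing import Dict, List, Tuple

# Hypothetical write log entry: (seconds relative to 13:00:00, component, key, value)
Write = Tuple[int, str, str, str]

EXAMPLE_LOG: List[Write] = [
    (-10, "hdfs", "bill/42", "unpaid"),
    (-10, "hive", "customer/7", "owes bill 42"),
    (30, "hdfs", "bill/42", "paid"),
    (30, "hive", "customer/7", "settled"),
]


def state_at(log: List[Write], component: str, t: int) -> Dict[str, str]:
    """Replay the writes of one component that happened before time t."""
    state: Dict[str, str] = {}
    for ts, comp, key, value in sorted(log):
        if comp == component and ts < t:
            state[key] = value
    return state


# Component granularity: HDFS copied at 13:00:00, Hive copied at 13:01:00.
mixed = (state_at(EXAMPLE_LOG, "hdfs", 0), state_at(EXAMPLE_LOG, "hive", 60))
# Service granularity: both components captured as of 13:00:00.
aligned = (state_at(EXAMPLE_LOG, "hdfs", 0), state_at(EXAMPLE_LOG, "hive", 0))
```

In `mixed`, the bill is still recorded as unpaid while the customer is already recorded as settled, which is the kind of contradiction described above; in `aligned`, both views describe the same instant.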
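On this basis, an embodiment of the present application provides a data backup system. Refer to FIG. 3, which is a schematic architectural diagram of the data backup system provided by an embodiment of the present application. As shown in FIG. 3, building on the data backup system 100 shown in FIG. 1, the data backup system 300 includes a main cluster 101, a standby cluster 102, a main control device 103, and a standby control device 104, and can be connected to an application server 105. The main cluster 101 and the standby cluster 102 may each be implemented by at least one device (such as a server, a virtual machine, a container, or a storage device). For example, both may be implemented by a cluster of multiple servers (such as a cluster based on the Hadoop architecture). In some scenarios, each may be implemented by a single device. The main cluster 101 stores and processes data for application 1 and application 2 deployed on the application server 105 and serves the data reads and writes of the application server 105. FIG. 3 shows the application server 105 with two applications as an example; in practice, the application server 105 may include any number of applications. The standby cluster 102 serves as the backup cluster of the main cluster 101 and backs up the data in the main cluster 101; after a disaster recovery switchover, the standby cluster 102 supports the data reads and writes of the application server 105.
The main control device 103 and the standby control device 104 shown in FIG. 3 control the data backup between the main cluster 101 and the standby cluster 102. Specifically, before the main cluster 101 fails, the main control device 103 controls the data backup between the main cluster 101 and the standby cluster 102. After the main cluster 101 fails, the standby control device 104 controls the data backup between the standby cluster 102 and the recovered main cluster 101 (or another cluster that serves as a redundant backup of the standby cluster 102). For example, the main control device 103 and the standby control device 104 may each be a server, a virtual machine, or a container. FIG. 3 shows the main control device 103 deployed together with the standby cluster 102 at the standby site, and the standby control device 104 deployed together with the main cluster 101 at the main site. The main site and the standby site may be two independent regions, two equipment rooms, or device clusters in two different local area networks, and they usually also have independent ventilation, fire protection, water, and power control systems. The deployment of the main control device 103 and the standby control device 104 shown in FIG. 3 is only an example. In addition, FIG. 3 shows the application server 105 deployed outside the data backup system 300 as an example; in other possible implementations, the application server 105 may be deployed inside the data backup system 300. In that case, when the main control device 103, the standby control device 104, and the application server 105 are virtual machines or containers, the main control device 103 and the standby control device 104 may both be deployed on the same server as the application server 105, or only the main control device 103 may be deployed on the same server as the application server 105, or only the standby control device 104 may be deployed on the same server as the application server 105. In some implementation scenarios, the data backup system 300 may have only one control device, which then has the functions of both the main control device 103 and the standby control device 104 in the embodiments of the present application. It should be understood that the data backup system 300 shown in FIG. 3 is only an example, and this embodiment is not limited thereto.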
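In an actual deployment, the data backup system 300 may be deployed in a cloud environment; for example, it may be built across multiple regions of the cloud environment. Alternatively, the data backup system 300 may be deployed in an edge environment and built across multiple equipment rooms of the edge environment. It should be understood that a cloud environment in the present application is a collection of resources set up by a cloud service provider to provide services (such as data storage services) to tenants (the user 106 above is a tenant) in multiple regions (for example, an East China region and a North China region). A cloud environment usually includes a large amount of resources and can provide basic resource services and/or software application services to tenants in each region. An edge environment in the present application is a collection of resources that provides basic resource services and/or software application services to tenants in a specific area. Compared with a cloud environment, an edge environment can be physically closer to tenants and can better guarantee low latency for some services.
It should be understood that the architecture of the data backup system 300 shown in FIG. 3 is only an example; in practice, the data backup system 300 may adopt other possible implementations. For example, the data backup system 300 may further include other devices, such as a device for managing the main cluster 101 and the standby cluster 102. As another example, in addition to the HDFS component 1011 and the Hive component 1012 shown in FIG. 2, the main cluster 101 in the data backup system 300 may include more components; for instance, the main cluster 101 in FIG. 2 may further include a SparkSQL component, which processes service data more efficiently, or a Lightweight Directory Access Protocol (LDAP) component or an Active Directory (AD) component. This embodiment does not limit the specific implementation of the data backup system 300.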
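In the data backup system 300 shown in FIG. 3, when the main control device 103 controls the backup of service data between the main cluster 101 and the standby cluster 102, the data is backed up at service granularity, so that the service data backed up to the standby cluster 102 is consistent in the time dimension. Specifically, the main control device 103 is preconfigured with a data backup policy for a service, where the policy includes information about the multiple data sets related to the service that are stored in the main cluster 101 and a first moment. For example, as shown in FIG. 4, among the multiple data sets related to a service of the user 106, some may be the collection of service data under directory 3 in FIG. 2, which belongs to the user 106 and is related to the HDFS component 1011, while the others may be the collection of service data in table 3, which belongs to the user 106 and is related to the Hive component 1012. The first moment in the data backup policy indicates that the service data backed up to the standby cluster 102 is the data related to the service that the main cluster 101 has already stored at the first moment. When the data of the service on the main cluster 101 needs to be backed up to the standby cluster 102, the main control device 103 can control the main cluster 101 or the standby cluster 102, according to the data backup policy configured for the service, to back up to the standby cluster 102 the multiple data sets in the main cluster 101 that are related to the service as of the first moment. For example, the main control device 103 may send an instruction to the main cluster 101 to instruct the main cluster 101 or the standby cluster 102 to back up to the standby cluster 102, as of the first moment, the service data under directory 3 related to the HDFS component 1011 and the service data in table 3 related to the Hive component 1012. The data backed up to the standby cluster 102 at service granularity is thus the data related to the service that the main cluster 101 has already stored at the first moment, so the backed-up data related to the service is consistent in the time dimension. When the main cluster 101 fails, the standby cluster 102 can resume the service based on service data from the same time period, which avoids service errors in the data backup system 300 as far as possible, improves the trustworthiness of the service data that the data backup system 300 stores for users, and improves service quality.
Next, various non-limiting specific implementations of the data backup system are described in detail.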
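Building on the data backup system 300 shown in FIG. 3, refer to the data backup system 300 shown in FIG. 5. The main cluster 101 may include one or more components for processing or storing service data. As shown in FIG. 5, the main cluster 101 includes component 1, component 2, and component 3, which can process or store the service data of one or more users. In practice, any one or more of the components may process or store different data of the same service; for example, component 1 and component 2 may encapsulate different data of the same service into different data formats and store it. When the main cluster 101 (and the standby cluster 102) is built on the Hadoop architecture, component 1 may be, for example, the HDFS component 1011 in FIG. 2, which encapsulates part of the service data of a service into file format and stores it, and component 2 may be, for example, the Hive component 1012 in FIG. 2, which encapsulates another part of the service data of that service into structured data. Alternatively, component 1 and component 2 may encapsulate different service data of the same service into the same format for storage. For example, component 1 may be the Hive component in FIG. 2 and component 2 may be a SparkSQL component, both of which encapsulate the data of the service into structured data for storage (the data read and write performance of the SparkSQL component is usually higher than that of the Hive component). In this embodiment, component 1, component 2, and component 3 may be any components on the main cluster 101 that encapsulate and store non-streaming data. The components used to process or store the data of different services may differ. For example, only component 1 may be used for the data of service 1; component 1 and component 2 may be used for the data of service 2; and component 1, component 2, and component 3 may be used for the data of service 3.
The standby cluster 102 may also include one or more components. As shown in FIG. 5, the standby cluster 102 includes component 4, component 5, and component 6. As on the main cluster 101, these components can process or store the service data of one or more users, different data of the same service can be encapsulated into the same or different data formats, and the components used to process or store the data of different services may differ. The components on the standby cluster 102 can serve as backups of the components on the main cluster 101. For example, component 4 on the standby cluster 102 can serve as the backup of component 1 on the main cluster 101, component 5 as the backup of component 2, and component 6 as the backup of component 3. Correspondingly, before the main cluster 101 fails, the standby cluster 102 periodically backs up the service data on the main cluster 101. During the backup, the service data related to each component on the main cluster 101 can be backed up to the component on the standby cluster 102 that serves as its backup. For example, the service data related to component 1 on the main cluster 101 can be backed up to the storage area corresponding to component 4 on the standby cluster 102, the service data related to component 2 to the storage area corresponding to component 5, and the service data related to component 3 to the storage area corresponding to component 6. When the main cluster 101 fails and the standby cluster 102 takes over its services, the standby cluster 102 can use the data backed up to the storage areas of component 4, component 5, and component 6 to continue serving users' reads and writes of service data.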
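In practice, the cloud environment or edge environment in which the main cluster 101 and the standby cluster 102 are located contains many computing devices (such as servers), so multiple main clusters and multiple standby clusters can be built in that environment. The main cluster 101 and the standby cluster 102 can therefore be paired in advance in the cloud environment or edge environment to build the data backup system 300 shown in FIG. 5.
As an implementation example, the cloud environment or edge environment may present to an administrator the cluster pairing interface shown in FIG. 6. The cluster pairing interface may present not only multiple cluster identifiers (such as cluster 1, cluster 2, and cluster 3 shown in FIG. 6) but also related information about each cluster, such as the location information and resource specification information shown in FIG. 6. The administrator can then select, from the clusters presented in the cluster pairing interface, the clusters to serve as the main cluster 101 and the standby cluster 102, thereby building the data backup system 300. In practice, the cloud environment or edge environment may also build the data backup system 300 according to a user's pairing operation for the main cluster 101 and the standby cluster 102, which is not limited in this embodiment.
After the main cluster 101 and the standby cluster 102 are paired, they can communicate with each other; for example, the standby cluster 102 can periodically back up the service data on the main cluster 101 to the standby cluster 102. In a further possible implementation, before backing up service data, the main cluster 101 and the standby cluster 102 may complete a communication authentication process in advance so that the two clusters trust each other. For example, the main cluster 101 and the standby cluster 102 may authenticate each other using a trusted third-party authentication protocol designed for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, such as the Kerberos protocol. In practice, the two clusters may perform the communication authentication automatically, or complete it with the involvement of a user or an administrator. This improves the security and reliability of the data communication between the main cluster 101 and the standby cluster 102.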
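After the data backup system 300 is built and the communication authentication between the main cluster 101 and the standby cluster 102 is completed, the data of a first service on the main cluster 101 can be backed up to the standby cluster 102 at service granularity. The backup of the data of the first service of user 1 is used as an example below.
In a specific implementation, as shown in FIG. 5, the main control device 103 can run software for configuring data backup policies and controlling the main and standby clusters. For example, the software can configure a first data backup policy for the first service of user 1, where the first data backup policy includes information about the multiple data sets in the main cluster 101 that are related to the first service and a first moment. The multiple data sets related to the first service may specifically include a data set processed or stored by component 1 and a data set processed or stored by component 2. In this embodiment, the multiple data sets related to the first service may also be referred to as a first protection group. For example, the information about the first protection group may be the identifier of a file directory in a component, such as the name of directory 3 in FIG. 2, or the identifier of a table in a component, such as the name of table 3 in FIG. 2. The first moment indicates that the service data backed up to the standby cluster 102 is the multiple data sets related to the first service that the main cluster 101 has already stored at the first moment.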
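One way to picture the first data backup policy described above is as a small record that pairs a protection group with a moment. The sketch below is illustrative only: the class names `DataSetRef` and `BackupPolicy`, their fields, and the sample identifiers are assumptions, since the application does not prescribe a concrete data structure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class DataSetRef:
    component: str   # component that processes or stores the data set, e.g. "HDFS"
    identifier: str  # directory or table identifier, e.g. "directory3" or "table3"


@dataclass(frozen=True)
class BackupPolicy:
    service: str
    protection_group: Tuple[DataSetRef, ...]  # data sets related to the service
    first_moment: datetime                    # moment whose state is backed up

    def components(self) -> Tuple[str, ...]:
        """Components whose data sets must all be captured as of first_moment."""
        return tuple(sorted({ref.component for ref in self.protection_group}))
```

Keeping the moment in the same record as the protection group reflects the point of the policy: every data set of the service is captured as of one shared moment rather than per component.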
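In a possible implementation, the primary control device 103 may configure the first data backup policy for the first service based on operations by user 1. For example, the primary control device 103 may present user 1 with a policy configuration interface as shown in FIG. 7. The interface may present a prompt (such as "Please specify the service to back up and the backup moment" in FIG. 7) asking user 1 to enter the identifier of the first service to be configured (such as its name) and the moment at which that service's data is to be backed up. The primary control device 103 may then, based on the identifier of the first service that user 1 entered in the policy configuration interface, ask whether to configure a first protection group for the first service. After learning that user 1 has chosen to do so (for example, user 1 clicks the "Yes" button in the policy configuration interface), it further presents the names of one or more data sets stored on the primary cluster 101 that are related to the first service, for example data set 1 related to component 1 and data sets 2 and 3 related to component 2 as shown in FIG. 7, and prompts user 1 to configure the first protection group, that is, to specify which data sets related to the first service need to be backed up to the standby cluster 102. Based on user 1's selection or input operations on data set 1 through data set 3, the primary control device 103 determines that the protection group of the first service includes data set 1, data set 2, and data set 3. The primary control device 103 may also obtain the first moment for backing up the first service's data that user 1 entered in the policy configuration interface, and generate the first data backup policy for the first service from that first moment and the information about the first protection group. In practice, the standby cluster 102 may back up the first service's data on the primary cluster 101 periodically. In that case the policy configuration interface may present both a backup-moment input box and a backup-period input box, so that user 1 enters the first moment, that is, the start moment of the periodic backup, in the backup-moment box and the backup period in the backup-period box. The standby cluster 102 then backs up the first service's data stored on the primary cluster 101 periodically, starting from the first moment and following the configured period. Accordingly, for the second backup (and later backups), the primary control device 103 can determine the moment of that backup from the first moment and the backup period, so that the data sets related to the first service in the primary cluster at that moment are backed up to the standby cluster 102.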
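Deriving the moment of the second and later backups from the first moment and the backup period is simple arithmetic. A minimal sketch, assuming a fixed period and a hypothetical helper name `backup_moment`:

```python
from datetime import datetime, timedelta


def backup_moment(first_moment: datetime, period: timedelta, n: int) -> datetime:
    """Moment of the n-th periodic backup, where n = 1 is the first backup."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Each later backup is one more full period after the first moment.
    return first_moment + (n - 1) * period


# Illustrative values only: first backup at 13:00:00, period of 6 hours.
second = backup_moment(datetime(2022, 1, 17, 13, 0, 0), timedelta(hours=6), 2)
```

With these example values, the second backup falls at 19:00:00 on the same day.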
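Of course, the above is only an example and does not limit how the primary control device 103 configures the first data backup policy. In other examples, user 1 may directly enter the identifiers of the data sets related to the first service (such as their names or numbers) in the policy configuration interface to specify the first protection group; or the primary control device 103 may automatically designate all data sets related to the first service as the first protection group.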
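After the first data backup policy is successfully configured for the first service, the primary control device 103 may, according to the policy, control the primary cluster 101 or the standby cluster 102 to back up to the standby cluster 102 the data sets related to the first service in the primary cluster 101 at the first moment.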
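As an implementation example, the primary control device 103 may generate a third instruction based on the data backup policy and send it to the primary cluster 101, instructing the primary cluster 101 to take snapshots, at the first moment, of the data sets related to the first service. The third instruction may include the information about the first protection group and the first moment, and the first moment is later than the moment at which the primary control device 103 sends the third instruction to the primary cluster 101. After receiving the third instruction, the primary cluster 101 parses out the information about the first protection group and the first moment and, from the protection group information, determines the data sets related to the first service that are processed or stored by component 1 and component 2 respectively. The primary cluster 101 may then use its backup management apparatus 1011 to create process 1 and process 2 and start snapshot task 1 and snapshot task 2, each of which includes the first moment. Process 1 executes snapshot task 1: it accesses component 1 and, at the first moment, snapshots the data set related to the first service that component 1 processes or stores, producing a first snapshot. Process 2 executes snapshot task 2: it accesses component 2 and, at the first moment, snapshots the data set related to the first service that component 2 processes or stores, producing a second snapshot. In other embodiments, the backup management apparatus 1011 may instead use at least one executor on the primary cluster 101 to execute the snapshot tasks, where each executor may be implemented, for example, as a thread of execution.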
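After the service data has been snapshotted, the backup management apparatus 1011 may notify the primary control device 103 in the standby cluster 102 that the snapshots are complete, so that the primary control device 103 can issue a second instruction to the backup management apparatus 1021 on the standby cluster 102, instructing the standby cluster 102 to copy the data corresponding to the first-moment snapshots of the data sets related to the first service in the primary cluster 101. For example, the second instruction may include indication information for the first snapshot and the second snapshot. Based on the second instruction, the backup management apparatus 1021 may identify the first and second snapshots on the primary cluster 101, and start copy task 1 for the first snapshot and copy task 2 for the second snapshot. The standby cluster 102 may include at least one executor for executing copy tasks. The executor or executors may execute copy task 1 by reading the first snapshot that the primary cluster 101 took at the first moment and, based on it, copying to the standby cluster 102 the data set related to the first service that is processed or stored by component 1, for example by writing it to the storage area corresponding to component 4 on the standby cluster 102. Likewise, the executor or executors may execute copy task 2 by reading the second snapshot that the primary cluster 101 took at the first moment and, based on it, copying to the standby cluster 102 the data set related to the first service that is processed or stored by component 2, for example by writing it to the storage area corresponding to component 5. In this way, the data belonging to the first service on the primary cluster 101 is backed up to the standby cluster 102 at service granularity. Moreover, all the service data backed up to the standby cluster 102 is first-service data that the primary cluster 101 had already stored at the first moment, so the backed-up service data of a given service is consistent in the time dimension.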
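In this embodiment, the standby cluster 102 may use one executor to execute multiple copy tasks in sequence, or use multiple executors to execute them in parallel to improve backup efficiency. In addition, the backup management apparatus 1011 on the primary cluster 101 and the backup management apparatus 1021 on the standby cluster 102 may be applications running on the respective servers. Alternatively, they may be hardware that runs an application, for example any one of a separately configured processor core, a processor, or a server. This embodiment does not limit the specific implementation of the backup management apparatus.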
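The choice between one executor running copy tasks in sequence and several executors running them in parallel can be sketched with a thread pool. This is an illustrative sketch only: `copy_data_set` is a placeholder that stands in for reading a snapshot on the primary cluster and writing to the standby cluster, and the task tuples are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_data_set(task):
    """Placeholder copy task: (task name, list of items in the snapshot)."""
    name, snapshot_items = task
    # A real task would read the snapshot and write to the standby storage area.
    return f"{name}: copied {len(snapshot_items)} item(s)"


def run_copy_tasks(tasks, executors=1):
    """Run copy tasks with the given number of executors.

    executors=1 runs the tasks one after another; a larger value runs them
    in parallel. Results are returned in task order in both cases.
    """
    with ThreadPoolExecutor(max_workers=executors) as pool:
        return list(pool.map(copy_data_set, tasks))


tasks = [("copy-task-1", ["file1", "file2"]), ("copy-task-2", ["table1"])]
sequential = run_copy_tasks(tasks, executors=1)
parallel = run_copy_tasks(tasks, executors=2)
```

Both runs produce the same results; only the wall-clock time differs when the tasks do real I/O.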
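The above implementation uses the standby cluster 102 actively backing up service data as an example. In practice, the primary cluster 101 may instead actively back up the data sets related to the first service to the standby cluster 102.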
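As an implementation example, after determining that the primary cluster 101 has finished snapshotting the data sets of the first service associated with component 1 and component 2, the primary control device 103 may send a first instruction to the primary cluster, instructing the primary cluster 101 to send to the standby cluster 102 the data corresponding to the first-moment snapshots of the data sets related to the first service. For example, the first instruction may include indication information for the first snapshot and the second snapshot. After receiving the first instruction, the primary cluster 101 may identify the first and second snapshots on the primary cluster 101 from it. The primary cluster 101 may then, through one or more executors, transfer to the standby cluster 102 the data set related to the first service that is processed or stored by component 1, based on the first snapshot, and the data set related to the first service that is processed or stored by component 2, based on the second snapshot. The way the primary cluster 101 performs the backup from snapshots is similar to the way the standby cluster 102 does so as described above; refer to that description, which is not repeated here.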
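Further, after the data sets related to the first service have been backed up to the standby cluster 102, the standby cluster 102 may also snapshot the backed-up data sets and store the resulting snapshots. The snapshot that the standby cluster 102 obtains from the backed-up service data is normally consistent with the snapshot that the primary cluster 101 took of the first service's data at the first moment. The standby cluster 102 can thus use this snapshot at a future time to determine the first-service data that the data backup system 300 stored at the first moment.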
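In real deployments, the primary cluster 101 and the primary control device 103 may be built from different computing devices, so their clock sources may not be unified. The first moment indicated by the primary control device 103 in the third instruction may then differ from the moment at which the primary cluster 101 actually performs the snapshot. For example, suppose the clock on the primary control device 103 is 5 seconds ahead of the clock on the primary cluster 101, and the first moment indicated in the third instruction is 13:00:00. When the primary cluster 101 performs the snapshot operation based on the third instruction, its clock is 5 seconds behind that of the primary control device 103, so the snapshot is actually taken at 13:00:05 on the primary control device 103's clock. The moment at which the primary control device 103 asked for the first service's data to be snapshotted therefore does not match the moment at which the primary cluster 101 actually took the snapshot. For this reason, in this embodiment the same clock source may be set for the primary cluster 101 and the primary control device 103. For example, the primary control device 103 may synchronize clocks with the primary cluster 101 through the Network Time Protocol (NTP) so that the two share the same clock source. Of course, they may also synchronize clocks in other ways; this embodiment does not limit this. Further, the primary cluster 101 and the primary control device 103 may also synchronize clocks with the standby cluster 102 so that clocks are unified across the data backup system 300.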
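The clock-skew example can be checked with a few lines of arithmetic. The sketch below is illustrative: the function name and the calendar date are assumptions, and only the 13:00:00 moment and the 5-second offset come from the text.

```python
from datetime import datetime, timedelta


def actual_snapshot_time(indicated: datetime, controller_ahead_by: timedelta) -> datetime:
    """Time on the control device's clock when the cluster's own clock reads `indicated`.

    The cluster waits until its local clock shows the indicated moment; if the
    control device's clock runs ahead, that happens later on the controller's clock.
    """
    return indicated + controller_ahead_by


indicated = datetime(2022, 1, 17, 13, 0, 0)  # the date is arbitrary
actual = actual_snapshot_time(indicated, timedelta(seconds=5))
print(actual.time())  # 13:00:05
```

With a shared clock source the offset is zero and the two moments coincide.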
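In real deployments, the primary cluster 101 may store data of other services in addition to the first service. The data of different services generally differs, and different services may come from the same user or from different users. Take the case where the primary cluster 101 stores data of both a first service and a second service: the first service's data may be the service data under directory 1 and in table 1 shown in FIG. 8, while the second service's data may be the service data under directory 2, in table 2, and in table 3 shown in FIG. 8. Therefore, when service data is snapshotted and backed up at service granularity, the primary control device 103 may configure, in addition to the first data backup policy for the first service, a second data backup policy for the second service. The second data backup policy includes information about multiple data sets in the primary cluster 101 related to the second service, and a second moment. In this embodiment, the data sets related to the second service may also be called the second protection group, which includes the data sets related to the second service that are processed or stored by component 1, by component 2, and by component 3 in the primary cluster 101 (assuming the primary cluster 101 uses component 1, component 2, and component 3 to process or store the second service's data). The primary control device 103 may then, based on the second data backup policy, control the primary cluster 101 or the standby cluster 102 to back up to the standby cluster 102 the second protection group (that is, the data sets related to the second service) in the primary cluster at the second moment. The way the primary control device 103 configures the second data backup policy and uses it to back up the data sets related to the second service is similar to the way it configures and uses the first data backup policy; refer to the description in the foregoing embodiment, which is not repeated here.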
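Normally, the service data on the primary cluster 101 may be backed up to the standby cluster 102 periodically. In one implementation, each backup transfers all the service data belonging to the first service on the primary cluster 101. In another implementation, the first backup between the primary cluster 101 and the standby cluster 102 transfers the data related to the first service that the primary cluster 101 had stored at the first moment, while the second backup transfers only the incremental data on the primary cluster 101 between the first moment and a third moment, where the third moment is the moment of the second backup of the first service's data. As an implementation example, after snapshotting the service data at the first moment, the primary cluster 101 may, at the third moment, snapshot the data set related to the first service that is processed or stored by component 1 to obtain a third snapshot and, at the same third moment, snapshot the data set related to the first service that is processed or stored by component 2 to obtain a fourth snapshot. The third moment is later than the first moment. In practice, the interval between the third moment and the first moment may be, for example, the data backup period between the primary cluster 101 and the standby cluster 102; alternatively, the third moment may be specified by the primary control device 103. After completing the second round of snapshots, the primary cluster 101 may notify the primary control device 103 through the backup management apparatus 1011, and the primary control device 103 instructs the backup management apparatus 1021 to perform the second round of service data backup. As in the first round, the backup management apparatus 1021 may start new copy task 3 and copy task 4 for the third and fourth snapshots. The standby cluster 102 may then use at least one executor to execute copy task 3: based on the first snapshot and the third snapshot, it determines the incremental data related to the first service that component 1 on the primary cluster 101 processed or stored between the first moment and the third moment (hereinafter the first incremental data) and backs it up to the standby cluster 102, for example to the storage area corresponding to component 4. Likewise, the standby cluster 102 may use at least one executor to execute copy task 4: based on the second snapshot and the fourth snapshot, it determines the incremental data related to the first service that component 2 processed or stored between the first moment and the third moment (hereinafter the second incremental data) and backs it up to the standby cluster 102, for example to the storage area corresponding to component 5. In this way, each subsequent backup needs to transfer only incremental data between the primary cluster 101 and the standby cluster 102, rather than all the service data related to the first service on the primary cluster 101. This effectively reduces the volume of service data transferred between the two clusters, which improves backup efficiency and reduces the resources consumed by backup.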
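Determining incremental data from two snapshots amounts to a comparison of their contents. A minimal sketch, assuming each snapshot can be modeled as a mapping from a path to a content version; this model and the name `incremental_data` are assumptions, and entries deleted between the snapshots are deliberately ignored because the text only discusses added data.

```python
def incremental_data(earlier: dict, later: dict) -> dict:
    """Entries that are new or changed in `later` relative to `earlier`.

    Each snapshot is modeled as {path: content_version}.
    """
    return {
        path: version
        for path, version in later.items()
        if path not in earlier or earlier[path] != version
    }


# Illustrative snapshots of one component's data set at two moments.
first_snapshot = {"/dir1/a": 1, "/dir1/b": 1}
third_snapshot = {"/dir1/a": 1, "/dir1/b": 2, "/dir1/c": 1}
delta = incremental_data(first_snapshot, third_snapshot)
```

Only the changed entry `/dir1/b` and the new entry `/dir1/c` would need to be transferred in the second round.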
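The above describes the data backup process between the primary cluster 101 and the standby cluster 102. The following describes the disaster recovery switchover process in the data backup system 300 when the primary cluster 101 fails.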
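After the service data on the primary cluster 101 has been backed up to the standby cluster 102, if the primary cluster 101 fails, it may be unable to keep providing users with service-data read/write services. The standby cluster 102 can then take over the services that the primary cluster 101 was running and, using the previously backed-up service data, continue providing those read/write services, thereby ensuring the reliability with which the data backup system 300 stores users' service data.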
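Normally, a user accesses the service data stored on the primary cluster 101 or the standby cluster 102 through a client on the application server 105. For example, as shown in FIG. 5, the application server 105 includes a primary client 1051 and a standby client 1052. Before the primary cluster 101 fails, the user accesses the primary cluster 101 through the primary client 1051 on the application server 105; after it fails, the user accesses the standby cluster 102 through the standby client 1052 on the application server 105.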
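In practice, after the primary cluster 101 fails, the application server 105 may automatically switch the client used to access the cluster from the primary client 1051 to the standby client 1052.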
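First, while running, the application server 105 can detect a failed cluster through the primary client 1051 and the standby client 1052. As an example, the primary client 1051 may register a detection event with distributed application coordination service 1 (such as a ZooKeeper service) on the primary cluster 101 and receive the detection result that the service returns, which indicates whether the primary cluster 101 has failed (for example, become unavailable). Similarly, the standby client 1052 may register a detection event with distributed application coordination service 2 (such as a ZooKeeper service) on the standby cluster 102 and receive the detection result that the service returns, which indicates whether the standby cluster 102 has failed (for example, become unavailable). For example, while the standby cluster 102 is taking over the services that the primary cluster 101 ran before failing, the application server 105 may determine through the standby client 1052 whether the standby cluster 102 has failed and, after determining that it has not, instruct the standby cluster 102 to take over those services.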
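Then, after detecting that the primary cluster 101 has failed, the application server 105 may switch the client used to access the cluster according to the clusters' state information. Specifically, the application server 105 may further include an arbitration module 1053, which may obtain from the primary client 1051 and the standby client 1052 information indicating the identities of the primary cluster 101 and the standby cluster 102 respectively. After the primary client 1051 registers a detection event with distributed application coordination service 1, that service may return first state information to the primary client 1051. The first state information indicates whether its cluster has the primary identity or the standby identity (that is, whether the cluster is currently the primary cluster or the standby cluster), or indicates whether the cluster has become unavailable (for example, because of a cluster failure). Likewise, distributed application coordination service 2 may return second state information to the standby client 1052, indicating whether its cluster has the primary or the standby identity, or whether it has become unavailable. The arbitration module 1053 may then obtain the first state information from the primary client 1051 and the second state information from the standby client 1052. When the first state information indicates that the primary cluster 101 has the standby identity or is unavailable, and the second state information indicates that the standby cluster 102 has the primary identity, the arbitration module 1053 may determine that the client the application server 105 uses to access the cluster is to be switched to the standby client 1052. Note that before the primary cluster 101 fails, the first state information indicates that the primary cluster 101 has the primary identity and the second state information indicates that the standby cluster 102 has the standby identity. After the primary cluster 101 fails, the primary control device 103 may notify the user or administrator of the failure, so that the user or administrator can, through an identity adjustment operation for the standby cluster performed on the primary control device 103 or another device, change the identity of the standby cluster 102 from standby to primary. This helps prevent the data backup system 300 from abnormally switching the primary and standby identities of the two clusters because of a program error. Further, the user or administrator may also adjust the identity of the primary cluster, changing it from primary to standby (or to cluster unavailable), thereby reversing the identities of the primary cluster 101 and the standby cluster 102. At that point, the first state information that the arbitration module 1053 obtains from the primary client 1051 indicates that the primary cluster 101 has the standby identity or is unavailable, and the second state information that it obtains from the standby client 1052 indicates that the standby cluster 102 has the primary identity.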
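In this way, after the primary cluster 101 fails, the application server 105 can automatically switch the cluster it accesses from the primary cluster 101 to the standby cluster 102 without manual intervention to switch the accessed cluster, which improves the flexibility of the data backup system 300 and reduces manual operations and maintenance cost.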
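Note that the description above takes as an example the case where the application server 105 and the primary control device 103 are deployed independently and the primary client 1051, the standby client 1052, and the arbitration module 1053 are deployed on the application server 105. In other possible data backup systems 300, the application server 105 may be deployed together with the primary control device 103; that is, a single device implements the functions of both. That device may be called a control device or an application server. It may integrate the primary client 1051, the standby client 1052, and the arbitration module 1053 shown in FIG. 4, and it performs the operations described above for the application server 105, such as accessing clusters and detecting cluster failures. In still other embodiments, when the application server 105 and the primary control device 103 are deployed independently, the functions of the primary client 1051, the standby client 1052, and the arbitration module 1053 may instead be implemented by the control device 103, which then performs the automatic switchover between the primary and standby clients. Because the primary and standby clients reside in the control device 103, read/write requests generated by the application server 105 are sent to the control device 103, and whichever client in the control device 103 currently corresponds to the primary identity (the primary client or the standby client) reads and writes the data in the primary cluster (or the standby cluster).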
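Further, when the first state information and the second state information both indicate that their respective clusters are the primary cluster (for example, before a recovered primary cluster 101 is brought under the management of the primary control device 103, distributed application coordination service 1 on the primary cluster 101 indicates to the primary client 1051 that its cluster is the primary cluster), the application server 105 continues to access the clusters using its current access strategy; that is, it need not switch the cluster it is currently accessing.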
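The arbitration rule described above can be sketched as a small decision function. The names `State` and `arbitrate` are assumptions made for illustration. The text specifies only two cases: switch to the standby client when the first state is standby or unavailable and the second state is primary, and keep the current access strategy when both states are primary. For every other combination this sketch simply keeps the current client, which is an assumption rather than something the text states.

```python
from enum import Enum


class State(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"
    UNAVAILABLE = "unavailable"


def arbitrate(first_state: State, second_state: State, current_client: str) -> str:
    """Return which client ("primary" or "standby") the application should use.

    first_state is reported for the primary cluster by the primary client;
    second_state is reported for the standby cluster by the standby client.
    """
    if first_state in (State.STANDBY, State.UNAVAILABLE) and second_state is State.PRIMARY:
        return "standby"
    # Both clusters reporting PRIMARY falls through here: keep the current strategy.
    # Other combinations are not specified in the text; this sketch also keeps
    # the current client for them.
    return current_client
```

For example, `arbitrate(State.UNAVAILABLE, State.PRIMARY, "primary")` selects the standby client, while two clusters that both report the primary identity leave the current client unchanged.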
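In this embodiment, to take over the services that the primary cluster 101 ran before failing, the standby cluster 102 needs not only the data related to those services but also data about the users to whom those services belong, such as user names and user permissions.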
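For example, after the application server 105 switches the cluster it accesses from the primary cluster 101 to the standby cluster 102, operations personnel may configure the corresponding user data on the standby cluster 102 so that the standby cluster 102 can provide data read/write services based on that user data. The configured user data may include at least one of a user identifier, user permissions, and the identifier of the tenant to which the user belongs. Normally, this manual configuration, such as creating users and tenants and applying for permissions on the standby cluster 102, increases the recovery time objective (RTO) of the data backup system 300; that is, it lengthens the interval between the suspension and the resumption of service after a disaster (a failure of the primary cluster 101).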
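For this reason, in another implementation, before the primary cluster 101 fails, the standby cluster 102 may back up to itself not only the service data on the primary cluster 101 but also the user data on the primary cluster 101. In a specific implementation, the primary control device 103 may send a fourth instruction to the primary cluster 101, instructing it to synchronize the user data to the standby cluster 102. Alternatively, the primary control device 103 may obtain the user data stored in the primary cluster 101 and in the standby cluster 102 and adjust the latter according to the former so that the two are consistent, for example by adding to the standby cluster 102 user data that it does not yet store, or by modifying user data that it already stores. Then, when the standby cluster 102 takes over the services on the primary cluster 101, it can provide the corresponding services to users based on the backed-up user data, without manual configuration by operations personnel on the standby cluster 102. This both lowers operations and maintenance cost and effectively reduces the recovery time objective of the data backup system 300.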
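The second option, in which the control device compares the user data of both clusters and adjusts the standby side, can be sketched as a reconciliation step. The dictionary layout (user identifier mapped to tenant and permissions) and the name `reconcile_user_data` are assumptions made for illustration. Only additions and modifications are computed, matching the two adjustments the text mentions.

```python
def reconcile_user_data(primary: dict, standby: dict) -> dict:
    """Compute the changes needed so the standby's user data matches the primary's.

    Both arguments map a user identifier to that user's data.
    """
    # Users present on the primary but missing on the standby.
    to_add = {user: data for user, data in primary.items() if user not in standby}
    # Users present on both sides whose data differs.
    to_update = {
        user: data
        for user, data in primary.items()
        if user in standby and standby[user] != data
    }
    return {"add": to_add, "update": to_update}


# Illustrative user data only.
primary_users = {
    "user1": {"tenant": "tenant1", "permissions": ["read", "write"]},
    "user2": {"tenant": "tenant1", "permissions": ["read"]},
}
standby_users = {"user1": {"tenant": "tenant1", "permissions": ["read"]}}
changes = reconcile_user_data(primary_users, standby_users)
```

Here `user2` would be added on the standby cluster and `user1` would have its permissions updated.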
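Note that the above uses a failure of the primary cluster 101 as the example. In real deployments, when the primary cluster 101 has not failed, if a cluster switchover command from a user or administrator is received, the application server 105 may likewise automatically switch the cluster it accesses from the primary cluster 101 to the standby cluster 102. The switchover is implemented in a way similar to that described above; refer to that description, which is not repeated here.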
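Refer to FIG. 9, a schematic flowchart of a data backup method in an embodiment of this application. The method can be applied to the data backup system 300 shown in FIG. 5. It is described using primary and standby clusters built on the Hadoop architecture as an example: in this embodiment, component 1 and component 4 in the data backup system 300 of FIG. 5 are HDFS components, component 2 and component 5 are Hive components, and component 3 and component 6 are SparkSQL components. When the cluster accessed by the application server 105 is switched to the standby cluster 102, the standby control device 104 may back up the service data newly stored on the standby cluster 102 to the primary cluster 101 (which may, for example, have recovered after a failure). In practice, the functions of the standby control device 104 are similar to those of the primary control device 103; see the description of the primary control device 103 below. The following uses backing up the service data at time T0 as an example. The data backup method shown in FIG. 9 may include the following steps: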
S901: The primary control device 103 synchronizes configuration information to the standby control device 104.
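In this embodiment, while the primary cluster 101 is running normally, the primary control device 103 may control the backup of service data from the primary cluster 101 to the standby cluster 102. Accordingly, after the primary cluster 101 fails, the standby cluster 102 may take over its services; during this time, if the primary cluster 101 recovers, the standby control device 104 may control the backup of service data from the standby cluster 102 to the (recovered) primary cluster 101. To this end, the primary control device 103 may synchronize its own configuration information to the standby control device 104 in advance, so that after the primary cluster 101 fails, the standby control device 104 can control the corresponding backup process without operations personnel repeating manual configuration. The configuration information in the primary control device 103 may be configured by an administrator during device deployment, so that the primary control device 103 can control data backup between the primary cluster 101 and the standby cluster 102 according to it.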
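For example, the configuration information synchronized by the primary control device 103 may include information about the data backup system 300, such as the pairing relationship between the primary cluster 101 and the standby cluster 102, the resources included in the data backup system 300, and the point in time of the service data currently backed up by the data backup system 300.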
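It should be understood that S901 is optional. In some embodiments, the standby control device 104 may actively obtain the configuration information, or an administrator may configure the standby control device 104.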
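S902: Based on a setting operation by the user 106, the primary control device 103 configures a data backup policy for the first service. The policy includes information about a protection group and the time T0 at which the first service's data is to be backed up, where the protection group information indicates the data sets related to the first service that are stored by the HDFS component, the Hive component, and the SparkSQL component.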
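In practice, the user 106 may set the protection group for the first service according to the data sets generated when the first service's data is stored on the primary cluster 101. The protection group information indicates the data sets in the primary cluster 101 that are related to the first service, so that when the first service's data is later backed up, the data sets indicated by the protection group are backed up to the standby cluster 102. For how the user 106 creates a protection group for the first service, see the description in the foregoing embodiment, which is not repeated here.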
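This embodiment uses as an example a protection group that includes the data sets related to the first service stored by the HDFS component, the Hive component, and the SparkSQL component; the protection group information may be, for example, the identifiers of those data sets. In other possible embodiments, the protection group set by the user 106 may include only the data sets related to the first service stored by any two of these components, for example only those stored by the HDFS component and the Hive component.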
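In addition, the user 106 may specify T0 as the time at which the data backup system 300 backs up the first service's data, so that the data backup system 300 can subsequently snapshot and back up that data at T0. The protection group information and T0 together constitute the data backup policy that the primary control device 103 configures for the first service; for the specific implementation, see the description in the foregoing embodiment.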
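S903: According to the configured data backup policy, the primary control device 103 sends a third instruction to the backup management apparatus 1011 on the primary cluster 101 before T0. The third instruction includes T0 and the protection group information.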
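It should be understood that the primary control device 103 generates the third instruction and sends it to the primary cluster 101 before T0, so that the primary cluster 101 can snapshot the service data when T0 arrives.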
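S904: After receiving the third instruction, the backup management apparatus 1011 creates multiple processes and uses them to access, respectively, the HDFS component, the Hive component, and the SparkSQL component that correspond to the data sets related to the first service indicated by the protection group information.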
S905: The backup management apparatus 1011 uses the processes to snapshot, at T0, the data sets related to the first service stored by these components.
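As an implementation example, the backup management apparatus 1011 may create process 1, process 2, and process 3. Process 1 may be responsible for accessing the HDFS component and, at T0, snapshotting the HDFS directory related to the first service (that is, the data set related to the first service) stored by the HDFS component; the first service's data stored by the HDFS component is kept in that HDFS directory as files. Process 2 may be responsible for accessing the Hive component; at T0 it may use a data extraction command to obtain from the database (DB) the metadata of the service data stored by the Hive component and, based on that metadata, snapshot the HDFS directory in which the first service's data is actually stored. Normally, the structured data stored by the Hive component is kept in the corresponding HDFS directory in file format. Process 3 may be responsible for accessing the SparkSQL component; at T0 it may use a data extraction command to obtain from the database the metadata of the service data stored by the SparkSQL component and, based on that metadata, snapshot the HDFS directory in which the first service's data is actually stored. The structured data stored by the SparkSQL component is likewise kept in the corresponding HDFS directory in file format.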
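Because all three components ultimately keep their data in HDFS directories, the snapshot step can be pictured as resolving each data set to a directory and snapshotting that directory. The sketch below only builds command lines and does not run them. It assumes the standard `hdfs dfs -createSnapshot <path> <name>` command, which requires the directory to have been made snapshottable beforehand; the application does not say which command is actually used. The protection group entries, the `table_locations` mapping (standing in for metadata read from the database), and the snapshot name are invented for illustration.

```python
from typing import Dict, List, Tuple


def snapshot_commands(
    protection_group: List[Tuple[str, str]],
    table_locations: Dict[str, str],
    snapshot_name: str,
) -> List[List[str]]:
    """Build one HDFS snapshot command per data set in the protection group.

    protection_group holds (component, identifier) pairs. For "HDFS" the
    identifier is already a directory; for Hive or SparkSQL it is a table
    whose storage directory is looked up in table_locations.
    """
    commands = []
    for component, identifier in protection_group:
        directory = identifier if component == "HDFS" else table_locations[identifier]
        commands.append(["hdfs", "dfs", "-createSnapshot", directory, snapshot_name])
    return commands


group = [("HDFS", "/service1/files"), ("Hive", "orders")]
locations = {"orders": "/warehouse/service1.db/orders"}  # hypothetical metadata
cmds = snapshot_commands(group, locations, "T0")
```

Using the same snapshot name for every directory is one simple way to mark the snapshots as belonging to the same moment.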
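In other embodiments, the backup management apparatus 1011 may instead use at least one executor to snapshot the service data stored by each component. The executors described in this embodiment are implemented in a way similar to those in the foregoing embodiment.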
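S906: After the data snapshots are complete, the primary control device 103 issues a second instruction to the backup management apparatus 1021 on the standby cluster 102, instructing the standby cluster 102 to copy the data corresponding to the snapshots, taken at the first moment, of the data sets related to the first service in the primary cluster 101.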
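In practice, after the backup management apparatus 1011 on the primary cluster 101 has used the processes to snapshot the service data stored through the HDFS component, the Hive component, and the SparkSQL component, it may return a snapshot-success notification to the primary control device 103. Once the primary control device 103 determines that the snapshots have finished, it issues the second instruction to instruct the standby cluster 102 to back up the service data on the primary cluster 101 to the standby cluster 102.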
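For example, the second instruction may include indication information for the snapshots, in the primary cluster 101, of the data sets related to the first service stored by the HDFS component, the Hive component, and the SparkSQL component, so that the backup management apparatus 1021 can determine which data sets to back up.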
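S907: The backup management apparatus 1021 starts multiple copy tasks according to the protection group information of the first service. Each copy task backs up the data set related to the first service stored by one component.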
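S908: The backup management apparatus 1021 executes the copy tasks through at least one executor and, based on the T0 snapshots on the primary cluster 101, backs up to the standby cluster 102 the data sets related to the first service stored by each component.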
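As an example, suppose three executors run on the standby cluster 102: executor 1, executor 2, and executor 3. Executor 1 executes copy task 1: by accessing the primary cluster 101 it obtains the T0 snapshot of the data set (such as an HDFS directory) related to the first service stored by the HDFS component, and based on that snapshot it backs up that data set to the storage area corresponding to the HDFS component on the standby cluster 102. Similarly, executor 2 executes copy task 2: based on the T0 snapshot of the data set related to the first service stored by the Hive component, it backs up that data set to the storage area corresponding to the Hive component on the standby cluster 102. Executor 3 executes copy task 3: based on the T0 snapshot of the data set related to the first service in the SparkSQL component, it backs up that data set to the storage area corresponding to the SparkSQL component on the standby cluster 102.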
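When the data sets of the first service corresponding to the Hive component and the SparkSQL component are backed up to the standby cluster 102, the metadata of that service data on the standby cluster 102 may be saved to the database on the standby cluster 102, so that the first service's data can later be queried on the standby cluster 102 using the metadata in the database.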
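At this point, all the service data backed up to the standby cluster 102 consists of the data sets related to the first service that the primary cluster 101 stored at T0, so the first service's data on the standby cluster 102 is consistent in the time dimension.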
S909: The backup management apparatus 1021 snapshots, through at least one executor, the first service's data backed up to the standby cluster 102.
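After the service data has been backed up from the primary cluster 101 to the standby cluster 102, the backup management apparatus 1021 may also use the executor to snapshot the backed-up data of the first service. The snapshot on the standby cluster 102 is then consistent with the service data on the primary cluster 101 at T0.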
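Note that this embodiment uses the standby cluster 102 actively backing up service data from the primary cluster 101 as an example. In practice, the primary cluster 101 may instead actively back up the service data to the standby cluster 102. For example, the primary control device 103 may send a first instruction to the primary cluster 101, instructing it to back up to the standby cluster 102 the data sets related to the first service stored by the HDFS component, the Hive component, and the SparkSQL component. In that case, after completing the snapshots, the primary cluster 101 may use the corresponding executors to back up the first service's data to the standby cluster 102 based on the T0 snapshots.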
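Further, in this embodiment, not only the first service's data but also the user data on the primary cluster 101 may be backed up to the standby cluster 102. For example, the primary control device 103 may send a fourth instruction to the primary cluster 101, instructing it to synchronize the user data to the standby cluster 102. Alternatively, the primary control device 103 may obtain the user data stored in the primary cluster 101 and in the standby cluster 102 and adjust the latter according to the former so that the two are consistent. Then, when the standby cluster 102 takes over the services on the primary cluster 101, it can provide the corresponding services to users based on the backed-up user data, without manual configuration by operations personnel on the standby cluster 102. This both lowers operations and maintenance cost and effectively reduces the recovery time objective of the data backup system 300. To this end, this embodiment may further include the following step S910: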
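S910: The standby cluster 102 backs up the user data on the primary cluster 101 to the standby cluster 102; or the primary cluster 101 actively backs up the user data to the standby cluster 102; or the primary control device 103 adjusts the user data stored in the standby cluster 102 according to the user data stored in the primary cluster 101.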
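For example, the user data on the primary cluster 101 may include at least one of the identifiers of users created on the primary cluster 101 (including the user 106), tenant identifiers, and the permissions applied for on behalf of users.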
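After the service data on the primary cluster 101 has been backed up to the standby cluster 102, if the primary cluster 101 fails, the standby cluster 102 can take over the services the primary cluster 101 was running and, using the previously backed-up service data, continue providing users with service-data read/write services, thereby ensuring the reliability with which the data backup system 300 stores users' service data.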
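Note that, for brevity, this embodiment focuses on the operations performed in steps S902 through S910; for the specific implementation of each step, see the descriptions in the foregoing embodiments, which are not repeated here. In this embodiment, the user 106 may access the primary cluster 101 or the standby cluster 102 through the application server 105: before the primary cluster 101 fails, the user accesses it through the primary client 1051 on the application server 105; after it fails, the application server 105 can automatically switch the client used to access the cluster, so that the user accesses the standby cluster 102 through the standby client 1052 on the application server 105. For the specific implementation, see the description in the foregoing embodiment, which is not repeated here.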
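The above embodiment uses backing up the service data on the primary cluster 101 at T0 as an example. In practice, data may be backed up between the primary cluster 101 and the standby cluster 102 periodically. For example, when configuring T0 as the start time of backup in the policy configuration interface, the user 106 also configures the backup period between the primary cluster 101 and the standby cluster 102, so that after the first backup, the second backup is performed once one backup period has elapsed. T0 in the above embodiment is thus the start time of the periodic backup. In this case, on each backup the standby cluster 102 may back up all the service data on the primary cluster 101 to the standby cluster 102 by a process similar to that described above. In other embodiments, in the second and subsequent backups the standby cluster 102 may back up only the incremental data on the primary cluster 101. The following uses the second round of data backup between the primary cluster 101 and the standby cluster 102 as an example, where the service data backed up in the second round is the service data newly added to the primary cluster 101 during the period T0 to T1 (hereinafter, incremental data). FIG. 10 shows a schematic flowchart of another data backup method in an embodiment of this application. The method may include the following steps: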
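S1001: Before T1, the primary control device 103 sends a fifth instruction to the backup management apparatus 1011 on the primary cluster 101. The fifth instruction includes T1 and the protection group information of the first service.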
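S1002: After receiving the fifth instruction, the backup management apparatus 1011 creates multiple processes (or reuses the processes created for the first round of data backup) and uses them to access, respectively, the HDFS component, the Hive component, and the SparkSQL component corresponding to the data sets indicated by the protection group information.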
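S1003: The backup management apparatus 1011 uses the processes to snapshot, at T1, the data sets related to the first service stored by these components.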
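As an implementation example, process 1 created by the backup management apparatus 1011 may be responsible for accessing the HDFS component and, at T1, snapshotting the data set under the HDFS directory related to the first service stored by the HDFS component. The first service's data stored by the HDFS component is kept in that HDFS directory in file format. Process 2 may be responsible for accessing the Hive component; at T1 it may use a data extraction command to obtain from the database the metadata of the data set related to the first service stored by the Hive component and, based on that metadata, snapshot the HDFS directory in which that data set is actually stored. The structured data stored by the Hive component is kept in the corresponding HDFS directory in file format. Process 3 may be responsible for accessing the SparkSQL component; at T1 it may use a data extraction command to obtain from the database the metadata of the data set related to the first service stored through the SparkSQL component and, based on that metadata, snapshot the HDFS directory in which that data set is actually stored. The structured data stored by the SparkSQL component is likewise kept in the corresponding HDFS directory in file format.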
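S1004: After the data snapshots are complete, the primary control device 103 issues a sixth instruction to the backup management apparatus 1021 on the standby cluster 102, instructing the standby cluster 102 to back up to the standby cluster 102 the data sets related to the first service in the primary cluster 101 at T1.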
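S1005: The backup management apparatus 1021 starts multiple copy tasks according to the protection group information of the first service. Each copy task backs up the data set related to the first service stored by one component.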
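S1006: The backup management apparatus 1021 executes the copy tasks through at least one executor and, based on the T0 snapshots and the T1 snapshots on the primary cluster 101, backs up to the standby cluster 102 the first service's incremental data stored by each component.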
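As an example, suppose three executors run on the standby cluster 102: executor 1, executor 2, and executor 3. Executor 1 executes copy task 1: by accessing the primary cluster 101 it obtains the T0 snapshot and the T1 snapshot of the first service for the HDFS component, determines from them the first service's incremental data stored by the HDFS component during the period T0 to T1, and backs up that incremental data to the storage area corresponding to the HDFS component on the standby cluster 102. Similarly, executor 2 executes copy task 2: based on the T0 and T1 snapshots of the first service for the Hive component, it determines the first service's incremental data stored by the Hive component during the period T0 to T1 and backs up that incremental data to the storage area corresponding to the Hive component on the standby cluster 102. Executor 3 executes copy task 3: based on the T0 and T1 snapshots of the first service for the SparkSQL component, it determines the first service's incremental data stored by the SparkSQL component during the period T0 to T1 and backs up that incremental data to the storage area corresponding to the SparkSQL component on the standby cluster 102.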
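When the incremental data corresponding to the Hive component and the SparkSQL component is backed up to the standby cluster 102, the metadata of that incremental data on the standby cluster 102 may be saved to the database on the standby cluster 102, so that the corresponding data of the first service can later be queried on the standby cluster 102 using the metadata in the database.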
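At this point, the service data backed up to the standby cluster 102 consists of the first service's data on the primary cluster 101 at T0 plus the service data newly added for the first service during the period T0 to T1; that is, it is the service data on the primary cluster 101 at T1.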
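S1007: The backup management apparatus 1021 snapshots, through at least one executor, the first service's incremental data backed up to the standby cluster 102.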
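In this way, each subsequent backup needs to transfer only the first service's incremental data between the primary cluster 101 and the standby cluster 102, rather than all the service data on the primary cluster 101. This effectively reduces the volume of service data transferred between the two clusters, which improves backup efficiency and reduces the resources consumed by backup.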
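Note that, for brevity, this embodiment focuses on the operations performed in steps S1001 through S1007; for the specific implementation of each step, see the descriptions in the foregoing embodiments, which are not repeated here.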
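The data backup system and the data backup method provided in the embodiments of this application have been described above with reference to FIG. 1 through FIG. 10. The devices provided in the embodiments of this application for performing the data backup method are described next with reference to the drawings.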
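Refer to FIG. 11, a schematic structural diagram of a control device provided in an embodiment of this application. The control device 1100 shown in FIG. 11 may be used to implement the data backup method performed by the primary control device 103 in the foregoing embodiments. The control device 1100 is located in a data backup system, such as the data backup system 300 shown in FIG. 5, which further includes a primary cluster and a standby cluster. The control device 1100 includes:
a control module 1101, configured to control, according to a first data backup policy, the primary cluster or the standby cluster to back up to the standby cluster multiple data sets related to a first service in the primary cluster at a first moment, where the first data backup policy includes information about the multiple data sets related to the first service and the first moment.
In a possible implementation, the control module 1101 is specifically configured to:
send a first instruction to the primary cluster, instructing the primary cluster to send to the standby cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service; or send a second instruction to the standby cluster, instructing the standby cluster to copy from the primary cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service in the primary cluster.
In a possible implementation, the control device 1100 further includes:
a communication module 1102, configured to send a third instruction to the primary cluster before the control device sends the first instruction to the primary cluster or sends the second instruction to the standby cluster, where the third instruction includes the information about the multiple data sets related to the first service and the first moment, and instructs the primary cluster to obtain snapshots, at the first moment, of the multiple data sets related to the first service.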
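In a possible implementation, the control device 1100 further includes:
a communication module 1102, configured to send a fourth instruction to the primary cluster, where the fourth instruction instructs the primary cluster to synchronize user data to the standby cluster;
or the control module 1101 is further configured to obtain the user data stored in the primary cluster and in the standby cluster, and to adjust the user data stored in the standby cluster according to the user data stored in the primary cluster.
In a possible implementation, the control device 1100 further includes a configuration module 1103, configured to configure the first data backup policy for the first service according to the information about the multiple data sets related to the first service and the first moment that are entered by a user.
In a possible implementation, the control device 1100 further includes a configuration module 1103, configured to configure a second data backup policy for a second service, where the second data backup policy includes information about multiple data sets related to the second service and a second moment;
the control module 1101 is further configured to control, according to the second data backup policy, the primary cluster or the standby cluster to back up to the standby cluster the multiple data sets related to the second service in the primary cluster at the second moment.
In a possible implementation, the multiple data sets related to the first service include a data set processed or stored by a first component in the primary cluster and a data set processed or stored by a second component in the primary cluster.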
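In a possible implementation, the control device 1100 includes a primary client and a standby client, where the primary client is configured to detect first state information of the primary cluster and the standby client is configured to detect second state information of the standby cluster, and the control device 1100 further includes:
a communication module 1102, configured to obtain the first state information detected by the primary client and the second state information detected by the standby client;
a determining module 1104, configured to determine that the standby client is the client used for application access when the first state information indicates that the primary cluster has the standby identity or is unavailable and the second state information indicates that the standby cluster has the primary identity.
In a possible implementation, the control device 1100 further includes a prompt module 1105 and an adjustment module 1106;
the prompt module 1105 is configured to notify a user of the failure of the primary cluster;
the adjustment module 1106 is configured to change the identity of the standby cluster from standby to primary in response to the user's identity adjustment operation for the standby cluster.
In a possible implementation, the control device 1100 is deployed in isolation from the primary cluster.
In a possible implementation, the same clock source is set in the control device 1100, the primary cluster, and the standby cluster.
In a possible implementation, the primary cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.
The control device 1100 according to this embodiment of the application may correspondingly perform the methods described in the embodiments of this application, and the foregoing and other operations and/or functions of the modules of the control device 1100 respectively implement the corresponding procedures performed by the primary control device 103 in the foregoing embodiments. For brevity, they are not repeated here.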
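Refer to FIG. 12, a schematic structural diagram of a primary cluster provided in an embodiment of this application. The primary cluster 1200 shown in FIG. 12 may be used to implement the data backup method performed by the primary cluster 101 in the foregoing embodiments. The primary cluster 1200 is located in a data backup system, such as the data backup system 300 shown in FIG. 5, which further includes a standby cluster and a control device. The primary cluster 1200 includes:
a communication module 1201, configured to obtain an instruction issued by the control device, where the instruction includes information about multiple data sets related to a first service and a first moment;
a backup module 1202, configured to back up to the standby cluster, according to the instruction, the multiple data sets related to the first service in the primary cluster at the first moment.
In a possible implementation, the backup module 1202 is specifically configured to:
obtain, according to the information about the multiple data sets related to the first service and the first moment, snapshots at the first moment of the multiple data sets related to the first service in the primary cluster;
send, according to the snapshots, the data corresponding to the snapshots to the standby cluster.
In a possible implementation, the backup module 1202 is further configured to synchronize user data to the standby cluster.
In a possible implementation, the primary cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.
The primary cluster 1200 according to this embodiment of the application may correspondingly perform the methods described in the embodiments of this application, and the foregoing and other operations and/or functions of the modules of the primary cluster 1200 respectively implement the corresponding procedures performed by the primary cluster 101 in the foregoing embodiments. For brevity, they are not repeated here.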
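FIG. 13 provides a control device. As shown in FIG. 13, the control device 1300 may be used to implement the functions of the control device 1100 shown in FIG. 11.
The control device 1300 includes a bus 1301, a processor 1302, and a memory 1303. The processor 1302 and the memory 1303 communicate through the bus 1301.
The bus 1301 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of representation, only one thick line is used in FIG. 13, but this does not mean that there is only one bus or only one type of bus.
The processor 1302 may be any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), or a neural network processing unit (NPU).
The memory 1303 may include volatile memory, such as random access memory (RAM). The memory 1303 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 1303 stores executable program code, and the processor 1302 executes that code to perform the data backup method performed by the primary control device 103 in the foregoing embodiments.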
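FIG. 14 provides a primary cluster. As shown in FIG. 14, the primary cluster 1400 may be used to implement the functions of the primary cluster 1200 shown in FIG. 12.
The primary cluster 1400 includes at least one processor and at least one memory, which may be located in one or more computing devices. This embodiment uses as an example the case where the at least one processor and the at least one memory are located in multiple computing devices. Each computing device may include a bus 1401, a processor 1402, and a memory 1403. The processor 1402 and the memory 1403 communicate through the bus 1401.
The bus 1401 may be a PCI bus, an EISA bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of representation, only one thick line is used in FIG. 14, but this does not mean that there is only one bus or only one type of bus.
The processor 1402 may be any one or more of processors such as a CPU, a GPU, an MP, a DSP, or an NPU.
The memory 1403 may include volatile memory, such as RAM. The memory 1403 may also include non-volatile memory, such as ROM, flash memory, an HDD, or an SSD.
The memory 1403 in each computing device may store executable program code, and when the processor 1402 in each computing device executes that code, the primary cluster 1400 performs the data backup method performed by the primary cluster 101 in the foregoing embodiments.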
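An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can store, or a data storage device such as a data center that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive), or the like. The computer-readable storage medium includes instructions that instruct a computing device to perform the data backup method performed by the primary control device 103 or the primary cluster 101 described above.
An embodiment of this application further provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the procedures or functions described in the embodiments of this application are produced in whole or in part.
The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).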
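The computer program product may be a software installation package. When any of the foregoing data backup methods needs to be used, the computer program product can be downloaded and executed on a computing device.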
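The descriptions of the procedures and structures corresponding to the above drawings each have their own emphasis. For a part that is not described in detail for one procedure or structure, see the related description of another procedure or structure.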
Claims (32)
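- A data backup method, characterized in that the method is applied to a data backup system that includes a primary cluster, a standby cluster, and a control device, and the method includes: controlling, by the control device according to a first data backup policy, the primary cluster or the standby cluster to back up to the standby cluster multiple data sets related to a first service in the primary cluster at a first moment, where the first data backup policy includes information about the multiple data sets related to the first service and the first moment.
- The method according to claim 1, characterized in that the controlling, by the control device according to the first data backup policy, the primary cluster or the standby cluster to back up to the standby cluster the multiple data sets related to the first service in the primary cluster at the first moment includes: sending, by the control device, a first instruction to the primary cluster, instructing the primary cluster to send to the standby cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service; or sending, by the control device, a second instruction to the standby cluster, instructing the standby cluster to copy from the primary cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service in the primary cluster.
- The method according to claim 2, characterized in that, before the control device sends the first instruction to the primary cluster or the control device sends the second instruction to the standby cluster, the method further includes: sending, by the control device, a third instruction to the primary cluster, where the third instruction includes the information about the multiple data sets related to the first service and the first moment, and the third instruction instructs the primary cluster to obtain snapshots, at the first moment, of the multiple data sets related to the first service.
- The method according to any one of claims 1-3, characterized in that the method further includes: sending, by the control device, a fourth instruction to the primary cluster, where the fourth instruction instructs the primary cluster to synchronize user data to the standby cluster; or obtaining, by the control device, the user data stored in the primary cluster and in the standby cluster, and adjusting, by the control device, the user data stored in the standby cluster according to the user data stored in the primary cluster.
- The method according to any one of claims 1-4, characterized in that the method further includes: configuring, by the control device, the first data backup policy for the first service according to the information about the multiple data sets related to the first service and the first moment that are entered by a user.
- The method according to any one of claims 1-5, characterized in that the method further includes: configuring, by the control device, a second data backup policy for a second service, where the second data backup policy includes information about multiple data sets related to the second service and a second moment; and controlling, by the control device according to the second data backup policy, the primary cluster or the standby cluster to back up to the standby cluster the multiple data sets related to the second service in the primary cluster at the second moment.
- The method according to any one of claims 1-6, characterized in that the multiple data sets related to the first service include a data set processed or stored by a first component in the primary cluster and a data set processed or stored by a second component in the primary cluster.
- The method according to any one of claims 1-7, characterized in that the control device includes a primary client and a standby client, the primary client is configured to detect first state information of the primary cluster, the standby client is configured to detect second state information of the standby cluster, and the method further includes: obtaining, by the control device, the first state information detected by the primary client and the second state information detected by the standby client; and when the first state information indicates that the primary cluster has the standby identity or is unavailable and the second state information indicates that the standby cluster has the primary identity, determining, by the control device, that the standby client is the client used for application access.
- The method according to claim 8, characterized in that the method further includes: notifying, by the control device, a user of the failure of the primary cluster; and changing, by the control device in response to the user's identity adjustment operation for the standby cluster, the identity of the standby cluster from standby to primary.
- The method according to any one of claims 1-9, characterized in that the control device is deployed in isolation from the primary cluster.
- The method according to any one of claims 1-10, characterized in that the same clock source is set in the control device, the primary cluster, and the standby cluster.
- The method according to any one of claims 1-11, characterized in that the primary cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.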
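- A data backup method, characterized in that the method is applied to a data backup system that includes a primary cluster, a standby cluster, and a control device, and the method includes: obtaining, by the primary cluster, an instruction issued by the control device, where the instruction includes information about multiple data sets related to a first service and a first moment; and backing up, by the primary cluster according to the instruction, to the standby cluster the multiple data sets related to the first service in the primary cluster at the first moment.
- The method according to claim 13, characterized in that the backing up, by the primary cluster according to the instruction, to the standby cluster the multiple data sets related to the first service in the primary cluster at the first moment specifically includes: obtaining, by the primary cluster according to the information about the multiple data sets related to the first service and the first moment, snapshots at the first moment of the multiple data sets related to the first service in the primary cluster; and sending, by the primary cluster according to the snapshots, the data corresponding to the snapshots to the standby cluster.
- The method according to claim 13 or 14, characterized in that the method further includes: synchronizing, by the primary cluster, user data to the standby cluster.
- The method according to any one of claims 13-15, characterized in that the primary cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.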
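- A control device, characterized in that the control device is located in a data backup system that further includes a primary cluster and a standby cluster, and the control device includes: a control module, configured to control, according to a first data backup policy, the primary cluster or the standby cluster to back up to the standby cluster multiple data sets related to a first service in the primary cluster at a first moment, where the first data backup policy includes information about the multiple data sets related to the first service and the first moment.
- The control device according to claim 17, characterized in that the control module is specifically configured to: send a first instruction to the primary cluster, instructing the primary cluster to send to the standby cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service; or send a second instruction to the standby cluster, instructing the standby cluster to copy from the primary cluster the data corresponding to the snapshots, at the first moment, of the multiple data sets related to the first service in the primary cluster.
- The control device according to claim 18, characterized in that the control device further includes: a communication module, configured to send a third instruction to the primary cluster before the control device sends the first instruction to the primary cluster or the control device sends the second instruction to the standby cluster, where the third instruction includes the information about the multiple data sets related to the first service and the first moment, and the third instruction instructs the primary cluster to obtain snapshots, at the first moment, of the multiple data sets related to the first service.
- The control device according to any one of claims 17-19, characterized in that the control device further includes: a communication module, configured to send a fourth instruction to the primary cluster, where the fourth instruction instructs the primary cluster to synchronize user data to the standby cluster; or the control module is further configured to obtain the user data stored in the primary cluster and in the standby cluster, and to adjust the user data stored in the standby cluster according to the user data stored in the primary cluster.
- The control device according to claims 17-20, characterized in that the control device further includes a configuration module, configured to configure the first data backup policy for the first service according to the information about the multiple data sets related to the first service and the first moment that are entered by a user.
- The control device according to any one of claims 17-21, characterized in that the control device further includes a configuration module, configured to configure a second data backup policy for a second service, where the second data backup policy includes information about multiple data sets related to the second service and a second moment; and the control module is further configured to control, according to the second data backup policy, the primary cluster or the standby cluster to back up to the standby cluster the multiple data sets related to the second service in the primary cluster at the second moment.
- The control device according to claims 17-22, characterized in that the multiple data sets related to the first service include a data set processed or stored by a first component in the primary cluster and a data set processed or stored by a second component in the primary cluster.
- The control device according to any one of claims 17-23, characterized in that the control device includes a primary client and a standby client, the primary client is configured to detect first state information of the primary cluster, the standby client is configured to detect second state information of the standby cluster, and the control device further includes: a communication module, configured to obtain the first state information detected by the primary client and the second state information detected by the standby client; and a determining module, configured to determine that the standby client is the client used for application access when the first state information indicates that the primary cluster has the standby identity or is unavailable and the second state information indicates that the standby cluster has the primary identity.
- The control device according to claim 24, characterized in that the control device further includes a prompt module and an adjustment module; the prompt module is configured to notify a user of the failure of the primary cluster; and the adjustment module is configured to change the identity of the standby cluster from standby to primary in response to the user's identity adjustment operation for the standby cluster.
- The control device according to any one of claims 17-25, characterized in that the control device is deployed in isolation from the primary cluster.
- The control device according to any one of claims 17-26, characterized in that the same clock source is set in the control device, the primary cluster, and the standby cluster.
- The control device according to any one of claims 17-27, characterized in that the primary cluster and/or the standby cluster includes a cluster built on the Hadoop architecture.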
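- A data backup system, characterized in that the data backup system includes a control device, a primary cluster, and a standby cluster; the control device is configured to perform the method according to any one of the foregoing method claims 1-12; the primary cluster is configured to perform the method according to any one of the foregoing method claims 13-16; and the standby cluster is configured to obtain and store the data sets backed up from the primary cluster.
- A control device, characterized in that the computing device includes a processor and a memory; the processor is configured to execute instructions stored in the memory, so that the computing device performs the method according to any one of claims 1 to 12.
- A computer-readable storage medium, characterized in that it includes instructions that, when run on a computing device, cause the computing device to perform the method according to any one of claims 1 to 12.
- A computer program product containing instructions that, when run on a computing device, cause the computing device to perform the method according to any one of claims 1 to 12.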
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22794201.8A EP4318243A4 (en) | 2021-04-26 | 2022-01-17 | Data backup method and system, and related device |
| US18/494,440 US20240054054A1 (en) | 2021-04-26 | 2023-10-25 | Data Backup Method and System, and Related Device |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110454477.7 | 2021-04-26 | ||
| CN202110454477 | 2021-04-26 | ||
| CN202110961109.1A CN115248746A (zh) | 2021-04-26 | 2021-08-20 | 数据备份方法、系统及相关设备 |
| CN202110961109.1 | 2021-08-20 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/494,440 Continuation US20240054054A1 (en) | 2021-04-26 | 2023-10-25 | Data Backup Method and System, and Related Device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022227719A1 true WO2022227719A1 (zh) | 2022-11-03 |
Family
ID=83695944
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/072427 Ceased WO2022227719A1 (zh) | 2021-04-26 | 2022-01-17 | 数据备份方法、系统及相关设备 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20240054054A1 (zh) |
| EP (1) | EP4318243A4 (zh) |
| CN (1) | CN115248746A (zh) |
| WO (1) | WO2022227719A1 (zh) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115794487A (zh) * | 2022-11-14 | 2023-03-14 | 超聚变数字技术有限公司 | 一种业务恢复方法及设备 |
| WO2025022435A1 (en) * | 2023-07-21 | 2025-01-30 | Jio Platforms Limited | Method and system for providing data back up during lack of connectivity |
| CN116955015B (zh) * | 2023-09-19 | 2024-01-23 | 恒生电子股份有限公司 | 基于数据存储服务的数据备份系统及方法 |
| CN118939480B (zh) * | 2024-08-20 | 2025-12-30 | 武汉吧哒科技股份有限公司 | 服务恢复方法、装置、电子设备及存储介质 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102917072A (zh) * | 2012-10-31 | 2013-02-06 | 北京奇虎科技有限公司 | 用于数据服务器集群之间进行数据迁移的设备、系统及方法 |
| CN102982085A (zh) * | 2012-10-31 | 2013-03-20 | 北京奇虎科技有限公司 | 数据迁移系统和方法 |
| US20190050306A1 (en) * | 2017-08-11 | 2019-02-14 | T-Mobile Usa, Inc. | Data redundancy and allocation system |
| CN109739690A (zh) * | 2018-12-29 | 2019-05-10 | 平安科技(深圳)有限公司 | 备份方法及相关产品 |
| CN112380067A (zh) * | 2020-11-30 | 2021-02-19 | 四川大学华西医院 | 一种Hadoop环境下基于元数据的大数据备份系统及方法 |
| CN112527567A (zh) * | 2020-12-24 | 2021-03-19 | 北京百度网讯科技有限公司 | 系统容灾方法、装置、设备以及存储介质 |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7475099B2 (en) * | 2004-02-25 | 2009-01-06 | International Business Machines Corporation | Predictive algorithm for load balancing data transfers across components |
| US7620647B2 (en) * | 2006-09-15 | 2009-11-17 | Initiate Systems, Inc. | Hierarchy global management system and user interface |
| JP2008117342A (ja) * | 2006-11-08 | 2008-05-22 | Hitachi Ltd | ストレージシステムおよびリモートコピーを制御するためのコントローラ |
| US9275060B1 (en) * | 2012-01-27 | 2016-03-01 | Symantec Corporation | Method and system for using high availability attributes to define data protection plans |
| US9582219B2 (en) * | 2013-03-12 | 2017-02-28 | Netapp, Inc. | Technique for rapidly converting between storage representations in a virtualized computing environment |
| US10089187B1 (en) * | 2016-03-29 | 2018-10-02 | EMC IP Holding Company LLC | Scalable cloud backup |
| CN110324375B (zh) * | 2018-03-29 | 2020-12-04 | 华为技术有限公司 | 一种信息备份方法及相关设备 |
| US11080078B2 (en) * | 2018-09-19 | 2021-08-03 | Microsoft Technology Licensing, Llc | Processing files via edge computing device |
| US11734122B2 (en) * | 2020-09-24 | 2023-08-22 | EMC IP Holding Company LLC | Backup task processing in a data storage system |
-
2021
- 2021-08-20 CN CN202110961109.1A patent/CN115248746A/zh active Pending
-
2022
- 2022-01-17 EP EP22794201.8A patent/EP4318243A4/en active Pending
- 2022-01-17 WO PCT/CN2022/072427 patent/WO2022227719A1/zh not_active Ceased
-
2023
- 2023-10-25 US US18/494,440 patent/US20240054054A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4318243A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4318243A1 (en) | 2024-02-07 |
| EP4318243A4 (en) | 2024-10-02 |
| US20240054054A1 (en) | 2024-02-15 |
| CN115248746A (zh) | 2022-10-28 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22794201 Country of ref document: EP Kind code of ref document: A1 |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2022794201 Country of ref document: EP |
|
| ENP | Entry into the national phase |
Ref document number: 2022794201 Country of ref document: EP Effective date: 20231025 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |