WO2024259890A1 - Data backup method, electronic device, and computer-readable storage medium - Google Patents
Data backup method, electronic device, and computer-readable storage medium
- Publication number
- WO2024259890A1 (application PCT/CN2023/133236)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- buffer
- backup
- hardware acceleration
- buffers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operations
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1456—Hardware arrangements for backup
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/184—Distributed file systems implemented as replicated file system
- G06F16/1844—Management specifically adapted to replicated file systems
Definitions
- the present application relates to the field of databases, and in particular to a data backup method, an electronic device, and a computer-readable storage medium.
- the embodiments of the present application disclose a data backup method, an electronic device, and a computer-readable storage medium, which can improve the physical backup efficiency of a database.
- the first aspect discloses a data backup method, which can be applied to electronic devices, modules in electronic devices (e.g., chips, central processing units, etc.), and logic modules or software (such as the backup tool described below) that can implement all or part of the functions of electronic devices.
- the electronic device may include a central processing unit, a memory, and a hardware acceleration device, and the memory may include a first buffer. The following description is based on the application to the electronic device as an example.
- the data backup method may include: reading the first data in the first data file, and storing the first data in the first buffer, wherein the first data file is stored in a data disk corresponding to the original database directory; compressing the first data in the first buffer by the hardware acceleration device to generate backup data corresponding to the first data; and writing the backup data to the data disk corresponding to the backup directory.
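The read, buffer, compress, write sequence above can be sketched as follows. This is a minimal illustration, not the application's implementation: `accelerator_compress` is a hypothetical stand-in that uses software `zlib` in place of a hardware acceleration device, and the buffer size, the 4-byte length prefix, and the `restore_file` helper are assumptions added for the example.

```python
import zlib

PAGE_SIZE = 8 * 1024          # 8 KB data page, as in the description
BUFFER_SIZE = 16 * PAGE_SIZE  # assumed buffer size (16 pages)


def accelerator_compress(buffer: bytes) -> bytes:
    """Hypothetical stand-in for the hardware acceleration device."""
    return zlib.compress(buffer)


def backup_file(src_path: str, dst_path: str) -> int:
    """Back up one data file; return the number of backup packets written."""
    packets = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buffer = src.read(BUFFER_SIZE)   # store first data in the buffer
            if not buffer:
                break
            packet = accelerator_compress(buffer)  # compress the whole buffer
            # Assumed framing: a length prefix lets restore split the packets.
            dst.write(len(packet).to_bytes(4, "big"))
            dst.write(packet)
            packets += 1
    return packets


def restore_file(backup_path: str, out_path: str) -> None:
    """Inverse of backup_file, included only to check the round trip."""
    with open(backup_path, "rb") as src, open(out_path, "wb") as out:
        while header := src.read(4):
            packet = src.read(int.from_bytes(header, "big"))
            out.write(zlib.decompress(packet))
```

Compressing one buffer per call, rather than one page per call, is the point of the sketch: each call to the compressor handles a batch of pages.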
- when performing a physical backup of a database, the first data may first be stored in a first buffer, and the hardware acceleration device may then compress the data in units of the first buffer.
- the performance of the hardware acceleration device may be fully utilized, thereby improving the efficiency of data compression, and further improving the efficiency of the physical backup of the database.
- the overall power consumption of the electronic device may be reduced.
- the reading of the first data in the first data file and the storing of the first data in the first buffer include: opening the first data file; when the first buffer does not store or is not full of the first data, reading the first data in the first data file and storing the first data in the first buffer.
- when the first buffer stores no first data or has not yet been filled with the first data, the first data in the first data file is read and the read first data is stored in the first buffer. In this way, it can be ensured that the first data already stored in the first buffer will not be overwritten, thereby ensuring the integrity of the data and further ensuring the integrity of the backup data.
- compressing the first data in the first buffer by using the hardware acceleration device includes: when the first buffer is full of the first data, compressing the first data in the first buffer by using the hardware acceleration device.
- the data in the first buffer is compressed, which can maximize the performance of the hardware acceleration device, thereby increasing the speed of physical backup of the database.
- the method may further include: after the hardware acceleration device completes compressing the first data in the first buffer, changing the state of the first buffer to an idle state.
- a status mark can be set for the first buffer.
- when the status mark of the first buffer is idle, the first data can be written into the first buffer.
- when the status mark of the first buffer is not idle, the first buffer cannot be used. In this way, it can be ensured that the electronic device can make full use of the first buffer based on its status mark.
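The status mark can be illustrated with a small state holder. This is a sketch under assumptions: the class and method names (`PageBuffer`, `write`, `release`) are hypothetical, and the buffer is modeled as a list of pages rather than a raw memory region.

```python
from dataclasses import dataclass, field
from enum import Enum


class BufferState(Enum):
    IDLE = "idle"   # empty or partly filled: pages may be written
    BUSY = "busy"   # full: waiting for or undergoing compression


@dataclass
class PageBuffer:
    capacity: int                                # pages the buffer can hold
    pages: list = field(default_factory=list)
    state: BufferState = BufferState.IDLE        # buffers start idle

    def write(self, page: bytes) -> bool:
        """Store one page; return False if the buffer cannot be used."""
        if self.state is not BufferState.IDLE:
            return False
        self.pages.append(page)
        if len(self.pages) == self.capacity:
            self.state = BufferState.BUSY        # full: hand over to compression
        return True

    def release(self) -> list:
        """Call after compression completes: empty the buffer and mark it idle."""
        drained, self.pages = self.pages, []
        self.state = BufferState.IDLE
        return drained
```

Rejecting writes while the mark is busy is what prevents pages that are still being compressed from being overwritten.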
- the method further includes: reading remaining data from the first data file, and storing the remaining data in a second buffer, where the second buffer is any other buffer among the first buffers that does not store or is not fully filled with the first data; when the hardware acceleration device completes compressing the first data in a first buffer, compressing the data in the second buffer through the hardware acceleration device.
- storing the first data in the first buffer includes: verifying the first data; and if the verification is successful, storing the first data in the first buffer.
- a check can be performed first, and if the check succeeds, the read first data is stored in the first buffer, so that the storage of wrong data can be avoided and the accuracy of the backup data can be ensured.
- the wrong data can be discarded through data check, and the data that needs to be compressed can be reduced, thereby saving the processing resources of the electronic device and reducing the overall power consumption of the electronic device.
- the method also includes: reading first data from multiple data files under the original database directory through multiple threads, and storing the first data read from the different data files into those of the multiple first buffers that do not store the first data or are not full; when a third buffer is full, compressing the data in the third buffer through the hardware acceleration device, where the third buffer is any full buffer among the multiple first buffers.
- data can be written to multiple buffers through multiple threads. In this way, it can be ensured that the hardware acceleration device of the electronic device always has a full buffer for compression, so that the performance of the hardware acceleration device can be fully utilized, and the efficiency of physical backup of the database can be further improved.
- the electronic device includes multiple hardware acceleration devices
- the method further includes: when there are multiple full buffers among the multiple first buffers, compressing the data in the multiple full buffers in parallel by the multiple hardware acceleration devices.
- the electronic device may include multiple hardware acceleration devices.
- the electronic device may use the multiple hardware acceleration devices to compress data in multiple buffers in parallel, which can further improve the efficiency of database physical backup.
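Parallel compression of several full buffers might look like the following sketch. It assumes each worker thread stands in for one hardware acceleration device and uses software `zlib` as the compressor; the function name `compress_full_buffers` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def compress_full_buffers(full_buffers, accelerators=2):
    """Compress every full buffer, using up to `accelerators` workers at once.

    Each worker thread stands in for one hardware acceleration device.
    Results are returned in the same order as the input buffers.
    """
    if accelerators < 1:
        raise ValueError("at least one accelerator is required")
    with ThreadPoolExecutor(max_workers=accelerators) as pool:
        return list(pool.map(zlib.compress, full_buffers))
```

Because `pool.map` preserves input order, the backup packets can be written to the backup file in the order the buffers were filled.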
- the number of threads is determined according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device; the first time is the time required to read the first data from the original database directory to fill a first buffer, and the second time is the time required for a hardware acceleration device to complete compressing a full first buffer, and for the central processor to write the corresponding backup data into the data disk corresponding to the backup directory.
- the number of threads to be started can be determined according to the first time, the second time, and the number of hardware acceleration devices, so that the number of threads matches the processing capability of the hardware acceleration devices included in the electronic device. While ensuring that the hardware acceleration device of the electronic device always has a full buffer to compress, the processor resources of the electronic device will not be excessively occupied.
- the number of the buffers is determined based on the first time, the second time, and the number of hardware acceleration devices included in the electronic device; the first time is the time required to read the first data from the original database directory to fill a first buffer, and the second time is the time required for a hardware acceleration device to complete compressing a full first buffer, and for the central processor to write the corresponding backup data to the data disk corresponding to the backup directory.
- the number of buffers that need to be set can be determined based on the first time, the second time, and the number of hardware acceleration devices. In this way, it can be ensured that the number of buffers is compatible with the number of threads and the processing capabilities of the hardware acceleration devices included in the electronic device, and the memory resources of the electronic device will not be excessively occupied.
- the method further includes: during the data backup process, dynamically adjusting the number of threads and/or the number of buffers according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device.
- the first time and the second time can be variable, so the electronic device can dynamically adjust the number of buffers and the number of threads to improve the utilization efficiency of buffer resources and thread resources.
- dynamically adjusting the number of buffers and the number of threads can ensure that memory resources and processor resources are not excessively occupied, and also that the buffers and threads are not so few that the performance of the hardware acceleration device cannot be fully exploited.
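The text does not state how the counts are computed from the first time, the second time, and the number of hardware acceleration devices. One plausible heuristic, offered purely as an assumption, is to start enough reader threads that buffers are filled as fast as the accelerators drain them, and to allocate one buffer per reader plus one per accelerator. The function name `plan_resources` is hypothetical.

```python
import math
from typing import Tuple


def plan_resources(first_time: float, second_time: float,
                   accelerators: int) -> Tuple[int, int]:
    """Return (threads, buffers) under an assumed balancing heuristic.

    first_time:  time for one thread to fill one buffer
    second_time: time to compress one full buffer and write its backup data
    """
    if first_time <= 0 or second_time <= 0 or accelerators < 1:
        raise ValueError("times must be positive and accelerators >= 1")
    # `accelerators` devices drain accelerators / second_time buffers per unit
    # time; one thread fills 1 / first_time buffers per unit time.
    threads = max(1, math.ceil(accelerators * first_time / second_time))
    # Assumption: one buffer being filled per thread, one being compressed
    # per accelerator.
    buffers = threads + accelerators
    return threads, buffers
```

For example, with a first time of 4 ms, a second time of 1 ms, and two accelerators, this heuristic gives 8 threads and 10 buffers. Re-running it with freshly measured times is one way the dynamic adjustment described above could be realized.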
- the central processing unit includes the hardware acceleration device.
- the hardware acceleration device can be integrated into the central processing unit, so that the communication delay between the hardware acceleration device and the central processing unit can be reduced, thereby further improving the efficiency of physical backup of the database.
- a second aspect discloses an electronic device, the electronic device comprising a processor and a memory, the processor calling a computer program stored in the memory to implement a data backup method as provided in the first aspect and any possible implementation of the first aspect.
- the third aspect discloses a computer-readable storage medium, on which a computer program or computer instructions are stored. When the computer program or computer instructions are executed, the data backup method disclosed in the above aspects is implemented.
- a fourth aspect discloses a chip, including a processor for executing a program stored in a memory.
- the chip executes the data backup method disclosed in the above aspects.
- the memory is located outside the chip.
- a sixth aspect discloses a computer program product, which includes a computer program code.
- the data backup method disclosed in the above aspects is executed.
- FIG. 1 is a schematic diagram of the structure of an electronic device disclosed in an embodiment of the present application.
- FIG. 2 is a schematic diagram of a scenario of physical backup of a database disclosed in an embodiment of the present application.
- FIG. 3 is a schematic diagram of a software structure of an electronic device disclosed in an embodiment of the present application.
- FIG. 4 is a schematic diagram of a scenario in which a backup tool disclosed in an embodiment of the present application performs physical backup of a database.
- FIG. 5 is a flow chart of a database physical backup method disclosed in an embodiment of the present application.
- FIG. 6 is a flow chart of another database physical backup method disclosed in an embodiment of the present application.
- FIG. 7 is a flow chart of another database physical backup method disclosed in an embodiment of the present application.
- FIG. 8 is a flow chart of another database physical backup method disclosed in an embodiment of the present application.
- FIG. 9 is a flow chart of another database physical backup method disclosed in an embodiment of the present application.
- FIG. 10 is a flow chart of another database physical backup method disclosed in an embodiment of the present application.
- the embodiment of the present application discloses a database physical backup method, an electronic device and a computer-readable storage medium, which can improve the efficiency of database physical backup and reduce the hard disk space occupied.
- the technical solution in the embodiment of the present application will be clearly and completely described below in conjunction with the drawings in the embodiment of the present application.
- a database cluster is generally a database system composed of multiple machines. One or more databases can be created on a database cluster. A machine in a database cluster is called a node.
- Database clusters may include centralized database clusters, distributed database clusters, and the like.
- a centralized database cluster may include a host (master node) and multiple backup machines (slave nodes), and the data stored in the host and backup machines may be the same.
- a distributed database cluster may also include master nodes and slave nodes, each node may store a certain slice of data, and the data slices stored by multiple nodes may be spliced together into a complete data.
- the master node in the database cluster may manage and monitor each slave node, or the database cluster may include a separate management node for managing the master node or slave nodes, which is used to manage and monitor each slave node.
- the master node may issue data operation instructions, such as data backup instructions, to each slave node, instructing the slave node to perform data backup.
- Hardware acceleration technology is a technology that uses hardware devices to speed up processing, such as speeding up compression, decompression, decryption, encryption, random number generation, digital signature, video encoding and decoding, etc.
- dedicated hardware for specific processing can be designed, for example to improve data compression efficiency.
- these dedicated hardware for specific processing can be referred to as hardware acceleration devices.
- hardware acceleration devices can be used to assist the central processing unit (CPU) in its work to improve overall processing efficiency.
- computationally intensive tasks (tasks with large computational requirements) can be assigned to hardware acceleration devices to reduce the pressure on the CPU and increase the overall processing speed.
- because hardware acceleration devices are dedicated hardware designed for specific processing, they have lower power consumption than CPUs.
- hardware acceleration technology can include QAT (QuickAssist Technology) under the x86 architecture and KAE (Kunpeng Accelerator Engine) under the Arm (Advanced RISC Machine) architecture.
- the physical backup of the database refers to copying one or more copies of the data files under the original database directory and storing them in other directories respectively. These directories can be called backup directories.
- the backup set refers to a collection of database backup files, that is, one or more backup files stored in the backup directory.
- each backup file under the backup directory can be a compressed file
- the data files under the original database directory can correspond one to one with the backup files under the backup directory.
- the original database directory can be the directory of the database to be backed up.
- the physical backup of a database may include four steps: reading data, verifying, compressing, and writing to disk.
- the database usually stores data in fixed-size data pages; for example, each data page can be 8 kilobytes (KB) in size.
- through the backup tool (backup software), the electronic device can read 8KB data pages from the data file of the database in turn. After that, the electronic device can verify the read 8KB data pages in turn and calculate their checksums.
- the electronic device can use a compression algorithm implemented by software, such as lz4, zlib, zstd and other software compression algorithms, to compress the 8KB data pages after verification in turn. Finally, the electronic device can write the compressed data pages to the disk in turn and store them in the corresponding backup directory to generate a backup set.
- each data page needs to be verified page by page, and each verified data page needs to be compressed page by page.
- verification and compression are both computationally intensive tasks, which require a large amount of CPU resources. Any processing in verification and compression may make the CPU work at full capacity. Therefore, in the case of limited CPU resources, verification and compression are generally performed serially.
- compression is a relatively time-consuming process, which usually accounts for 50% of the total time of physical backup, or even more.
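For contrast with the buffered approach, the page-by-page baseline described above can be sketched as follows. This is an illustration only: CRC-32 from `zlib` is used as an example checksum and `zlib.compress` as an example software compressor; the description does not say which checksum such a backup tool uses.

```python
import zlib

PAGE_SIZE = 8 * 1024  # 8 KB data page, as in the description


def backup_page_by_page(data: bytes) -> list:
    """Verify and compress each page serially; return (checksum, packet) pairs."""
    results = []
    for offset in range(0, len(data), PAGE_SIZE):
        page = data[offset:offset + PAGE_SIZE]   # 1. read one page
        checksum = zlib.crc32(page)              # 2. verify: compute a checksum
        packet = zlib.compress(page)             # 3. compress in software
        results.append((checksum, packet))       # 4. would be written to disk
    return results
```

Every page costs one checksum computation and one compression call on the CPU, which is why the two steps compete for the same processor time.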
- hardware acceleration technology can be used in database physical backup, that is, data page compression can be performed by hardware acceleration device.
- one or more buffers can be set, and the data pages to be compressed can be stored in the buffers first.
- the hardware acceleration device can compress batches of data pages in units of buffers, thereby ensuring that the performance of the hardware acceleration device is fully utilized.
- FIG. 1 is a schematic diagram of the structure of an electronic device disclosed in an embodiment of the present application.
- the electronic device 100 may be a mobile phone terminal, a tablet computer, a laptop computer, a desktop computer, a server, etc., which is not limited in the embodiment of the present application.
- the electronic device 100 may include: a central processing unit 101, a hardware acceleration device 102, a memory 103, and a hard disk 104.
- the CPU 101, the hardware acceleration device 102, the memory 103, and the hard disk 104 may be connected to each other directly or via a bus 105.
- the hard disk 104 can be used to store the data files of the database.
- the central processing unit 101 can be used to read the data files in the database from the hard disk and perform operations such as verification.
- the memory 103 can be used to provide one or more buffers to temporarily store the data pages after verification.
- the hardware acceleration device 102 can compress the batch data pages in units of buffers to obtain the compressed backup data packets.
- the compressed backup data packets can be stored in the hard disk 104.
- the hardware acceleration device 102 may be dedicated hardware with a data compression function to assist the central processor 101 in improving the efficiency of data compression.
- the hardware acceleration device 102 may be a QAT acceleration card, a KAE acceleration card, and the like.
- hard disk 104 can be a hard disk drive (HDD) or a solid state drive (SSD), etc.
- FIG1 only illustrates one central processing unit, one hardware acceleration device, one memory and one hard disk, but in actual situations, the electronic device may include two, three, four or more central processing units, hardware acceleration devices, memories and hard disks, and the embodiments of the present application are not limited to this.
- the central processor 101 and the hardware acceleration device 102 can be independently set or integrated together, such as integrating the hardware acceleration device 102 into the central processor 101.
- the electronic device 100 can be a node in a centralized database cluster, a node in a distributed database cluster, or other devices including a database.
- each node (including the master node and the slave node) can automatically back up the data files in the database, or the master node or the management node can uniformly issue a backup command, and each node backs up the data files in the database after receiving the backup command.
- the system architecture shown in Figure 1 is only an exemplary description and does not constitute a limitation.
- the electronic device 100 may also include more or fewer components, or different component configurations, etc., which are not limited here.
- the physical backup process of the present application is exemplarily described below in conjunction with FIG. 2 .
- Figure 2 is a schematic diagram of a scenario of physical backup of a database disclosed in an embodiment of the present application.
- the storage space corresponding to the original database directory of the database is the main data disk
- the storage space corresponding to the backup directory is the backup (copy) data disk.
- the backup directory may include a corresponding number of backup files for the number of data files included in the original database directory, that is, the data files under the original database directory may correspond one-to-one to the backup files under the backup directory.
- the organization of the backup files under the backup directory may be the same as the organization of the data files under the original database directory.
- the organization method here may include the adopted data structure, the relationship between the data, etc.
- the CPU can read the data pages in the data files under the original database directory from the main data disk in sequence, and can verify the read data pages.
- the CPU can store the verified data pages in an idle buffer.
- for example, if buffer A is currently in an idle state, the CPU can store the verified data pages in buffer A.
- when buffer A is full, the CPU can mark buffer A as busy.
- for example, the CPU can set a status tag for the buffer and set the status tag to busy.
- the CPU can then notify the hardware acceleration device to compress the data pages in buffer A to obtain a compressed backup data packet, and can store the compressed backup data packet in the storage space corresponding to the backup directory in the backup data disk.
- once compression is complete, the CPU can change the status tag of buffer A to idle so that new data pages to be backed up can be written into it.
- the busy state indicates that the buffer is full of data
- the free state indicates that no data is stored in the buffer or that the buffer is not full of data
- while the hardware acceleration device compresses the data pages in buffer A, the CPU can continue to read the data pages in the data file under the original database directory from the main data disk, and store the read data pages in buffer B.
- when buffer B is full, the hardware acceleration device is notified to compress the data pages in buffer B.
- buffer A and buffer B can thus be used alternately to store the verified data pages.
- while buffer A is busy, the CPU can continue to store the verified data pages in buffer B.
- when buffer B is full and buffer A is idle again, the data pages can be stored in buffer A again. In this way, the verification and compression of the data pages can be performed in parallel, thereby improving the overall efficiency of the physical backup of the database.
- two buffers are used to illustrate the processing flow of physical backup of the database, but the embodiment of the present application does not limit the number of buffers. For example, one buffer, three, four or more buffers may also be set.
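The alternation between buffer A and buffer B can be sketched with two queues, one holding idle buffers and one holding full ones. This is a simplified model under assumptions: a worker thread running `zlib.compress` stands in for the hardware acceleration device, and the names `double_buffered_backup` and `pages_per_buffer` are hypothetical.

```python
import queue
import threading
import zlib


def double_buffered_backup(pages, pages_per_buffer=4):
    """Fill buffers A and B alternately while a worker compresses full ones."""
    idle = queue.Queue()   # buffers in the idle state
    full = queue.Queue()   # buffers in the busy state, awaiting compression
    for name in ("A", "B"):
        idle.put((name, []))            # both buffers start idle
    packets = []

    def compressor():
        while True:
            item = full.get()
            if item is None:            # sentinel: nothing left to compress
                return
            name, buf = item
            packets.append(zlib.compress(b"".join(buf)))
            buf.clear()
            idle.put((name, buf))       # compression done: mark idle again

    worker = threading.Thread(target=compressor)
    worker.start()
    name, buf = idle.get()
    for page in pages:
        buf.append(page)
        if len(buf) == pages_per_buffer:
            full.put((name, buf))       # mark busy and notify the compressor
            name, buf = idle.get()      # switch buffers; waits if both are busy
    if buf:
        full.put((name, buf))           # last partial buffer is treated as full
    full.put(None)
    worker.join()
    return packets
```

With a single compressor thread and a FIFO queue, packets come out in the order the buffers were filled. The producer blocks only when both buffers are busy, which mirrors the CPU waiting for an idle buffer.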
- the storage space corresponding to the original database directory and the backup directory can be the same hard disk, such as both being the primary data disk. In other words, the data files under the original database directory and the backup files under the backup directory can be stored in the same hard disk.
- a very efficient compression algorithm is implemented in hardware (including various processing circuits) inside the hardware acceleration device, and the compression performance is far better than the compression algorithm implemented in software (such as lz4, zlib, zstd, etc.).
- the hardware acceleration device has a faster compression speed and produces a smaller compressed size. Therefore, in the above-mentioned processing flow, having the hardware acceleration device compress the data pages can shorten the time required for compression.
- at the same time, compressing data pages with the hardware acceleration device introduces communication overhead between hardware, such as between the CPU and the hardware acceleration device. For this reason, one or more buffers can be set in the embodiment of the present application. When a buffer is full, the hardware acceleration device is notified to compress all data pages in the buffer. In this way, the communication overhead between hardware can be reduced and the computing power of the hardware acceleration device can be fully utilized.
- the electronic device 100 may include a database 106, and the database 106 may include a backup tool 1061. That is, the backup tool 1061 may be a subprogram module in the database 106.
- FIG. 4 is a schematic diagram of a scenario in which a backup tool disclosed in an embodiment of the present application performs a physical backup of a database.
- the backup tool 1061 can read the data files in the database from the master data disk, perform operations such as verification, and store the verified data pages into a buffer in the memory, such as buffer A, buffer B, etc.
- the backup tool 1061 can call the data compression API provided by the hardware acceleration device; the data compression API calls the driver of the hardware acceleration device accordingly, and the driver drives the hardware acceleration device to compress the data pages in the buffer, yielding a compressed backup data packet.
- the backup tool 1061 can store the compressed backup data packet in a backup file under the backup directory to generate a complete backup set.
- for a more detailed description of the backup tool 1061, please refer to the relevant description in the following method embodiments, which will not be repeated here.
- the backup tool 1061 may not be a submodule of the database 106 , but may be an independent module.
- the software structure shown in Figure 3 is only an exemplary description and does not constitute a limitation. In other embodiments of the present application, the software structure shown in Figure 3 may include more or fewer software modules than shown in the figure.
- FIG5 is a flow chart of a database physical backup method disclosed in an embodiment of the present application. As shown in FIG5, it is a processing flow of database physical backup under a single buffer setting.
- the processing method can be executed by a central processing unit (CPU); specifically, the CPU can call a program in a backup tool to execute it.
- the processing flow may include but is not limited to the following steps:
- the CPU opens the first data file in the original database directory.
- the first data file may be any data file that has not yet been backed up, is stored in the primary data disk, and can be found through the original database directory.
- the CPU can back up the data files in the original database directory in sequence, generate corresponding backup files, and store them in the backup data disk corresponding to the backup directory.
- the CPU can first open any data file in the original database directory that has not yet been backed up, so that the data pages in it can be read.
- the CPU reads data pages from the first data file in sequence, and stores the read data pages in the first buffer in sequence.
- the hardware acceleration device of the electronic device can compress batch data pages in units of buffers. Therefore, in the case of a single buffer, the CPU can first determine whether the first buffer is in an idle state. If the first buffer is in a non-idle state, that is, in a busy state, the CPU can wait for the first buffer to be idle. If the first buffer is in an idle state, the CPU can read the data pages from the first data file in sequence, and store the read data pages in the first buffer in sequence until the first buffer is full. It should be understood that there may be an order between the data pages in the data file, and each data page may include an identifier, such as a logical address, a data page number, etc.
- reading the data pages in sequence from the first data file can be understood as reading in sequence in the order of the data pages, such as reading continuous data pages in sequence in the order of the logical addresses.
- the first buffer defaults to an idle state.
- the size of the first buffer can be N times the size of the data page, and N is an integer greater than or equal to 2.
- N can be 8, 16, etc.
- the size of the first buffer can be 64KB, 128KB, etc.
- the first buffer may be a storage area opened in the memory of the electronic device, and may be used by the electronic device to temporarily store data pages that need to be compressed during a database physical backup process.
- the number of remaining unread data pages in the first data file may be small and insufficient to fill the first buffer.
- the remaining unread data pages in the first data file can be stored in the first buffer, and then the first buffer can be considered full.
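The buffer sizing rule and the handling of a short final batch can be sketched as below. The generator name `fill_buffers` is hypothetical, and the data file is modeled as an in-memory byte string for brevity.

```python
PAGE_SIZE = 8 * 1024  # 8 KB data page, as in the description


def fill_buffers(data: bytes, n_pages: int = 8):
    """Yield successive buffer contents holding up to n_pages data pages each.

    The last batch may hold fewer than n_pages pages; it is still yielded,
    matching the rule that a buffer holding the remaining pages counts as full.
    """
    if n_pages < 2:
        raise ValueError("N must be an integer greater than or equal to 2")
    buffer_size = n_pages * PAGE_SIZE    # e.g. 8 pages -> 64 KB
    for offset in range(0, len(data), buffer_size):
        yield data[offset:offset + buffer_size]
```

For a file of 20 pages and N = 8, this yields two 64KB batches followed by one 32KB batch.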
- the CPU before the CPU stores the read data pages into the first buffer in sequence, it may first verify the read data pages in sequence.
- the verification here includes two situations.
- One situation is: for each data page Calculate the check code.
- the CPU after the CPU calculates the check code for each data page, it can store the data page and the corresponding check code together in the first buffer (such as attaching the check code to the end of the corresponding unbacked up data page), so that the accuracy of the data can be verified when restoring the data based on the backup set later.
- Another case is: calculate the check code for each data page, and compare it with the previously calculated check code to determine whether the currently read data page is accurate.
- after the CPU calculates the check code for each data page, it can compare the currently calculated check code with the check code previously calculated for the data page (such as the check code calculated when the data page was previously stored in the main data disk). If they are the same, the currently read unbacked-up data page is correct and can be stored in the first buffer. If they are different, the currently read unbacked-up data page is wrong, and this data page can be skipped or data recovery can be performed.
- a check code is usually obtained through some operation on the original data and is used to verify the correctness of the original data.
- Commonly used check codes may include parity check code, Hamming check code, cyclic redundancy check (CRC) code, etc.
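The two verification cases above can be illustrated with a short sketch. It assumes CRC-32 as the check code (one of the options listed) and uses Python's `zlib.crc32`; the function names and the 4-byte big-endian trailer layout are hypothetical choices for the example, not details given in the text.

```python
import zlib

CHECK_LEN = 4  # a CRC-32 value occupies 4 bytes


def attach_check_code(page: bytes) -> bytes:
    """Case 1: compute a check code and append it to the end of the page."""
    return page + zlib.crc32(page).to_bytes(CHECK_LEN, "big")


def matches_stored_code(page: bytes, stored_code: int) -> bool:
    """Case 2: recompute the check code and compare it with an earlier one."""
    return zlib.crc32(page) == stored_code


def split_and_verify(record: bytes) -> bytes:
    """At restore time: separate page and check code, then verify the page."""
    page, trailer = record[:-CHECK_LEN], record[-CHECK_LEN:]
    if zlib.crc32(page) != int.from_bytes(trailer, "big"):
        raise ValueError("check code mismatch: page is corrupted")
    return page
```

In case 2, a page for which `matches_stored_code` returns `False` would be skipped or passed to a recovery routine, as the text describes.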
- the CPU changes the state of the first buffer to a busy state, and compresses multiple data pages in the first buffer through a hardware acceleration device to obtain a corresponding backup data packet.
- the first buffer will be full.
- the CPU can change the state of the first buffer to a busy state.
- the CPU can notify the hardware acceleration device to compress the data pages in the first buffer.
- the backup data packet is the data obtained by the hardware acceleration device compressing the data pages in the first buffer.
- the backup data packet corresponding to the data page in the first buffer can also be understood as the compressed data corresponding to the data page in the first buffer.
- the CPU compressing the multiple data pages in the first buffer through the hardware acceleration device may specifically include: the CPU may call the compression API (application programming interface) provided by the hardware acceleration device, the compression API will correspondingly call the driver of the hardware acceleration device, and the driver of the hardware acceleration device will drive the hardware acceleration device to compress the multiple data pages in the first buffer.
- when the CPU calls the compression API provided by the hardware acceleration device, it may provide corresponding parameters, such as the address of the first buffer, so that the hardware acceleration device can compress the multiple data pages in the first buffer.
- the CPU writes the corresponding backup data packet into the backup data disk corresponding to the backup directory, and changes the state of the first buffer to an idle state.
- the hardware acceleration device can notify the CPU that the compression of the first buffer is complete. Afterwards, the CPU can write the backup data packet obtained by compressing the multiple data pages in the first buffer into the backup data disk corresponding to the backup directory. In addition, the CPU can change the state of the first buffer to an idle state so that data pages to be backed up, that is, the remaining unread data pages, can be written into it again. It should be noted that after the hardware acceleration device has compressed the data in the first buffer, the compressed backup data packet can be stored in the first buffer or in another idle buffer. In these two situations, the time at which the first buffer changes to an idle state can differ.
- if the compressed backup data packet is stored in the first buffer, it is necessary to wait for the CPU to write the backup data packet to the backup data disk corresponding to the backup directory; only after that write completes can the state of the first buffer be changed to an idle state. If the compressed backup data packet is stored in another idle buffer, the state of the first buffer can be changed to an idle state as soon as the hardware acceleration device has compressed the data pages of the first buffer, without waiting for the backup data packet to be written to the backup data disk corresponding to the backup directory.
- the CPU determines whether the first data file has been backed up. If the first data file has not been backed up, the CPU executes step 502. If the first data file has been backed up, the CPU executes step 506.
- after the CPU writes the backup data packets corresponding to the multiple data pages in the first data file into the backup data disk corresponding to the backup directory, it can determine whether the first data file has been backed up, that is, whether the first data file still includes data pages that have not been backed up or data pages that have not been read. If the first data file has not been backed up (the first data file still includes data pages that have not been backed up), the CPU can continue to read the data pages that have not been backed up in the first data file and execute step 502. If the first data file has been backed up (the first data file does not include data pages that have not been backed up), the CPU can continue to determine whether the original database directory still includes data files that have not been backed up and execute step 506.
- the CPU determines whether the original database directory includes data files that have not been backed up. If the original database directory includes data files that have not been backed up, step 501 may be executed. If the original database directory does not include data files that have not been backed up, step 507 may be executed. When the first data file is backed up, the CPU may determine whether the original database directory includes data files that have not been backed up. If the original database directory includes data files that have not been backed up, the CPU may continue to open any data file that has not been backed up in the original database directory and may execute step 501. If the original database directory does not include data files that have not been backed up, that is, all data files in the original database directory have been backed up, the CPU may execute step 507.
- the first data file is first backed up to the first backup data disk, and then the first data file is backed up to the second backup data disk. Therefore, when judging whether a first data file is an unbacked up data file, or judging whether a data page is an unbacked up data page, there may be different judgment results due to the above reasons. Based on this, in the embodiment of the present application, for the above steps 502-507, the CPU judges whether the first data file includes unbacked up data pages.
- the unbacked up data pages refer to data pages that have not been backed up during this backup process, such as the process of steps 501-507 executed this time, and the CPU judges whether the original database directory also includes unbacked up data files, which refers to data files that have not been backed up during this backup process, such as the process of steps 501-507 executed this time.
- the backup data disk and the primary data disk may be the same physical hard disk or different physical hard disks.
- the CPU completes the physical backup of the database and releases related resources.
- the backup directory may include complete backup files, and the CPU may end the physical backup of the database and release related resources, such as releasing memory resources such as the first buffer.
- step 506 can be executed first.
- the CPU can execute step 507.
- the CPU can execute step 501 and the following steps 502 to 505.
- step 506 can be executed again to continue to determine whether the original database directory still includes unbacked up data files.
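The overall single-buffer loop of steps 501 to 507 can be sketched as below. This is only an illustration under stated assumptions: `zlib.compress` stands in for the hardware acceleration device, and the page size, the `.bak` file suffix, and the 4-byte length prefix in front of each backup data packet are hypothetical choices that the text does not specify.

```python
import zlib
from pathlib import Path
from typing import BinaryIO

PAGE_SIZE = 8 * 1024     # assumed page size
PAGES_PER_BUFFER = 8     # N pages per buffer


def read_buffer(src: BinaryIO) -> list[bytes]:
    """Step 502: read consecutive pages until the buffer is full or the file ends."""
    pages: list[bytes] = []
    while len(pages) < PAGES_PER_BUFFER:
        page = src.read(PAGE_SIZE)
        if not page:
            break
        pages.append(page)
    return pages


def compress_buffer(pages: list[bytes]) -> bytes:
    """Step 503: software stand-in for the hardware acceleration device."""
    return zlib.compress(b"".join(pages))


def backup_directory(source_dir: Path, backup_dir: Path) -> int:
    """Back up every data file in source_dir; return the number of packets written."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    packets = 0
    # Step 506: loop while the directory still has files that are not backed up.
    for data_file in sorted(p for p in source_dir.iterdir() if p.is_file()):
        target = backup_dir / (data_file.name + ".bak")
        with open(data_file, "rb") as src, open(target, "wb") as dst:  # step 501
            while True:
                pages = read_buffer(src)
                if not pages:            # step 505: this file is fully backed up
                    break
                packet = compress_buffer(pages)
                # Step 504: write the packet (length-prefixed, an assumed layout).
                dst.write(len(packet).to_bytes(4, "big"))
                dst.write(packet)
                packets += 1
    return packets                       # step 507: resources are released on exit
```

Because the buffer here is an ordinary Python list, the idle and busy states are implicit in the control flow; a real implementation would track them explicitly so that the read and compress flows can run independently.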
- the above steps 501 to 507 can be divided into two independent processing flows: one is responsible for writing the unbacked-up data pages to the first buffer, and the other is responsible for compressing the data pages in the first buffer.
- These two processing flows can be executed in parallel and can be associated through the state of the first buffer.
- it can be a processing flow for writing data pages to the first buffer.
- the CPU can detect whether the first buffer is in an idle state.
- the CPU can continuously write unbacked up data pages to the first buffer.
- it can be a processing flow for compressing data pages in the first buffer through a hardware acceleration device.
- when compressing data pages in the first buffer, the CPU can detect whether the first buffer is in a busy state. When the first buffer is in a busy state, that is, when the first buffer is full, the CPU can compress the data pages in the first buffer through the hardware acceleration device, and write the compressed backup data packets into the backup data disk corresponding to the backup directory.
- the relevant steps in Figures 6 and 7 can refer to the relevant descriptions in the above steps 501-step 507, which will not be repeated in detail here.
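The way the two flows are associated through the state of the first buffer can be sketched with two threads and a condition variable. All names here (`SingleBuffer`, `writer_flow`, `compressor_flow`, `run`) are hypothetical, `zlib.compress` again stands in for the hardware acceleration device, and the compressed packets are collected in a list rather than written to a backup data disk.

```python
import threading
import zlib
from enum import Enum


class State(Enum):
    IDLE = "idle"
    BUSY = "busy"


class SingleBuffer:
    def __init__(self) -> None:
        self.pages: list[bytes] = []
        self.state = State.IDLE          # the buffer is idle by default
        self.done = False                # set once no more pages will arrive
        self.cond = threading.Condition()


def writer_flow(buf: SingleBuffer, page_batches: list[list[bytes]]) -> None:
    """Flow 1: wait for the buffer to be idle, fill it, mark it busy."""
    for batch in page_batches:
        with buf.cond:
            buf.cond.wait_for(lambda: buf.state is State.IDLE)
            buf.pages = list(batch)
            buf.state = State.BUSY
            buf.cond.notify_all()
    with buf.cond:
        buf.cond.wait_for(lambda: buf.state is State.IDLE)
        buf.done = True
        buf.cond.notify_all()


def compressor_flow(buf: SingleBuffer, out_packets: list[bytes]) -> None:
    """Flow 2: wait for the buffer to be busy, compress it, mark it idle."""
    while True:
        with buf.cond:
            buf.cond.wait_for(lambda: buf.state is State.BUSY or buf.done)
            if buf.state is not State.BUSY:
                return                   # finished and nothing left to compress
            # Compressing under the lock is acceptable here: with a single
            # buffer the two flows alternate anyway.
            out_packets.append(zlib.compress(b"".join(buf.pages)))
            buf.pages = []
            buf.state = State.IDLE
            buf.cond.notify_all()


def run(page_batches: list[list[bytes]]) -> list[bytes]:
    buf, packets = SingleBuffer(), []
    flows = [threading.Thread(target=writer_flow, args=(buf, page_batches)),
             threading.Thread(target=compressor_flow, args=(buf, packets))]
    for t in flows:
        t.start()
    for t in flows:
        t.join()
    return packets
```

With only one buffer the flows cannot overlap in time, which is the limitation that the multiple-buffer arrangement is meant to remove.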
- the CPU when the CPU performs physical backup of the database, it can compress batch data pages in units of buffers through the hardware acceleration device, which can greatly improve the efficiency of data compression, thereby increasing the speed of physical backup of the database.
- FIG. 5 is a database physical backup process under a single buffer, but in actual situations, multiple buffers can also be used during database physical backup, so that data reading and compression can be performed in parallel, or data reading, verification and compression can be performed in parallel, thereby further improving the speed of database physical backup.
- the following is an exemplary description of the database physical backup process under multiple buffer settings in conjunction with FIG. 8 and FIG. 9.
- FIG. 8 can be a process for writing unbacked-up data pages into a buffer in an idle state under a multiple buffer setting, and
- FIG. 9 can be a process for compressing data pages in a buffer in a busy state through a hardware acceleration device under a multiple buffer setting.
- the processing flow may be executed by a central processing unit (CPU), specifically, the processor may call a program in a backup tool to execute the processing flow, and the processing flow may include but is not limited to the following steps:
- 801. The CPU opens the first data file in the original database directory.
- the first data file may be any unbacked-up data file stored in the primary data disk and can be found through the original database directory.
- Step 801 is similar to step 501 , and reference may be made to the relevant description in the above step 501 .
- the CPU reads data pages from the first data file in sequence, and stores the read data pages in the second buffer in sequence.
- the second buffer may be any buffer in an idle state among the multiple buffers.
- the CPU may first determine whether there is a buffer in an idle state among the multiple buffers. If there is no buffer in an idle state among the multiple buffers, the CPU may wait for the multiple buffers to be idle. If there is a buffer in an idle state among the multiple buffers, the CPU may read data pages from the first data file in sequence, and store the read data pages in the second buffer in sequence until the second buffer is full. In the initial state, the multiple buffers are in an idle state by default. It should be understood that the multiple buffers may be buffers opened from the memory space for storing the data pages to be backed up.
- the number of remaining unread data pages in the first data file may be small and insufficient to fill the second buffer.
- all remaining unread data pages in the first data file may be stored in the second buffer, and the second buffer can then be considered to be full.
- the CPU may first verify the read data pages that have not been backed up in sequence.
- for data page verification, refer to the relevant description in step 501, which will not be repeated here.
- the CPU changes the state of the second buffer to a busy state.
- the second buffer will be full.
- the CPU can change the state of the second buffer to a busy state.
- the CPU can notify the hardware acceleration device to compress the data pages in the second buffer.
- the CPU determines whether the first data file has been backed up. If the first data file has not been backed up, the CPU executes step 802; if the first data file has been backed up, the CPU executes step 805.
- after the CPU has filled the second buffer, it can determine whether the first data file has been backed up, that is, whether the first data file still includes data pages that have not been backed up. If the first data file has not been backed up, the CPU can continue to read the data pages that have not been backed up in the first data file and execute step 802. If the first data file has been backed up, the CPU can continue to determine whether the original database directory still includes data files that have not been backed up and execute step 805.
- 805. The CPU determines whether the original database directory includes any unbacked-up data files. If the original database directory includes any unbacked-up data files, step 801 may be executed. If the original database directory does not include any unbacked-up data files, step 806 may be executed.
- the CPU may determine whether the original database directory still includes data files that have not been backed up. If the original database directory includes data files that have not been backed up, the CPU may continue to open any data file that has not been backed up in the original database directory and execute step 801. If the original database directory does not include data files that have not been backed up, that is, all data files in the original database directory have been backed up, the CPU may execute step 806.
- the CPU completes the physical backup of the database and releases related resources.
- the backup directory may include complete backup files, and the CPU may end the physical backup of the database and release related resources, such as releasing memory resources such as the second buffer.
- the processing flow shown in FIG. 9 is introduced below. As shown in FIG. 9 , the processing flow may be executed by a central processing unit (CPU), and specifically may be executed by the processor calling a program in a backup tool.
- the processing flow may include but is not limited to the following steps:
- the CPU can obtain a corresponding backup data packet by compressing multiple data pages in the third buffer through a hardware acceleration device.
- the third buffer may be any buffer in a busy state among the multiple buffers.
- the CPU can continue to write the data pages to be backed up into the free buffers in the multiple buffers.
- the CPU can notify the hardware acceleration device to compress the data pages in the buffer. Therefore, when there is a buffer in a busy state among the multiple buffers of the CPU, the CPU can compress the multiple data pages in the third buffer through the hardware acceleration device, and obtain the corresponding backup data packet.
- the CPU can compress the multiple data pages in buffer a through the hardware acceleration device, and then compress the multiple data pages in buffer b.
- when there are multiple buffers in a busy state, the hardware acceleration device can compress the data pages in those buffers in parallel, thereby improving the overall speed of the database physical backup.
- when there are multiple buffers in a busy state, the CPU can compress the data pages in different buffers through multiple hardware acceleration devices respectively, thereby improving the overall speed of the database physical backup.
- the CPU writes the corresponding backup data packet into the backup data disk corresponding to the backup directory, and changes the state of the third buffer to an idle state.
- the hardware acceleration device can notify the CPU that the compression of the third buffer is completed. After that, the CPU can write the backup data packet obtained by compressing the multiple data pages in the third buffer into the backup data disk corresponding to the backup directory. In addition, the CPU can change the state of the third buffer to an idle state so that data pages to be backed up can be written into it again.
- in the process of database physical backup shown in Figures 8 and 9 above, data reading, verification and compression can be performed in parallel. For example, it is assumed that buffer a, buffer b and buffer c are included, and the initial states are all idle.
- the CPU can read the unbacked-up data pages from the original database directory and verify them, and then store the verified data pages in buffer a.
- when buffer a is full, the state of buffer a can be changed to a busy state, and the hardware acceleration device can be notified to compress the data pages in buffer a. While the hardware acceleration device is compressing the data pages in buffer a, the CPU can continue to read the unbacked-up data pages from the original database directory and verify them, and then store the verified data pages in buffer b.
- when buffer b is full, the state of buffer b can be changed to a busy state, and the hardware acceleration device can be notified to compress the data pages in buffer b. In this way, after the hardware acceleration device has compressed the data pages in buffer a, it can continue to compress the data in buffer b without waiting, thereby improving the overall efficiency of database physical backup.
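The buffer a / buffer b / buffer c example can be sketched as a small pipeline in which buffers circulate between an idle pool and a busy pool. This is an assumption-laden illustration: the function name `pipelined_backup`, the use of `queue.Queue` for the two pools, and `zlib.compress` as a software substitute for the hardware acceleration device are all choices made for the example.

```python
import queue
import threading
import zlib


def pipelined_backup(page_batches: list[list[bytes]],
                     num_buffers: int = 3,
                     num_compressors: int = 1) -> list[bytes]:
    """Fill idle buffers while busy buffers are being compressed."""
    idle: queue.Queue = queue.Queue()    # buffers in the idle state
    busy: queue.Queue = queue.Queue()    # buffers in the busy state
    for _ in range(num_buffers):
        idle.put(bytearray())            # all buffers start idle
    results: dict[int, bytes] = {}
    lock = threading.Lock()

    def reader() -> None:
        for seq, batch in enumerate(page_batches):
            buf = idle.get()             # wait until some buffer is idle
            buf[:] = b"".join(batch)     # store the pages in the buffer
            busy.put((seq, buf))         # buffer is full: mark it busy
        for _ in range(num_compressors):
            busy.put(None)               # tell each compressor to stop

    def compressor() -> None:
        while True:
            item = busy.get()            # wait until some buffer is busy
            if item is None:
                return
            seq, buf = item
            packet = zlib.compress(bytes(buf))
            with lock:
                results[seq] = packet
            buf.clear()
            idle.put(buf)                # buffer becomes idle again

    workers = [threading.Thread(target=reader)]
    workers += [threading.Thread(target=compressor) for _ in range(num_compressors)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return [results[i] for i in range(len(results))]
```

While one buffer is being compressed, the reader can already be filling another, so the compressor finds the next busy buffer waiting as soon as it finishes.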
- the time required to read the unbacked-up data pages from the original database directory to fill a buffer in an idle state (hereinafter referred to as T1) may be different from the time required for the hardware acceleration device to compress a buffer in a busy state and for the CPU to write the corresponding backup data packet to the data disk corresponding to the backup directory (hereinafter referred to as T2). Therefore, in order to fully utilize the processing power of the hardware acceleration device included in the electronic device, in some cases, the electronic device can start multiple threads to read the unbacked-up data pages from the original database directory in parallel (for example, multiple threads respectively read the unbacked-up data pages in different data files under the original database directory), and store the read unbacked-up data pages in buffers in the idle state.
- the number of started threads may be determined according to the first time, the second time, and the number of hardware acceleration devices (hereinafter referred to as M).
- the number of threads opened can be $\lceil T1 / T2 \times M \rceil$, where $\lceil \cdot \rceil$ indicates rounding up.
- in the same period of time, the number of buffers filled can be greater than or equal to the number of buffers compressed (when they are equal, the filling of buffers and the compression of buffers are balanced), so that the utilization of the hardware acceleration device of the electronic device can be maximized, and the speed of physical backup of the database can be further improved.
- a sufficient number of buffers is also required to support this.
- the number of buffers can be determined based on the number of threads and the number of hardware acceleration devices.
- the number of buffers can be the number of threads plus the number of hardware acceleration devices, that is, $\lceil T1 / T2 \times M \rceil + M$. In this way, it can be ensured that the number of buffers is sufficient: the multiple threads opened have corresponding idle buffers into which the unbacked-up data pages can be written, and the hardware acceleration devices included in the electronic device have busy buffers for compression processing.
- each buffer can also store the backup data packet obtained by the hardware acceleration device compressing the data in that buffer.
- the data page to be backed up in a buffer and the backup data packet corresponding to the data page to be backed up can be stored in the same buffer.
- T1 is 4 seconds, which means that the time required for a thread to read the unbacked data pages from the original database directory and fill a buffer in an idle state is 4 seconds, including the time for data page verification.
- T2 is 2 seconds, which means that the time required for a hardware acceleration device to compress a buffer in a busy state and for the CPU to write the corresponding backup data to the data disk corresponding to the backup directory is 2 seconds.
- the number of hardware acceleration devices is 2, namely hardware acceleration device 1 and hardware acceleration device 2. At this time, it can be calculated that the number of threads is 4/2*2, which is 4 threads, namely thread 1-thread 4.
- the number of buffers can be 4+2, which is 6 buffers, namely buffer 1-buffer 6.
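The arithmetic in this example can be reproduced with a few lines of code. The sketch assumes that the thread count is T1 / T2 × M rounded up, which is what the worked figure 4/2*2 = 4 suggests, and that the buffer count is the thread count plus M; the function name `plan_resources` is hypothetical.

```python
import math


def plan_resources(t1: float, t2: float, m: int) -> tuple[int, int]:
    """Return (threads, buffers) for fill time t1, compress-and-write time t2,
    and m hardware acceleration devices."""
    threads = math.ceil(t1 / t2 * m)   # assumed placement of the rounding
    buffers = threads + m              # one buffer per thread plus one per device
    return threads, buffers


threads, buffers = plan_resources(t1=4, t2=2, m=2)  # (4, 6), as in the example
```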
- initially, the 6 buffers can be empty. After that, thread 1 to thread 4 can write unbacked-up data pages to buffer 1 to buffer 4 in parallel, and by 4s buffer 1 to buffer 4 can be filled. Between 4s and 6s, hardware acceleration device 1 and hardware acceleration device 2 can compress buffer 1 and buffer 2 respectively, and the CPU can write the corresponding backup data packets to the data disk corresponding to the backup directory; at the same time, thread 1 and thread 2 can write unbacked-up data pages to buffer 5 and buffer 6 respectively. At 6s, hardware acceleration device 1 and hardware acceleration device 2 can have finished compressing the data pages in buffer 1 and buffer 2 respectively.
- between 6s and 8s, hardware acceleration device 1 and hardware acceleration device 2 can compress buffer 3 and buffer 4 respectively, and the CPU can write the corresponding backup data packets to the data disk corresponding to the backup directory; at the same time, thread 1 and thread 2 can continue to write unbacked-up data pages to buffer 5 and buffer 6 respectively, and thread 3 and thread 4 can write unbacked-up data pages to buffer 1 and buffer 2 respectively.
- at 8s, hardware acceleration device 1 and hardware acceleration device 2 can have finished compressing the data pages in buffer 3 and buffer 4 respectively, the CPU can have written the corresponding backup data packets to the data disk corresponding to the backup directory, and thread 1 and thread 2 can have filled buffer 5 and buffer 6 respectively.
- between 8s and 10s, hardware acceleration device 1 and hardware acceleration device 2 can compress buffer 5 and buffer 6 respectively, and the CPU can write the corresponding backup data packets to the data disk corresponding to the backup directory.
- at the same time, thread 3 and thread 4 can continue to write unbacked-up data pages to buffer 1 and buffer 2 respectively, and thread 1 and thread 2 can write unbacked-up data pages to buffer 3 and buffer 4 respectively.
- at 10s, hardware acceleration device 1 and hardware acceleration device 2 can have finished compressing the data pages in buffer 5 and buffer 6 respectively, and the CPU can have written the corresponding backup data packets to the data disk corresponding to the backup directory.
- at the same time, thread 3 and thread 4 can have filled buffer 1 and buffer 2 respectively.
- threads 1-thread 4 can write data pages to four idle buffers in parallel, and hardware acceleration device 1 and hardware acceleration device 2 can compress two busy buffers in parallel.
- after hardware acceleration device 1 or hardware acceleration device 2 has finished compressing a buffer in a busy state and the CPU has written the corresponding backup data packet into the data disk corresponding to the backup directory, there will always be another buffer in a busy state for it to continue compressing, so that the processing resources of hardware acceleration device 1 and hardware acceleration device 2 can be fully utilized.
- T1 is also related to the size of the buffer, the CPU load, etc.
- T2 is also related to the size of the buffer, the load of the hardware acceleration device, the processing power of the hardware acceleration device itself, etc.
- when the size of the buffer and the processing power of the hardware acceleration device itself are fixed, if tasks other than database physical backup occupy more CPU resources, the CPU load may be greater, so fewer CPU resources are available for database physical backup, which causes the above T1 to increase accordingly.
- T2 will also change accordingly. The more processing resources of the hardware acceleration device are occupied by other tasks, the greater the load of the hardware acceleration device can be, and accordingly, T2 can be larger.
- the electronic device can dynamically adjust the number of threads and the number of buffers.
- the electronic device can dynamically monitor or predict T1 and T2, and then dynamically adjust the number of threads and buffers opened when T1 or T2 changes. For example, assume that the number of hardware acceleration devices is 2, the current T1 is 4s, and T2 is 2s. At this time, the electronic device can open 4 threads for buffer storage and can set 6 buffers. Afterwards, if the CPU load drops, T1 can become 2s while T2 remains unchanged. At this time, the electronic device can determine that 2 threads and 4 buffers are more appropriate. Therefore, the electronic device can close 2 threads and release the resources of 2 buffers. In this way, while the efficiency of database physical backup is maintained, the occupation of CPU resources and memory resources can be reduced.
- the electronic device dynamically adjusts the number of threads and the number of buffers, which can keep the storage of the buffer and the compression of the buffer in a balanced state, that is, in the same time, the number of full buffers and the number of compressed buffers are almost equal.
- the performance of the hardware acceleration device can be fully utilized, and the number of buffers and threads is more appropriate, which can avoid occupying too many CPU resources and too many memory resources.
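The dynamic adjustment described above can be sketched as a recomputation of the target thread and buffer counts whenever the measured T1 or T2 changes. The sketch assumes the thread count is T1 / T2 × M rounded up and the buffer count is the thread count plus M; the names `Plan`, `plan_for`, and `adjustment` are hypothetical, and how T1 and T2 are measured or predicted is left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    threads: int
    buffers: int


def plan_for(t1: float, t2: float, m: int) -> Plan:
    """Target thread and buffer counts for the current T1, T2 and M."""
    threads = math.ceil(t1 / t2 * m)   # assumed placement of the rounding
    return Plan(threads=threads, buffers=threads + m)


def adjustment(current: Plan, t1: float, t2: float, m: int) -> tuple[int, int]:
    """Return (thread delta, buffer delta); negative values mean close or release."""
    target = plan_for(t1, t2, m)
    return target.threads - current.threads, target.buffers - current.buffers


current = plan_for(t1=4, t2=2, m=2)                 # 4 threads, 6 buffers
deltas = adjustment(current, t1=2, t2=2, m=2)       # (-2, -2)
```

A result of `(-2, -2)` corresponds to closing 2 threads and releasing 2 buffers, as in the example above.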
- FIG. 10 is a flow chart of another database physical backup method disclosed in an embodiment of the present application.
- the backup method can be executed by a central processing unit (CPU) of an electronic device.
- the electronic device may include a central processing unit, a memory, and a hardware acceleration device.
- the memory may include a first buffer.
- the processing flow may include but is not limited to the following steps:
- the CPU reads the first data in the first data file and stores the first data in the first buffer.
- the first data file is stored in a data disk corresponding to the original database directory, such as the master data disk.
- the first data may be an unbacked up data page in the first data file.
- the size of the first buffer may be N times the size of the data page, where N is an integer greater than or equal to 2.
- the CPU may first open the first data file. Then, if the first buffer stores no first data or is not yet full of first data, the first buffer can continue to store data, and the CPU may read the first data from the first data file and store it in the first buffer. It should be understood that a first buffer that stores no first data or is not full of first data can be understood as being in an idle state; reference may be made to the relevant description in the embodiment shown in FIG. 5 above.
- the corresponding compressed data may also be stored in the first buffer, and then the compressed data stored in the first buffer may be written to the data disk corresponding to the backup directory. Afterwards, when the compressed data stored in the first buffer is successfully written to the data disk corresponding to the backup directory, the first buffer may be regarded as an empty buffer, that is, a buffer in which the first data is not stored.
- before the CPU stores the read first data into the first buffer, it may verify the read first data, and then, if the verification of the first data succeeds, the CPU stores the first data into the first buffer.
- the CPU compresses the first data in the first buffer through a hardware acceleration device to generate backup data corresponding to the first data.
- the CPU compresses the first data in the first buffer through the hardware acceleration device, so that the utilization rate of the buffer space can be guaranteed.
- the CPU can read the remaining data from the first data file, that is, read the remaining unread data pages or data pages to be backed up in the first data file, and store the remaining read data in the second buffer.
- the second buffer can be any other buffer in the multiple first buffers that does not store or is not full of the first data.
- the CPU can continue to read the remaining data from the first data file and store the remaining data in another first buffer, which can ensure that the CPU writes data to the buffer in parallel with the hardware acceleration device compressing the data, thereby improving the efficiency of data backup.
- the first data can be read from multiple data files in the original database directory by multiple threads, and the first data in different data files can be stored in multiple buffers that store no first data or are not full of first data; when the third buffer is full, the data in the third buffer is compressed by a hardware acceleration device.
- the third buffer can be any full buffer among the multiple first buffers.
- first data in the above-mentioned multiple data files may be different data.
- the CPU may process and compress the data in the multiple full buffers in parallel through multiple hardware acceleration devices.
- the number of threads opened and the number of buffers can be determined according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device.
- the first time can be the time required to read the first data from the original database directory to fill a first buffer
- the second time is the time required for a hardware acceleration device to complete the compression of a full first buffer and for the CPU to write the corresponding backup data into the data disk corresponding to the backup directory.
- the hardware acceleration device of the electronic device is disposed in a processor of the electronic device, that is, the hardware acceleration device of the electronic device and the processor of the electronic device may be integrated together.
- the CPU can dynamically adjust the number of threads and/or the number of buffers according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device.
- the CPU writes the corresponding backup data into the data disk corresponding to the backup directory.
- the execution subject in the above steps 501-507, 601-606, 701-702, 801-806, 901-902, 1001-1003 may be other controllers or processing chips in the electronic device, such as CPLD, FPGA and other controllers, in addition to the central processing unit CPU in the electronic device.
- FIG. 5, FIG. 6, FIG. 7, FIG. 8, FIG. 9 and FIG. 10 take the CPU as the execution subject of the interactive diagram as an example to illustrate the above-mentioned processing flow, but the present application does not limit the execution subject of the interactive diagram.
- the CPU in Figures 6, 7, 8, 9 and 10 can also be a chip, chip system, electronic device, etc. that supports the CPU to implement the method, or it can be a logic module or software (such as the above-mentioned backup tool) that can implement all or part of the CPU functions.
- the present application also discloses an electronic device, which includes a memory and a processor, wherein the memory is used to store a computer program or computer instruction of the electronic device, and the processor can be used to read the program or computer instruction stored in the memory and execute the method in the above method embodiment.
- the above memory may include but is not limited to a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or a compact disc read-only memory (CD-ROM).
- the above processor may be a CPU, a complex programmable logic device, a general processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component or any combination thereof.
- the processor may also be a combination that realizes a computing function, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
- the embodiment of the present application also discloses a computer-readable storage medium having instructions stored thereon, and when the instructions are executed, the method in the above method embodiment is executed.
- the embodiment of the present application also discloses a computer program product including instructions, which, when executed, execute the method in the above method embodiment.
- a unit can be, but is not limited to, a process running on a processor, a processor, an object, an executable file, an execution thread, or a program, and/or can be distributed between two or more computers.
- these units can be executed from various computer-readable media having various data structures stored thereon.
- the units can communicate through local and/or remote processes, for example, based on signals having one or more data packets (e.g., data from a second unit interacting with another unit in a local system, a distributed system, and/or across a network such as the Internet, which interacts with other systems via signals).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
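Embodiments of the present application disclose a data backup method, an electronic device, and a computer-readable storage medium. The method is applied to an electronic device that includes a central processing unit, a memory, and a hardware acceleration device, where the memory includes a first buffer. The method specifically includes: reading first data from a first data file and storing the read first data in the first buffer, where the first data file is stored on the data disk corresponding to the original database directory; compressing the first data in the first buffer by means of the hardware acceleration device to generate backup data corresponding to the first data; and writing the backup data to the data disk corresponding to the backup directory. The embodiments of the present application can improve the efficiency of data backup.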
Description
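The present application claims priority to Chinese Patent Application No. 202310729914.0, filed with the China National Intellectual Property Administration on June 19, 2023 and entitled "Data backup method, electronic device, and computer-readable storage medium", which is incorporated herein by reference in its entirety.
The present application relates to the field of databases, and in particular to a data backup method, an electronic device, and a computer-readable storage medium.
With the rapid development of network and communication technologies, massive volumes of data are affecting every industry at an unprecedented rate of growth. To manage and maintain this data effectively, users typically store it in databases. To prevent loss of the data in a database, users also frequently need to perform physical backups of the database.
How to improve the efficiency of physical database backup is therefore a concern in the industry.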
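Summary
Embodiments of the present application disclose a data backup method, an electronic device, and a computer-readable storage medium that can improve the efficiency of physical database backup.
A first aspect discloses a data backup method. The method may be applied to an electronic device, to a module in an electronic device (for example, a chip or a central processing unit), or to a logic module or software (such as the backup tool described below) that can implement all or part of the functions of the electronic device. The electronic device may include a central processing unit, a memory, and a hardware acceleration device, and the memory may include a first buffer. The following description takes application to an electronic device as an example. The data backup method may include: reading first data from a first data file and storing the first data in the first buffer, where the first data file is stored on the data disk corresponding to the original database directory; compressing the first data in the first buffer by means of the hardware acceleration device to generate backup data corresponding to the first data; and writing the backup data to the data disk corresponding to the backup directory.
In the embodiments of the present application, during a physical database backup the first data may first be stored in the first buffer, and the hardware acceleration device may then compress data in units of the first buffer. This ensures that the performance of the hardware acceleration device is fully used, which improves the efficiency of data compression and, in turn, the efficiency of the physical database backup. Compressing data with the hardware acceleration device can also reduce the overall power consumption of the electronic device.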
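In a possible implementation, reading the first data from the first data file and storing the first data in the first buffer includes: opening the first data file; and, when the first buffer holds no first data or is not yet full of first data, reading the first data from the first data file and storing it in the first buffer.
In the embodiments of the present application, the first data is read from the first data file and stored in the first buffer only when the first buffer holds no first data or is not yet full. This ensures that first data already stored in the first buffer is not overwritten, which preserves the integrity of the data and therefore of the backup data.
In a possible implementation, compressing the first data in the first buffer by means of the hardware acceleration device includes: when the first buffer is full of the first data, compressing the first data in the first buffer by means of the hardware acceleration device.
In the embodiments of the present application, compressing the data in the first buffer only after the first buffer is full makes the most of the hardware acceleration device's performance, which increases the speed of the physical database backup.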
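In a possible implementation, the method may further include: after the hardware acceleration device finishes compressing the first data in the first buffer, changing the state of the first buffer to the idle state.
In the embodiments of the present application, a state flag may be set for the first buffer. When the state flag is idle, the first buffer may be used and first data may be written to it; when the state flag is not idle, the first buffer may not be used. This allows the electronic device to make full use of the first buffer based on its state flag.
In a possible implementation, there are multiple first buffers, and the method further includes: reading remaining data from the first data file and storing the remaining data in a second buffer, where the second buffer is any other one of the first buffers that holds no first data or is not yet full of first data; and, when the hardware acceleration device finishes compressing the first data in one first buffer, compressing the data in the second buffer by means of the hardware acceleration device.
In the embodiments of the present application, there may be multiple first buffers. While the hardware acceleration device compresses the data in one first buffer, the electronic device can write data to other buffers that are empty or not yet full, so that compressing buffers and filling buffers proceed in parallel, which further improves the efficiency of the physical database backup.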
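In a possible implementation, storing the first data in the first buffer includes: verifying the first data; and, if the verification succeeds, storing the first data in the first buffer.
In the embodiments of the present application, before the read first data is stored in the first buffer, it may first be verified and then stored only if the verification succeeds. This avoids storing erroneous data and ensures the accuracy of the backup data. Verification also allows erroneous data to be discarded, which reduces the amount of data to be compressed, saves processing resources of the electronic device, and lowers its overall power consumption.
In a possible implementation, the method further includes: reading first data from multiple data files under the original database directory by means of multiple threads, and storing the first data read from the different data files in those of the multiple first buffers that hold no first data or are not yet full of first data; and, when a third buffer is full, compressing the data in the third buffer by means of the hardware acceleration device, where the third buffer is any full buffer among the multiple first buffers.
In the embodiments of the present application, data may be written to multiple buffers by multiple threads. This ensures that the hardware acceleration device always has a full buffer to compress, so that its performance is fully used and the efficiency of the physical database backup is further improved.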
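In a possible implementation, the electronic device includes multiple hardware acceleration devices, and the method further includes: when multiple full buffers exist among the multiple first buffers, compressing the data in the multiple full buffers in parallel by means of the multiple hardware acceleration devices.
In the embodiments of the present application, the electronic device may include multiple hardware acceleration devices. In this case, the electronic device may compress the data in multiple buffers in parallel by means of the multiple hardware acceleration devices, which further improves the efficiency of the physical database backup.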
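In a possible implementation, the number of threads is determined according to a first time, a second time, and the number of hardware acceleration devices included in the electronic device. The first time is the time required to read first data from the original database directory until one first buffer is full. The second time is the time required for one hardware acceleration device to finish compressing one full first buffer and for the central processing unit to write the corresponding backup data to the data disk corresponding to the backup directory.
In the embodiments of the present application, the number of threads to start may be determined according to the first time, the second time, and the number of hardware acceleration devices. This keeps the number of threads matched to the processing capacity of the hardware acceleration devices included in the electronic device. The hardware acceleration devices always have a full buffer to compress, while the processor resources of the electronic device are not occupied excessively.
In a possible implementation, the number of buffers is determined according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device. The first time is the time required to read first data from the original database directory until one first buffer is full. The second time is the time required for one hardware acceleration device to finish compressing one full first buffer and for the central processing unit to write the corresponding backup data to the data disk corresponding to the backup directory.
In the embodiments of the present application, the number of buffers to allocate may be determined according to the first time, the second time, and the number of hardware acceleration devices. This keeps the number of buffers matched to the number of threads and to the processing capacity of the hardware acceleration devices included in the electronic device, without occupying the memory resources of the electronic device excessively.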
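In a possible implementation, the method further includes: during the data backup, dynamically adjusting the number of threads and/or the number of buffers according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device.
In the embodiments of the present application, the first time and the second time may vary, so the electronic device may adjust the number of buffers and the number of threads dynamically to use buffer and thread resources more efficiently. Dynamic adjustment ensures that memory and processor resources are not occupied excessively, and also that there are never too few buffers or threads to exploit the performance of the hardware acceleration device.
In a possible implementation, the central processing unit includes the hardware acceleration device.
In the embodiments of the present application, the hardware acceleration device may be integrated into the central processing unit. This reduces the communication latency between the hardware acceleration device and the central processing unit, which further improves the efficiency of the physical database backup.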
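A second aspect discloses an electronic device. The electronic device includes a processor and a memory, and the processor invokes a computer program stored in the memory to implement the data backup method provided in the first aspect or in any possible implementation of the first aspect.
A third aspect discloses a computer-readable storage medium storing a computer program or computer instructions which, when run, implement the data backup method disclosed in the above aspects.
A fourth aspect discloses a chip, including a processor configured to execute a program stored in a memory; when the program is executed, the chip performs the data backup method disclosed in the above aspects.
In a possible implementation, the memory is located outside the chip.
A sixth aspect discloses a computer program product including computer program code which, when run, causes the data backup method disclosed in the above aspects to be performed.
It should be understood that the implementations and beneficial effects of the above aspects, or of any possible implementation thereof, may be referred to one another.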
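FIG. 1 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application;
FIG. 2 is a schematic diagram of a physical database backup scenario disclosed in an embodiment of the present application;
FIG. 3 is a schematic diagram of the software structure of an electronic device disclosed in an embodiment of the present application;
FIG. 4 is a schematic diagram of a scenario in which a backup tool performs a physical database backup, disclosed in an embodiment of the present application;
FIG. 5 is a schematic flowchart of a physical database backup method disclosed in an embodiment of the present application;
FIG. 6 is a schematic flowchart of another physical database backup method disclosed in an embodiment of the present application;
FIG. 7 is a schematic flowchart of yet another physical database backup method disclosed in an embodiment of the present application;
FIG. 8 is a schematic flowchart of yet another physical database backup method disclosed in an embodiment of the present application;
FIG. 9 is a schematic flowchart of yet another physical database backup method disclosed in an embodiment of the present application;
FIG. 10 is a schematic flowchart of yet another physical database backup method disclosed in an embodiment of the present application.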
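Embodiments of the present application disclose a physical database backup method, an electronic device, and a computer-readable storage medium, which can improve the efficiency of physical database backup and reduce the hard disk space used. The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
For a better understanding of the embodiments of the present application, the relevant terms and related technologies are described first.
A database cluster is generally a database system made up of multiple machines. One or more databases can be created on a database cluster. A machine in a database cluster may be called a node.
Database clusters may include centralized database clusters, distributed database clusters, and so on. A centralized database cluster may include one primary machine (primary node) and multiple standby machines (secondary nodes), and the primary and standby machines may store the same data. A distributed database cluster may also include a primary node and secondary nodes; each node may store one shard of the data, and the shards stored on the nodes together make up one complete copy of the data. The primary node of a database cluster may manage and monitor the secondary nodes, or the cluster may include a separate management node, which manages the primary or secondary nodes and monitors the secondary nodes. For example, the primary node may issue data operation instructions, such as a data backup instruction, to each secondary node, instructing the secondary node to back up its data.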
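Hardware acceleration is a technique that uses hardware devices to speed up processing, such as compression, decompression, decryption, encryption, random number generation, digital signatures, and video encoding and decoding. Dedicated hardware can be designed for a specific kind of processing; for example, hardware dedicated to compression tasks can be designed to improve data compression efficiency. In the embodiments of the present application, such dedicated hardware is referred to as a hardware acceleration device.
In practice, a hardware acceleration device can assist the central processing unit (CPU) to improve overall processing efficiency. For example, compute-intensive tasks (tasks with a large amount of computation) can be assigned to the hardware acceleration device to relieve the CPU and increase overall processing speed. In addition, because a hardware acceleration device is dedicated hardware designed for specific processing, it consumes less power than the CPU.
In the related art, hardware acceleration technologies include QAT (Quick Assist Technology) on the x86 architecture and KAE (Kunpeng Accelerator Engine) on the Arm (advanced RISC machine) architecture.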
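With the rapid development of network and communication technologies, massive volumes of data are affecting every industry at an unprecedented rate of growth. To manage and maintain this data effectively, users typically store it in databases. To meet disaster recovery requirements, the data files of a database usually need to be physically backed up at regular intervals. Then, if a disaster occurs (for example, the primary machine processing service data crashes or otherwise fails, causing data loss), a standby machine can restore the service from the backup set produced by an earlier physical backup, so that the service continues.
A physical backup of a database means copying the data files under the original database directory one or more times and storing the copies under other directories, which may be called backup directories. A backup set is the collection of database backup files, that is, the one or more backup files stored under a backup directory. During a physical backup, the data files of the database usually need to be compressed to reduce the hard disk space occupied by the backup set. In other words, each backup file under the backup directory may be a compressed file, and the data files under the original database directory may correspond one-to-one with the backup files under the backup directory. It should be understood that the original database directory may be the directory of the database to be backed up.
In the related art, a physical database backup may include four steps: reading data, verification, compression, and writing to disk. A database usually stores data in fixed-size data pages; for example, each data page may be 8 kilobytes (KB). On this basis, a backup tool (backup software) may verify and compress each data page in turn and write the resulting backup page into the backup set directory. Specifically, the electronic device may read 8 KB data pages one by one from the data files of the database. It may then verify each 8 KB page that was read and compute its checksum. Next, it may compress each verified 8 KB page using a software compression algorithm such as lz4, zlib, or zstd. Finally, it may write the compressed pages to disk one by one, storing them under the corresponding backup directory to produce the backup set.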
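The page-by-page flow of the related art can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `backup_page_by_page` is hypothetical, CRC-32 is assumed as the checksum, and `zlib` is simply one of the software algorithms the text names.

```python
import zlib

PAGE_SIZE = 8 * 1024  # 8 KB data page, as in the text


def backup_page_by_page(data: bytes) -> list:
    """Verify and compress every page separately (related-art flow)."""
    backup_pages = []
    for offset in range(0, len(data), PAGE_SIZE):
        page = data[offset:offset + PAGE_SIZE]          # read one page
        checksum = zlib.crc32(page)                     # verification (CRC-32 assumed)
        compressed = zlib.compress(page)                # software compression, one page at a time
        backup_pages.append(checksum.to_bytes(4, "big") + compressed)
    return backup_pages
```

Every page costs one checksum computation and one compression call on the CPU, which is the serial cost the application sets out to reduce.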
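In the above flow, every data page must be verified page by page, and every verified page must be compressed page by page. Verification and compression are both compute-intensive tasks that consume a large amount of CPU resources, and either of them may drive the CPU to full load. With limited CPU resources, verification and compression are therefore generally executed serially. In addition, compression is a relatively time-consuming part of a physical database backup and typically accounts for 50% or more of the total backup time.
To improve the efficiency of physical database backup and reduce its overall duration, the embodiments of the present application may use hardware acceleration during the backup; that is, data pages may be compressed by a hardware acceleration device. One or more buffers may also be set up: the data pages to be compressed are first stored in a buffer, and once the buffer is full, the hardware acceleration device compresses the data pages in batches, in units of a buffer, so that its performance is fully used.
For a better understanding of the embodiments of the present application, the system architecture is described first.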
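Refer to FIG. 1, which is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application. The electronic device 100 may be a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or the like, which is not limited in the embodiments of the present application. The electronic device 100 may include a central processing unit 101, a hardware acceleration device 102, a memory 103, and a hard disk 104. The central processing unit 101, the hardware acceleration device 102, the memory 103, and the hard disk 104 may be connected to one another directly or through a bus 105.
In the embodiments of the present application, the hard disk 104 may be used to store the data files of the database. The central processing unit 101 may be used to read the data files of the database from the hard disk and to perform operations such as verification. The memory 103 may be used to provide one or more buffers that temporarily store the verified data pages. The hardware acceleration device 102 may compress data pages in batches, in units of a buffer, to obtain compressed backup data packets, and the compressed backup data packets may be stored on the hard disk 104.
In the embodiments of the present application, the hardware acceleration device 102 may be dedicated hardware with a data compression function that helps the central processing unit 101 compress data more efficiently. For example, the hardware acceleration device 102 may be a QAT accelerator card, a KAE accelerator card, or the like.
For example, the hard disk 104 may be a hard disk drive (HDD), a solid state drive (SSD), or the like.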
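It should be noted that FIG. 1 shows only one central processing unit, one hardware acceleration device, one memory, and one hard disk. In practice, the electronic device may include two, three, four, or more central processing units, hardware acceleration devices, memories, and hard disks, which is not limited in the embodiments of the present application.
In a possible implementation, the central processing unit 101 and the hardware acceleration device 102 may be separate components or may be integrated, for example with the hardware acceleration device 102 integrated into the central processing unit 101. It should be noted that the electronic device 100 may be a node in a centralized database cluster, a node in a distributed database cluster, or another device that includes a database. In a centralized or distributed database cluster, each node (primary or secondary) may back up the data files of its database automatically, or the primary node or a management node may issue a backup command centrally, and each node backs up the data files of its database after receiving the command.
It should be noted that the system architecture shown in FIG. 1 is only an example and does not constitute a limitation. In practice, the electronic device 100 may include more or fewer components, or a different arrangement of components, which is not limited here.
The physical backup flow of the present application is described below by way of example with reference to FIG. 2.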
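Refer to FIG. 2, which is a schematic diagram of a physical database backup scenario disclosed in an embodiment of the present application. For ease of description, FIG. 2 assumes that the storage space corresponding to the original database directory is a primary data disk and that the storage space corresponding to the backup directory is a backup data disk. In this case, a physical database backup reads the data files under the original database directory from the primary data disk, compresses them, and stores them under the corresponding backup directory. The backup directory may contain as many backup files as there are data files under the original database directory; that is, the data files and the backup files may correspond one-to-one. The backup files under the backup directory may also be organized in the same way as the data files under the original database directory, where the organization may include the data structures used, the relationships between data, and so on.
Specifically, the CPU may read the data pages of the data files under the original database directory from the primary data disk one by one and verify each page that it reads. After verification, the CPU may store the verified data pages in an idle buffer; for example, if buffer A is currently idle, the CPU may store the verified pages in buffer A. Once buffer A is full, the CPU may mark it as busy, for example by setting a state label on the buffer to busy. The CPU may then notify the hardware acceleration device to compress the data pages in buffer A, obtaining a compressed backup data packet, and the compressed backup data packet may be stored in the storage space of the backup data disk that corresponds to the backup directory. Afterwards, the CPU may change the state label of buffer A back to idle so that data pages to be backed up can be written to it again.
In other words, the busy state indicates that the buffer is full of data, and the idle state indicates that the buffer holds no data or is not yet full.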
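The two buffer states can be modeled with a small class. The following Python sketch is illustrative only; the names `BufferState` and `PageBuffer` are hypothetical and do not appear in the application.

```python
from enum import Enum


class BufferState(Enum):
    IDLE = "idle"   # empty or only partly filled: pages may be written
    BUSY = "busy"   # full: waiting for, or undergoing, compression


class PageBuffer:
    """A fixed-capacity buffer of data pages guarded by a state flag."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.pages = []
        self.state = BufferState.IDLE

    def add_page(self, page):
        if self.state is BufferState.BUSY:
            raise RuntimeError("buffer is busy; wait until it is idle again")
        self.pages.append(page)
        if len(self.pages) == self.capacity_pages:
            self.state = BufferState.BUSY   # full: hand over for compression

    def release(self):
        """Called once the compressed packet has been written to disk."""
        self.pages.clear()
        self.state = BufferState.IDLE
```

The flag is the only coordination point: the writer touches a buffer only while it is idle, and the compressor only while it is busy.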
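In the above flow, once buffer A is full and while the hardware acceleration device is compressing its data pages, the CPU may continue reading data pages of the data files under the original database directory from the primary data disk and store them in buffer B. When buffer B is full, the CPU notifies the hardware acceleration device to compress the data pages in buffer B. In other words, buffer A and buffer B may be used alternately to store verified data pages: while the hardware acceleration device compresses the pages in the full buffer A, the CPU keeps storing verified pages in buffer B. Later, when buffer B is full and buffer A is idle again, buffer A is used for storage once more. In this way, verification and compression of data pages proceed in parallel, which improves the overall efficiency of the physical database backup.
It should be understood that by continually reading data pages into buffer A and buffer B and alternately compressing the data pages in the two buffers, all data files can be compressed and backed up and a complete backup set generated.
It should be understood that FIG. 2 uses two buffers to illustrate the physical database backup flow, but the embodiments of the present application do not limit the number of buffers; for example, there may be one buffer, or three, four, or more. It should also be understood that although the description above takes the storage space of the original database directory to be a primary data disk and that of the backup directory to be a backup data disk, this is not a limitation. In practice, the original database directory and the backup directory may be on the same hard disk, for example both on the primary data disk. That is, the data files under the original database directory and the backup files under the backup directory may be stored on the same hard disk.
It should be noted that a hardware acceleration device implements a highly efficient compression algorithm in hardware (including various processing circuits), with compression performance far better than that of algorithms implemented in software (such as lz4, zlib, and zstd); for example, it compresses faster and produces smaller output. Compressing data pages with the hardware acceleration device in the above flow therefore shortens the time needed for compression. At the same time, compressing data pages with a hardware acceleration device incurs communication overhead between hardware components, for example between the CPU and the hardware acceleration device. If the hardware acceleration device were notified after every single verified page, the communication overhead could become too large, and compressing one page at a time could not use the device's performance effectively, which would hurt the overall efficiency of the physical database backup. For this reason, the embodiments of the present application may set up one or more buffers and notify the hardware acceleration device only when a buffer is full, so that it compresses all data pages in the buffer at once. This reduces the communication overhead between hardware components and makes full use of the computing capacity of the hardware acceleration device.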
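The effect of batching on the number of offload requests can be shown with simple arithmetic. In the Python sketch below, `compress_on_accelerator` is a hypothetical stand-in that uses `zlib` in place of a real accelerator, and the figure of 16 pages per buffer is an assumption chosen for illustration.

```python
import zlib

PAGE_SIZE = 8 * 1024
PAGES_PER_BUFFER = 16  # assumed buffer capacity (128 KB)


def compress_on_accelerator(block: bytes) -> bytes:
    """Hypothetical stand-in for one offload request to the accelerator."""
    return zlib.compress(block)


def offload_calls(num_pages: int, pages_per_call: int) -> int:
    """Number of accelerator requests needed to compress num_pages pages."""
    return -(-num_pages // pages_per_call)  # ceiling division


def compress_buffer(pages: list) -> bytes:
    """Compress all pages of a full buffer with a single request."""
    return compress_on_accelerator(b"".join(pages))
```

For 1024 pages, `offload_calls(1024, 1)` gives 1024 requests, while `offload_calls(1024, PAGES_PER_BUFFER)` gives 64, so the fixed per-request communication cost is paid 16 times less often.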
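The software architecture of the embodiments of the present application is described below by way of example.
Refer to FIG. 3, which is a schematic diagram of the software structure of an electronic device disclosed in an embodiment of the present application. The electronic device 100 may include a database 106, and the database 106 may include a backup tool 1061. That is, the backup tool 1061 may be a subprogram module of the database 106.
Refer to FIG. 4, which is a schematic diagram of a scenario in which a backup tool performs a physical database backup, disclosed in an embodiment of the present application. As shown in FIG. 4, the backup tool 1061 may read the data files of the database from the primary data disk, perform operations such as verification, and store the verified data pages in buffers in memory, such as buffer A and buffer B. When a buffer is full, the backup tool 1061 may call the data compression API provided by the hardware acceleration device. The data compression API in turn calls the driver of the hardware acceleration device, and the driver drives the hardware acceleration device to compress the data pages in the buffer, producing a compressed backup data packet. The backup tool 1061 may then store the compressed backup data packet in a backup file under the backup directory to generate a complete backup set.
For a more detailed description of the backup tool 1061, refer to the method embodiments below; details are not repeated here.
In a possible implementation, the backup tool 1061 may be an independent module rather than a submodule of the database 106.
It should be noted that the software structure shown in FIG. 3 is only an example and does not constitute a limitation. In other embodiments of the present application, the software structure may include more or fewer software modules than shown in FIG. 3.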
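Based on the system architecture above, refer to FIG. 5, which is a schematic flowchart of a physical database backup method disclosed in an embodiment of the present application. FIG. 5 shows the processing flow of a physical database backup with a single buffer. The method may be executed by the central processing unit (CPU), specifically by the processor invoking the program in the backup tool. The flow may include, but is not limited to, the following steps:
501. The CPU opens a first data file under the original database directory.
The first data file may be any data file that is stored on the primary data disk, can be found through the original database directory, and has not yet been backed up.
During a physical database backup, the CPU may back up the data files under the original database directory one after another, generating the corresponding backup files and storing them on the backup data disk corresponding to the backup directory. For example, the CPU may first open any data file under the original database directory that has not yet been backed up, so that its data pages can be read.
502. When the first buffer is idle, the CPU reads data pages from the first data file one by one and stores the pages it reads in the first buffer in order.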
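To make full use of the hardware acceleration device, the hardware acceleration device of the electronic device may compress data pages in batches, in units of a buffer. With a single buffer, the CPU may therefore first check whether the first buffer is idle. If the first buffer is not idle, that is, it is busy, the CPU may wait for it to become idle. If the first buffer is idle, the CPU may read data pages from the first data file one by one and store them in the first buffer in order until the first buffer is full. It should be understood that the data pages in a data file may be ordered, and each data page may carry an identifier, such as a logical address or a page number. In a possible implementation, reading data pages from the first data file one by one can therefore be understood as reading them in page order, for example reading consecutive pages in order of logical address. Initially, the first buffer is idle by default. The size of the first buffer may be N times the size of a data page, where N is an integer greater than or equal to 2; for example, N may be 8 or 16. For example, when a data page is 8 KB, the first buffer may be 64 KB or 128 KB. It should be understood that the first buffer may be a storage area allocated in the memory of the electronic device, used to temporarily hold the data pages to be compressed during a physical database backup.
It should be understood that on the last read of the first data file, the remaining unread data pages may be too few to fill the first buffer. In this case, all remaining unread data pages of the first data file may be stored in the first buffer, and the first buffer may then be treated as full.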
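The filling rule of step 502, including the treatment of a final partial buffer, can be sketched in Python as follows. The helper `fill_buffer` is hypothetical, and N = 8 is one of the example values given in the text.

```python
import io

PAGE_SIZE = 8 * 1024
N = 8                          # buffer holds N pages (N >= 2)
BUFFER_SIZE = N * PAGE_SIZE    # 64 KB


def fill_buffer(data_file, buffer: bytearray) -> bool:
    """Read pages in order until the buffer is full or the file ends.

    Returns True if the buffer holds data that should be compressed.
    A final, partly filled buffer is treated as full.
    """
    while len(buffer) < BUFFER_SIZE:
        page = data_file.read(PAGE_SIZE)
        if not page:           # no unread pages remain in this file
            break
        buffer.extend(page)
    return len(buffer) > 0
```

With a 10-page file, the first call fills the buffer with 8 pages, the second returns the remaining 2 pages as a "full" buffer, and the third returns `False`.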
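In a possible implementation, before storing the data pages it reads in the first buffer, the CPU may first verify them one by one. Verification here covers two cases. In the first case, a check code is computed for each data page. After computing the check code, the CPU may store the data page together with its check code in the first buffer (for example, by appending the check code to the end of the corresponding data page), so that the accuracy of the data can be verified later when data is restored from the backup set. In the second case, a check code is computed for each data page and compared with a previously computed check code to determine whether the page just read is correct. After computing the check code, the CPU may compare it with the check code computed earlier for that page (for example, the check code computed when the page was stored on the primary data disk). If they are the same, the data page just read is correct and may be stored in the first buffer. If they differ, the data page just read is erroneous, and the CPU may skip the page or perform data recovery.
It should be understood that a check code is usually derived from the original data by some computation and is used to check the correctness of the original data. Common check codes include parity codes, Hamming codes, and cyclic redundancy check (CRC) codes.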
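Both verification cases can be sketched with CRC-32 from Python's standard library. The function names are hypothetical, and CRC-32 is chosen only because CRC is among the check codes the text lists.

```python
import zlib


def attach_checksum(page: bytes) -> bytes:
    """Case 1: compute a CRC-32 and append it to the end of the page."""
    return page + zlib.crc32(page).to_bytes(4, "big")


def page_is_intact(page: bytes, stored_checksum: int) -> bool:
    """Case 2: recompute the CRC-32 and compare it with the stored value."""
    return zlib.crc32(page) == stored_checksum
```

In the second case, a page for which `page_is_intact` returns `False` would be skipped or passed to data recovery rather than stored in the buffer.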
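503. When the first buffer is full, the CPU changes the state of the first buffer to busy and compresses the data pages in the first buffer by means of the hardware acceleration device, obtaining the corresponding backup data packet.
As the CPU keeps storing data pages in the first buffer, the buffer becomes full. To prevent further writes to the first buffer, the CPU may then change its state to busy. The CPU may also notify the hardware acceleration device to compress the data pages in the first buffer. After the hardware acceleration device has done so, the corresponding backup data packet is obtained; the packet contains the compressed data produced by the hardware acceleration device from the data pages in the first buffer. The backup data packet corresponding to the data pages in the first buffer can also be understood as the compressed data corresponding to those pages.
Compressing the data pages in the first buffer by means of the hardware acceleration device may specifically include the following: the CPU calls the compression API (application programming interface) provided by the hardware acceleration device, the compression API calls the driver of the hardware acceleration device, and the driver drives the hardware acceleration device to compress the data pages in the first buffer. When calling the compression API, the CPU may supply the relevant parameters, such as the address of the first buffer, so that the hardware acceleration device can compress the data pages in the first buffer.
504. The CPU writes the corresponding backup data packet to the backup data disk corresponding to the backup directory and changes the state of the first buffer to idle.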
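After the hardware acceleration device has compressed the data pages in the first buffer, it may notify the CPU that compression of the first buffer is complete. The CPU may then write the backup data packet obtained from the data pages in the first buffer to the backup data disk corresponding to the backup directory. The CPU may also change the state of the first buffer to idle so that data pages to be backed up, that is, the remaining unread data pages, can be written to it again. It should be noted that after compressing the data in the first buffer, the hardware acceleration device may store the resulting backup data packet either in the first buffer or in another idle buffer, and the time at which the first buffer becomes idle differs between the two cases. If the backup data packet is stored in the first buffer, the first buffer can be changed to idle only after the CPU has written the packet to the backup data disk corresponding to the backup directory. If the backup data packet is stored in another idle buffer, the first buffer can be changed to idle as soon as the hardware acceleration device has compressed its data pages, without waiting for the packet to be written to the backup data disk.
It can be understood that because the data files under the original database directory correspond one-to-one with the backup files under the backup directory, when the CPU writes the backup data packet under the backup directory, it actually writes it to the backup file that corresponds to the first data file.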
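505. The CPU determines whether the first data file has been fully backed up. If not, step 502 is performed; if so, step 506 is performed.
After writing the backup data packet for the data pages of the first data file to the backup data disk corresponding to the backup directory, the CPU may determine whether the first data file has been fully backed up, that is, whether it still contains data pages that have not been backed up or read. If the first data file has not been fully backed up (it still contains data pages that have not been backed up), the CPU may continue reading those pages by performing step 502. If the first data file has been fully backed up (it contains no data pages that have not been backed up), the CPU may go on to determine whether the original database directory still contains data files that have not been backed up by performing step 506.
506. The CPU determines whether the original database directory contains data files that have not been backed up. If it does, step 501 may be performed; if it does not, step 507 may be performed.
When the first data file has been fully backed up, the CPU may determine whether the original database directory still contains data files that have not been backed up. If it does, the CPU may open any such data file by performing step 501. If it does not, that is, all data files under the original database directory have been backed up, the CPU may perform step 507.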
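It should be noted that, owing to different service requirements, the same first data file may be backed up more than once, for example first to a first backup data disk and then to a second backup data disk. Whether a first data file or a data page counts as not yet backed up could therefore be judged differently. For this reason, in steps 502 to 507 of the embodiments of the present application, a data page that has not been backed up means a data page that has not been backed up during the current backup, for example during the current execution of steps 501 to 507. Likewise, a data file that has not been backed up means a data file that has not been backed up during the current backup, for example during the current execution of steps 501 to 507.
It should be noted that the backup data disk and the primary data disk may be the same physical hard disk or different physical hard disks.
507. The CPU ends the physical database backup and releases the related resources.
When all data files under the original database directory have been backed up, the backup directory may contain the complete set of backup files. The CPU may end the physical database backup and release the related resources, for example memory resources such as the first buffer.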
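It should be noted that the order of steps 501 to 507 is not unique. Some steps may be executed in parallel, for example steps 504 and 505, and the execution order may be adjusted as needed. For example, in a possible implementation, step 506 may be executed first. If the original database directory is found to contain no data files that have not been backed up, the CPU may perform step 507. If it does contain such files, the CPU may perform step 501 and then steps 502 to 505; when step 505 finds that the first data file has been fully backed up, step 506 may be performed again to determine whether the original database directory still contains data files that have not been backed up. As another example, steps 501 to 507 may be split into two independent flows, one that writes data pages that have not been backed up to the first buffer and one that compresses the data pages in the first buffer. The two flows may run in parallel and are linked through the state of the first buffer. FIG. 6 may show the flow that writes data pages to the first buffer: the CPU may check whether the first buffer is idle and, while it is idle, keep writing data pages that have not been backed up to it. FIG. 7 may show the flow that compresses the data pages in the first buffer by means of the hardware acceleration device: the CPU may check whether the first buffer is busy and, when it is busy, that is, full, have the hardware acceleration device compress its data pages and write the resulting backup data packet to the backup data disk corresponding to the backup directory. For the steps in FIG. 6 and FIG. 7, refer to the description of steps 501 to 507 above; details are not repeated here.
In the above flow, when performing a physical database backup, the CPU may have the hardware acceleration device compress data pages in batches, in units of a buffer. This greatly improves the efficiency of data compression and therefore the speed of the physical database backup.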
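Steps 501 to 507 can be sketched as one loop over files and buffers. The Python sketch below makes several assumptions that are not in the application: `zlib.compress` stands in for the hardware acceleration device, the `.bak` suffix is arbitrary, and each packet is written with a 4-byte length prefix so that it can be read back.

```python
import os
import zlib

PAGE_SIZE = 8 * 1024
BUFFER_SIZE = 8 * PAGE_SIZE   # single buffer of N = 8 pages


def backup_directory(src_dir: str, dst_dir: str) -> None:
    """Single-buffer backup: one backup file per data file."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):              # 506: files still to back up
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue
        dst_path = os.path.join(dst_dir, name + ".bak")
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:  # 501
            while True:
                buffer = src.read(BUFFER_SIZE)            # 502: fill the buffer
                if not buffer:                            # 505: file fully backed up
                    break
                packet = zlib.compress(buffer)            # 503: stand-in for the accelerator
                dst.write(len(packet).to_bytes(4, "big")) # 504: write packet to backup file
                dst.write(packet)
    # 507: files are closed by the with-blocks; nothing else to release here
```

Because there is only one buffer, reading and compression strictly alternate in this sketch; the multi-buffer flows described next remove that serialization.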
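FIG. 5 shows the physical database backup flow with a single buffer. In practice, multiple buffers may also be used, so that reading and compression, or reading, verification, and compression, proceed in parallel, which further increases the speed of the physical database backup. The flow with multiple buffers is described below by way of example with reference to FIG. 8 and FIG. 9. FIG. 8 may show the flow that writes data pages that have not been backed up to idle buffers, and FIG. 9 may show the flow that compresses the data pages in busy buffers by means of the hardware acceleration device.
The flow shown in FIG. 8 is described first. As shown in FIG. 8, the flow may be executed by the central processing unit (CPU), specifically by the processor invoking the program in the backup tool, and may include, but is not limited to, the following steps:
801. The CPU opens a first data file under the original database directory.
The first data file may be any data file that is stored on the primary data disk, can be found through the original database directory, and has not yet been backed up.
Step 801 is similar to step 501; refer to the description of step 501 above.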
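802. When an idle buffer exists among the multiple buffers of the electronic device, the CPU reads data pages from the first data file one by one and stores the pages it reads in a second buffer in order.
The second buffer may be any idle buffer among the multiple buffers.
With multiple buffers, the CPU may first determine whether any of them is idle. If none is idle, the CPU may wait for one to become idle. If an idle buffer exists, the CPU may read data pages from the first data file one by one and store them in the second buffer in order until the second buffer is full. Initially, all of the buffers are idle by default. It should be understood that these buffers may be allocated from memory to store the data pages to be backed up.
It should be understood that on the last read of the first data file, the remaining unread data pages may be too few to fill the second buffer. In this case, all remaining unread data pages of the first data file may be stored in the second buffer, and the second buffer may be treated as full.
In a possible implementation, before storing the data pages it reads in the second buffer, the CPU may first verify them one by one. For a description of data page verification, refer to the description of step 502; details are not repeated here.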
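803. When the second buffer is full, the CPU changes the state of the second buffer to busy.
As the CPU keeps storing data pages in the second buffer, the buffer becomes full. To prevent further writes to the second buffer, the CPU may then change its state to busy. The CPU may also notify the hardware acceleration device to compress the data pages in the second buffer.
804. The CPU determines whether the first data file has been fully backed up. If not, step 802 is performed; if so, step 805 is performed.
After filling the second buffer, the CPU may determine whether the first data file has been fully backed up, that is, whether it still contains data pages that have not been backed up. If the first data file has not been fully backed up, the CPU may continue reading those pages by performing step 802. If it has been fully backed up, the CPU may go on to determine whether the original database directory still contains data files that have not been backed up by performing step 805.
805. The CPU determines whether the original database directory contains data files that have not been backed up. If it does, step 801 may be performed; if it does not, step 806 may be performed.
When the first data file has been fully backed up, the CPU may determine whether the original database directory still contains data files that have not been backed up. If it does, the CPU may open any such data file by performing step 801. If it does not, that is, all data files under the original database directory have been backed up, the CPU may perform step 806.
806. The CPU ends the physical database backup and releases the related resources.
When all data files under the original database directory have been backed up, the backup directory may contain the complete set of backup files. The CPU may end the physical database backup and release the related resources, for example memory resources such as the second buffer.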
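The flow shown in FIG. 9 is described next. As shown in FIG. 9, the flow may be executed by the central processing unit (CPU), specifically by the processor invoking the program in the backup tool, and may include, but is not limited to, the following steps:
901. When a busy buffer exists among the multiple buffers of the electronic device, the CPU compresses the data pages in a third buffer by means of the hardware acceleration device, obtaining the corresponding backup data packet.
The third buffer may be any busy buffer among the multiple buffers.
As FIG. 8 shows, until all data files of the database have been backed up, the CPU may keep writing data pages to be backed up to the idle buffers, and whenever a buffer becomes full, the CPU may notify the hardware acceleration device to compress its data pages. Therefore, when a busy buffer exists among the multiple buffers of the electronic device, the CPU may have the hardware acceleration device compress the data pages in the third buffer to obtain the corresponding backup data packet. For example, suppose there are three buffers, buffer a, buffer b, and buffer c, of which buffer a and buffer b are busy. The CPU may then have the hardware acceleration device compress the data pages in buffer a and afterwards those in buffer b.
It should be noted that, in a possible implementation, when several buffers are busy, the hardware acceleration device may compress the data pages in those buffers in parallel, which increases the overall speed of the physical database backup. Alternatively, if the electronic device includes multiple hardware acceleration devices, then when several buffers are busy, the CPU may have different hardware acceleration devices compress the data pages in different buffers, which likewise increases the overall speed of the physical database backup.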
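902. The CPU writes the corresponding backup data packet to the backup data disk corresponding to the backup directory and changes the state of the third buffer to idle.
After the hardware acceleration device has compressed the data pages in the third buffer, it may notify the CPU that compression of the third buffer is complete. The CPU may then write the backup data packet obtained from the data pages in the third buffer to the backup data disk corresponding to the backup directory. The CPU may also change the state of the third buffer to idle so that data pages to be backed up can be written to it again.
In the physical database backup shown in FIG. 8 and FIG. 9, reading, verification, and compression can proceed in parallel. For example, suppose there are three buffers, buffer a, buffer b, and buffer c, all initially idle. During the backup, the CPU may read data pages that have not been backed up from the original database directory, verify them, and store the verified pages in buffer a. When buffer a is full, its state may be changed to busy and the hardware acceleration device may be notified to compress its data pages. While the hardware acceleration device is compressing the data pages in buffer a, the CPU may continue reading and verifying data pages from the original database directory and store the verified pages in buffer b. When buffer b is full, its state may be changed to busy and the hardware acceleration device may be notified to compress its data pages. In this way, once the hardware acceleration device has finished with buffer a, it can go straight on to compress the data in buffer b without waiting, which improves the overall efficiency of the physical database backup.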
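The two cooperating flows of FIG. 8 and FIG. 9 can be sketched as a producer and a consumer linked by buffer state. In the Python sketch below, two queues play the role of the idle and busy states, a worker thread with `zlib` stands in for the hardware acceleration device, and all names are hypothetical.

```python
import queue
import threading
import zlib

PAGE_SIZE = 8 * 1024
PAGES_PER_BUFFER = 4
NUM_BUFFERS = 3                       # buffers a, b and c


def pipelined_backup(pages):
    idle = queue.Queue()              # buffers in the idle state
    busy = queue.Queue()              # full buffers awaiting compression
    for _ in range(NUM_BUFFERS):
        idle.put(bytearray())
    packets = []

    def compress_loop():              # FIG. 9 side
        while True:
            buf = busy.get()
            if buf is None:           # sentinel: nothing more to compress
                return
            packets.append(zlib.compress(bytes(buf)))  # stand-in for the accelerator
            buf.clear()
            idle.put(buf)             # state changes back to idle

    worker = threading.Thread(target=compress_loop)
    worker.start()

    buf = idle.get()                  # FIG. 8 side: wait for an idle buffer
    for page in pages:
        buf.extend(page)
        if len(buf) >= PAGE_SIZE * PAGES_PER_BUFFER:
            busy.put(buf)             # full: state changes to busy
            buf = idle.get()          # blocks until some buffer is idle
    if buf:
        busy.put(buf)                 # a final partial buffer is treated as full
    busy.put(None)
    worker.join()
    return packets
```

While the worker compresses one buffer, the main thread keeps filling the next, which is the overlap between buffer a and buffer b described above. A single worker keeps the packets in order; with several accelerators, packets would need sequence numbers.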
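It can be understood that the time required to read data pages that have not been backed up from the original database directory until one idle buffer is full (hereinafter T1) may differ from the time required for the hardware acceleration device to compress one busy buffer and for the CPU to write the corresponding backup data packet to the data disk corresponding to the backup directory (hereinafter T2). To make full use of the processing capacity of the hardware acceleration devices included in the electronic device, the electronic device may therefore, in some cases, start multiple threads that read data pages in parallel from the original database directory (for example, each thread reads the pages of a different data file under the original database directory) and store the pages they read in idle buffers.
In a possible implementation, the number of threads to start may be determined according to the first time (T1), the second time (T2), and the number of hardware acceleration devices (hereinafter M).
For example, the number of threads may be $\lceil T1 / T2 \rceil \times M$, where $\lceil \cdot \rceil$ denotes rounding up. In this way, the number of buffers filled in a given time is greater than or equal to the number of buffers compressed in that time (when they are equal, filling and compression are in balance), so that the hardware acceleration devices of the electronic device are used to the fullest and the speed of the physical database backup is further increased.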
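It should be understood that fully using the hardware acceleration devices in this way also requires enough buffers. The number of buffers may be determined from the number of threads and the number of hardware acceleration devices. For example, the number of buffers may be the number of threads plus the number of hardware acceleration devices, that is, $\lceil T1 / T2 \rceil \times M + M$. This ensures that there are enough buffers: every thread has an idle buffer into which it can write data pages that have not been backed up, and every hardware acceleration device has a busy buffer to compress. It should be noted that with this number of buffers, each buffer may also hold the backup data packet that the hardware acceleration device produces from the data in that buffer. That is, the data pages to be backed up and their corresponding backup data packet may be stored in the same buffer.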
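The sizing rule can be written as a small function. The placement of the ceiling is an assumption: the formula is garbled in the available text, and `ceil(T1 / T2) * M` is one reading that agrees with the worked example (T1 = 4 s, T2 = 2 s, M = 2 gives 4 threads). The function name `plan_resources` is hypothetical.

```python
import math


def plan_resources(t1: float, t2: float, num_accelerators: int):
    """Return (threads, buffers) for fill time t1, compress-and-write time t2.

    Assumed rule: threads = ceil(t1 / t2) * M, buffers = threads + M.
    """
    threads = math.ceil(t1 / t2) * num_accelerators
    buffers = threads + num_accelerators
    return threads, buffers
```

For instance, `plan_resources(4, 2, 2)` gives 4 threads and 6 buffers, and `plan_resources(5, 2, 1)` gives 3 threads and 4 buffers, where the ceiling over-provisions slightly so that the accelerator is not starved.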
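For example, assume T1 is 4 s (seconds); that is, one thread needs 4 s to read data pages that have not been backed up from the original database directory and fill one idle buffer, including the time for page verification and the like. T2 is 2 s; that is, one hardware acceleration device needs 2 s to compress one busy buffer and have the CPU write the corresponding backup data to the data disk corresponding to the backup directory. There are 2 hardware acceleration devices, hardware acceleration device 1 and hardware acceleration device 2. The number of threads is then 4/2 × 2 = 4, namely threads 1 to 4, and the number of buffers may be 4 + 2 = 6, namely buffers 1 to 6. Initially, all 6 buffers may be empty. Threads 1 to 4 then write data pages to buffers 1 to 4 in parallel, and at 4 s buffers 1 to 4 are full. Between 4 s and 6 s, hardware acceleration devices 1 and 2 compress buffers 1 and 2 respectively, and the CPU writes the corresponding backup data packets to the data disk corresponding to the backup directory; meanwhile, threads 1 and 2 write data pages to buffers 5 and 6 respectively. At 6 s, hardware acceleration devices 1 and 2 have finished compressing the data pages in buffers 1 and 2. Between 6 s and 8 s, hardware acceleration devices 1 and 2 compress buffers 3 and 4 respectively, and the CPU writes the corresponding backup data packets to the data disk; meanwhile, threads 1 and 2 continue writing to buffers 5 and 6, and threads 3 and 4 write data pages to buffers 1 and 2 respectively. At 8 s, hardware acceleration devices 1 and 2 have finished compressing the data pages in buffers 3 and 4 and the CPU has written the corresponding backup data packets to the data disk, while threads 1 and 2 have filled buffers 5 and 6. Between 8 s and 10 s, hardware acceleration devices 1 and 2 compress buffers 5 and 6 respectively, and the CPU writes the corresponding backup data packets to the data disk; meanwhile, threads 3 and 4 continue writing to buffers 1 and 2, and threads 1 and 2 write data pages to buffers 3 and 4 respectively. At 10 s, hardware acceleration devices 1 and 2 have finished compressing the data pages in buffers 5 and 6 and the CPU has written the corresponding backup data packets to the data disk, while threads 3 and 4 have filled buffers 1 and 2. Thus, during the physical database backup, threads 1 to 4 can write data pages to four idle buffers in parallel, and hardware acceleration devices 1 and 2 can compress two busy buffers in parallel. Whenever hardware acceleration device 1 or 2 finishes compressing a busy buffer and the CPU has written the corresponding backup data packet to the data disk, another busy buffer is always available for it to compress, so the processing resources of both hardware acceleration devices are fully used.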
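The balance in this example can be checked with a short steady-state calculation. This is an illustrative check rather than part of the application; it assumes each thread fills one buffer every T1 seconds and each hardware acceleration device drains one buffer every T2 seconds.

```latex
\[
\text{fill rate} = \frac{\text{threads}}{T1} = \frac{4}{4\ \text{s}} = 1\ \text{buffer/s},
\qquad
\text{compression rate} = \frac{M}{T2} = \frac{2}{2\ \text{s}} = 1\ \text{buffer/s}.
\]
```

The two rates are equal, which is why, from 4 s onward, two buffers become full every 2 s and two buffers are compressed every 2 s.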
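It should be understood that T1 also depends on the buffer size, the CPU load, and so on, and that T2 also depends on the buffer size, the load on the hardware acceleration device, the processing capacity of the hardware acceleration device itself, and so on. For example, with the buffer size and the processing capacity of the hardware acceleration device fixed, the more CPU resources are taken by tasks other than the physical database backup, the higher the CPU load, the fewer CPU resources are left for the backup, and the larger T1 becomes. Similarly, if the hardware acceleration device also has to handle other tasks, T2 changes accordingly: the more of its processing resources other tasks take, the higher its load and the larger T2 may become.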
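Because T1 and T2 may change during a physical database backup, the electronic device may adjust the number of threads and the number of buffers dynamically. For example, during the backup the electronic device may monitor or predict T1 and T2 and, when either changes, adjust the number of running threads and the number of buffers. Suppose there are 2 hardware acceleration devices and currently T1 is 4 s and T2 is 2 s; the electronic device may then run 4 threads to fill buffers and allocate 6 buffers. If the CPU load later falls so that T1 becomes 2 s while T2 is unchanged, the electronic device may determine that 2 threads and 4 buffers are appropriate. It may therefore stop 2 threads and release 2 buffers, reducing CPU and memory usage while maintaining the efficiency of the physical database backup.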
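Thus, by adjusting the number of threads and the number of buffers dynamically during a physical database backup, the electronic device can keep buffer filling and buffer compression in balance; that is, in any given period the number of buffers filled is almost equal to the number of buffers compressed. The performance of the hardware acceleration device is then fully used, and the numbers of buffers and threads stay appropriate, which avoids occupying too many CPU and memory resources.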
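The adjustment step can be sketched as recomputing the plan from fresh measurements and applying the difference. The Python sketch below reuses the assumed rule threads = ceil(T1 / T2) × M and buffers = threads + M; the function names are hypothetical, and how T1 and T2 are measured is left open.

```python
import math


def plan(t1: float, t2: float, m: int):
    """Assumed sizing rule: threads = ceil(t1 / t2) * m, buffers = threads + m."""
    threads = math.ceil(t1 / t2) * m
    return threads, threads + m


def adjust(current_threads: int, current_buffers: int,
           t1: float, t2: float, m: int):
    """Return (thread_delta, buffer_delta); negative values mean release."""
    target_threads, target_buffers = plan(t1, t2, m)
    return target_threads - current_threads, target_buffers - current_buffers
```

Starting from 4 threads and 6 buffers, `adjust(4, 6, 2, 2, 2)` returns `(-2, -2)`: stop 2 threads and release 2 buffers, matching the example in the text.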
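Based on the system architecture above, refer to FIG. 10, which is a schematic flowchart of yet another physical database backup method disclosed in an embodiment of the present application. The backup method may be executed by the central processing unit (CPU) of an electronic device that may include a central processing unit, a memory, and a hardware acceleration device, where the memory may include a first buffer. As shown in FIG. 10, the flow may include, but is not limited to, the following steps:
1001. The CPU reads first data from a first data file and stores the first data in the first buffer.
The first data file is stored on the data disk corresponding to the original database directory, such as the primary data disk described above. The first data may be data pages of the first data file that have not been backed up. For example, the size of the first buffer may be N times the size of a data page, where N is an integer greater than or equal to 2.
For example, the CPU may first open the first data file. If the first buffer holds no first data or is not yet full of first data, it can still store data, so the CPU may read first data from the first data file and store it in the first buffer. It should be understood that a first buffer that holds no first data or is not yet full can be understood as being in the idle state; refer to the description of the embodiment shown in FIG. 5. It should also be noted that after the first data in the first buffer has been compressed, the resulting compressed data may also be stored in the first buffer and then written from the first buffer to the data disk corresponding to the backup directory. Once the compressed data has been successfully written to that data disk, the first buffer may be regarded as an empty buffer, that is, a buffer that holds no first data.
In a possible implementation, before storing the first data it reads in the first buffer, the CPU may verify the first data, and store it in the first buffer only if the verification succeeds.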
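1002. The CPU compresses the first data in the first buffer by means of the hardware acceleration device to generate backup data corresponding to the first data.
In a possible implementation, the CPU compresses the first data in the first buffer by means of the hardware acceleration device when the first buffer is full of first data. This ensures good utilization of the buffer space.
In a possible implementation, there may be multiple first buffers. In this case, the CPU may read remaining data from the first data file, that is, the data pages of the first data file that have not yet been read or are still to be backed up, and store the remaining data in a second buffer. The second buffer may be any other one of the multiple first buffers that holds no first data or is not yet full of first data. When the hardware acceleration device finishes compressing the first data in one first buffer, the data in the second buffer is compressed by means of the hardware acceleration device. That is, with multiple buffers, while the hardware acceleration device compresses the data in one first buffer, the CPU can continue reading remaining data from the first data file and store it in another first buffer, so that the CPU writes data to buffers in parallel with the hardware acceleration device compressing data, which improves data backup efficiency.
In a possible implementation, multiple threads may read first data from multiple data files under the original database directory and store the first data read from the different data files in those of the multiple first buffers that hold no first data or are not yet full of first data. When a third buffer is full, the data in the third buffer is compressed by means of the hardware acceleration device. The third buffer may be any full buffer among the multiple first buffers.
It should be understood that the first data in the multiple data files may be different data.
In a possible implementation, when multiple full buffers exist among the multiple first buffers, the CPU may compress the data in the multiple full buffers in parallel by means of multiple hardware acceleration devices.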
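For example, the number of threads started and the number of buffers may be determined according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device. The first time may be the time required to read first data from the original database directory until one first buffer is full. The second time is the time required for one hardware acceleration device to finish compressing one full first buffer and for the CPU to write the corresponding backup data to the data disk corresponding to the backup directory.
In a possible implementation, the hardware acceleration device of the electronic device is disposed in the processor of the electronic device; that is, the hardware acceleration device and the processor of the electronic device may be integrated.
It can be understood that the first time and the second time may change during the data backup. During the data backup, the CPU may therefore dynamically adjust the number of threads and/or the number of buffers according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device.
1003. The CPU writes the corresponding backup data to the data disk corresponding to the backup directory.
For steps 1001 to 1003, refer to the descriptions of the related steps in FIG. 5, FIG. 8, and FIG. 9; details are not repeated here.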
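The parallel variant of step 1002, in which several full buffers are compressed at once by several hardware acceleration devices, can be sketched with a worker pool. In this Python sketch each pool worker stands in for one hardware acceleration device and `zlib` stands in for its compression engine; both are assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_on_accelerator(block: bytes) -> bytes:
    """Hypothetical stand-in for one hardware acceleration device."""
    return zlib.compress(block)


def compress_full_buffers(full_buffers, num_accelerators: int):
    """Compress several full buffers in parallel, one worker per accelerator.

    The returned packets are in the same order as the input buffers.
    """
    with ThreadPoolExecutor(max_workers=num_accelerators) as pool:
        return list(pool.map(compress_on_accelerator, full_buffers))
```

`pool.map` returns results in input order, so the packets can be written to the backup file in the order in which the buffers were filled.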
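It should be noted that related information (that is, identical or similar information) and related descriptions in the different embodiments above may be referred to one another.
In addition to the central processing unit (CPU) of the electronic device, the execution subject of steps 501-507, 601-606, 701-702, 801-806, 901-902, and 1001-1003 may be another controller or processing chip in the electronic device, such as a CPLD or an FPGA.
It should be understood that FIG. 5, FIG. 6, FIG. 7, FIG. 8, FIG. 9, and FIG. 10 take the CPU as the execution subject to illustrate the above flows, but the present application does not limit the execution subject. For example, the CPU in FIG. 5, FIG. 6, FIG. 7, FIG. 8, FIG. 9, and FIG. 10 may also be a chip, chip system, electronic device, or the like that supports the CPU in implementing the method, or a logic module or software (such as the backup tool described above) that can implement all or part of the functions of the CPU.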
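An embodiment of the present application further discloses an electronic device that includes a memory and a processor. The memory is used to store a computer program or computer instructions of the electronic device, and the processor may be used to read the program or computer instructions stored in the memory and perform the method in the above method embodiments. The memory may include, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM). The processor may be a CPU, a complex programmable logic device, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements a computing function, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor.
An embodiment of the present application further discloses a computer-readable storage medium storing instructions which, when executed, perform the method in the above method embodiments.
An embodiment of the present application further discloses a computer program product including instructions which, when executed, perform the method in the above method embodiments.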
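Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. A reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application. The terms "first", "second", "third", and so on in the specification, the claims, and the drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to such a process, method, product, or device. It can be understood that the equality case of a conditional judgment above may be assigned to either side; for example, a judgment of whether a value is greater than, less than or equal to a threshold may be changed to a judgment of whether it is greater than or equal to, or less than, the threshold, which is not limited here.
It can be understood that the drawings show only the parts relevant to the present application, not all of the content. It should be understood that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential processing, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing may correspond to a method, function, procedure, subroutine, subprogram, and so on.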
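The terms "component", "module", "system", "unit", and the like used in this specification denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a unit may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, or a program, and/or may be distributed between two or more computers. In addition, these units may execute from various computer-readable media having various data structures stored thereon. Units may communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (for example, data from a second unit interacting with another unit in a local system, in a distributed system, and/or across a network such as the Internet, which interacts with other systems by way of the signal).
The specific implementations described above further explain the purpose, technical solutions, and beneficial effects of the present application in detail. It should be understood that the above are only specific implementations of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present application shall fall within the scope of protection of the present application.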
Claims (10)
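- A data backup method, characterized in that the method is applied to an electronic device, the electronic device includes a central processing unit, a memory, and a hardware acceleration device, and the memory includes a first buffer; the method includes: reading first data from a first data file and storing the first data in the first buffer, wherein the first data file is stored on the data disk corresponding to the original database directory; compressing the first data in the first buffer by means of the hardware acceleration device to generate backup data corresponding to the first data; and writing the backup data to the data disk corresponding to the backup directory.
- The method according to claim 1, characterized in that compressing the first data in the first buffer by means of the hardware acceleration device includes: when the first buffer is full of the first data, compressing the first data in the first buffer by means of the hardware acceleration device.
- The method according to claim 1 or 2, characterized in that there are multiple first buffers, and the method further includes: reading remaining data from the first data file and storing the remaining data in a second buffer, wherein the second buffer is any other one of the multiple first buffers that holds no first data or is not yet full of first data; and, when the hardware acceleration device finishes compressing the first data in the first buffer, compressing the data in the second buffer by means of the hardware acceleration device.
- The method according to claim 3, characterized in that the method further includes: reading first data from multiple data files under the original database directory by means of multiple threads, and storing the first data read from the different data files in those of the multiple first buffers that hold no first data or are not yet full of the first data; and, when a third buffer is full, compressing the data in the third buffer by means of the hardware acceleration device, wherein the third buffer is any full buffer among the multiple first buffers.
- The method according to claim 3 or 4, characterized in that the electronic device includes multiple hardware acceleration devices, and the method further includes: when multiple full buffers exist among the multiple first buffers, compressing the data in the multiple full buffers in parallel by means of the multiple hardware acceleration devices.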
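- The method according to claim 4 or 5, characterized in that the number of threads is determined according to a first time, a second time, and the number of hardware acceleration devices included in the electronic device; the first time is the time required to read first data from the original database directory until one first buffer is full, and the second time is the time required for one hardware acceleration device to finish compressing one full first buffer and for the central processing unit to write the corresponding backup data to the data disk corresponding to the backup directory.
- The method according to any one of claims 3 to 6, characterized in that the number of buffers is determined according to a first time, a second time, and the number of hardware acceleration devices included in the electronic device; the first time is the time required to read first data from the original database directory until one first buffer is full, and the second time is the time required for one hardware acceleration device to finish compressing one full first buffer and for the central processing unit to write the corresponding backup data to the data disk corresponding to the backup directory.
- The method according to any one of claims 3 to 7, characterized in that the method further includes: during the data backup, dynamically adjusting the number of threads and/or the number of buffers according to the first time, the second time, and the number of hardware acceleration devices included in the electronic device.
- The method according to any one of claims 1 to 8, characterized in that the central processing unit includes the hardware acceleration device.
- An electronic device, characterized in that the electronic device includes a processor and a memory, and the processor invokes a computer program or computer instructions stored in the memory to implement the method according to any one of claims 1 to 9.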
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310729914.0 | 2023-06-19 | | |
| CN202310729914.0A CN116954996A (zh) | 2023-06-19 | 2023-06-19 | Data backup method, electronic device, and computer-readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024259890A1 (zh) | 2024-12-26 |
Family
ID=88445318
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/133236 (Ceased), WO2024259890A1 (zh) | Data backup method, electronic device, and computer-readable storage medium | 2023-06-19 | 2023-11-22 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116954996A (zh) |
| WO (1) | WO2024259890A1 (zh) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
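| CN116954996A (zh) * | 2023-06-19 | 2023-10-27 | 超聚变数字技术有限公司 | Data backup method, electronic device, and computer-readable storage medium |
| CN118051378A (zh) * | 2024-01-12 | 2024-05-17 | 中建材信息技术股份有限公司 | Log-based incremental database backup method |
| CN120276940B (zh) * | 2025-06-06 | 2026-02-24 | 宁德时代润智软件科技有限公司 | Load monitoring method and apparatus, embedded device, computer device, and medium |
| CN121523970A (zh) * | 2025-10-31 | 2026-02-13 | 北京中金国信科技有限公司 | Key backup and recovery method and apparatus for a PCI-e cryptographic card |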
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8566286B1 (en) * | 2009-05-15 | 2013-10-22 | Idera, Inc. | System and method for high speed database backup using rapidly adjusted dynamic compression ratios controlled by a feedback loop |
| US20210109759A1 (en) * | 2019-10-15 | 2021-04-15 | EMC IP Holding Company LLC | Pipelined method to improve backup and restore performance |
| CN116954996A (zh) * | 2023-06-19 | 2023-10-27 | 超聚变数字技术有限公司 | Data backup method, electronic device, and computer-readable storage medium |
2023
- 2023-06-19: CN application CN202310729914.0A, published as CN116954996A (zh), status: Pending
- 2023-11-22: WO application PCT/CN2023/133236, published as WO2024259890A1 (zh), status: Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN116954996A (zh) | 2023-10-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2024259890A1 (zh) | | Data backup method, electronic device, and computer-readable storage medium |
| US20230185747A1 | | Presentation of direct accessed storage under a logical drive model |
| US10877940B2 | | Data storage with a distributed virtual array |
| CN103729442B (zh) | | Method for recording transaction logs and database engine |
| US8856472B2 | | Restore in cascaded copy environment |
| US20120079175A1 | | Apparatus, system, and method for data transformations within a data storage device |
| CN115098046B (zh) | | Disk array initialization method and system, electronic device, and storage medium |
| US10387307B2 | | Lock-free raid implementation in multi-queue architecture |
| US11210024B2 | | Optimizing read-modify-write operations to a storage device by writing a copy of the write data to a shadow block |
| CN112579351A (zh) | | Cloud disk backup system |
| CN118244972A (zh) | | Data storage method, apparatus, and system |
| WO2024212783A1 (zh) | | Data writing method and apparatus, solid-state drive, electronic device, and non-volatile readable storage medium |
| CN117992283A (zh) | | Cloud host backup method and apparatus, computer device, and storage medium |
| CN113849341B (zh) | | Performance optimization method, system, and device for NAS snapshots, and readable storage medium |
| CN117666931A (zh) | | Data processing method and related device |
| US10185573B2 | | Caching based operating system installation |
| CN111290836A (zh) | | Virtual machine snapshot creation method and apparatus, storage medium, and computer device |
| US10942663B2 | | Inlining data in inodes |
| US20210132840A1 | | Storage management system and method |
| WO2024212850A1 (zh) | | Metadata backup method and apparatus, electronic device, and storage medium |
| CN119003239A (zh) | | Data processing method and apparatus, computer device, and storage medium |
| US10860221B1 | | Page write to non-volatile data storage with failure recovery |
| WO2022028208A1 (zh) | | Redundant array of independent disks (RAID) card, command processing method, storage apparatus, and system |
| CN114675995A (zh) | | Data backup method and apparatus, and electronic device |
| CN120315832A (zh) | | Data processing method and apparatus, computer device, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23942156; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |