WO2021073635A1 - Data storage method and device - Google Patents

Data storage method and device

Info

Publication number
WO2021073635A1
Authority
WO
WIPO (PCT)
Prior art keywords
fingerprint
data
data block
storage medium
speed storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/121843
Other languages
English (en)
French (fr)
Inventor
饶知
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to EP20877442.2A (EP4030273B1)
Publication of WO2021073635A1
Priority to US17/720,479 (US11886729B2)
Priority to US18/534,230 (US12159046B2)
Priority to US18/964,996 (US20250094070A1)
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Definitions

  • This application relates to the storage field, and in particular to a data storage method and device.
  • the embodiments of the present application provide a data storage method and device.
  • the solutions of the embodiments of the present application can generate a global fingerprint table for data deduplication in a storage array, thereby improving the data deduplication effect and storage efficiency.
  • an embodiment of the present application provides a data storage method, which is applied to a storage array, the storage array includes a high-speed storage medium and a low-speed storage medium, and the method includes:
  • the fingerprint table including the fingerprint corresponding to the data block stored in the high-speed storage medium and the fingerprint corresponding to the data block stored in the low-speed storage medium;
  • the deduplication operation is: point each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and delete from the data to be written the data blocks whose fingerprints exist in the fingerprint table;
  • the data to be written after the deduplication operation is stored in the high-speed storage medium or the low-speed storage medium.
  • the data blocks stored in the high-speed storage medium and the low-speed storage medium share the same fingerprint table for data deduplication. This prevents fingerprints from being generated repeatedly for the same data block and prevents the same data block from being stored repeatedly in different storage media, which improves storage efficiency and saves storage space.
  • the method further includes:
  • for each data block in the data to be written whose fingerprint is not in the fingerprint table, the storage address of that data block is added to the fingerprint table.
  • the method further includes:
  • if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, the data block corresponding to the fingerprint is migrated from the low-speed storage medium to the high-speed storage medium;
  • the method further includes:
  • if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, the data block corresponding to the fingerprint is migrated from the high-speed storage medium to the low-speed storage medium;
  • the fingerprint table is stored in the high-speed storage medium.
  • an embodiment of the present application provides a device
  • the storage device includes a high-speed storage medium and a low-speed storage medium, and specifically includes:
  • a receiving unit configured to receive a data write request, the data write request carries data to be written, and the data to be written includes at least one data block;
  • a calculating unit configured to calculate a fingerprint of each data block, where the fingerprint is used to uniquely identify each data block;
  • a determining unit configured to determine whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes the fingerprint corresponding to the data block stored in the high-speed storage medium and the fingerprint corresponding to the data block stored in the low-speed storage medium;
  • a deduplication unit configured to perform a deduplication operation on the data to be written, where the deduplication operation is: point each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and delete from the data to be written the data blocks whose fingerprints exist in the fingerprint table;
  • the storage unit is configured to store the data to be written after the deduplication operation in the high-speed storage medium or the low-speed storage medium.
  • the determining unit is further configured to:
  • for each data block in the data to be written whose fingerprint is not in the fingerprint table, the storage address of that data block is added to the fingerprint table.
  • the device further includes a migration unit, configured to:
  • if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, migrate the data block corresponding to the fingerprint from the low-speed storage medium to the high-speed storage medium;
  • the migration unit is further configured to:
  • if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, migrate the data block corresponding to the fingerprint from the high-speed storage medium to the low-speed storage medium;
  • the fingerprint table is stored in the high-speed storage medium.
  • an embodiment of the present application provides a device, including:
  • a memory storing executable program codes
  • a processor coupled with the memory
  • the processor calls the executable program code stored in the memory, so that the device executes any method described in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium.
  • the computer storage medium includes program instructions that, when run on a computer, cause the computer to execute any of the operations described in the first aspect.
  • FIG. 1 is a schematic diagram of an IT system provided by an embodiment of the application
  • FIG. 2 is a schematic structural diagram of a storage array provided by an embodiment of the application.
  • FIG. 3A is a schematic diagram of a foreground deduplication solution based on an all-flash architecture provided by an embodiment of the application;
  • FIG. 3B is a schematic diagram of a background deduplication solution based on a mechanical disk architecture provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a hierarchical storage solution provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of a data storage method provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a data storage process provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of a fingerprint table matching process provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of a cross-system storage architecture provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of a process of a data migration method provided by an embodiment of this application.
  • FIG. 10 is a schematic diagram of a data migration process provided by an embodiment of this application.
  • FIG. 11A is a schematic structural diagram of a data storage device provided by an embodiment of the application.
  • FIG. 11B is a schematic structural diagram of another data storage device provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a device provided by an embodiment of this application.
  • FIG. 1 is a schematic diagram of an IT system provided by an embodiment of this application.
  • the system consists of a user terminal 10, an application server 20, an administrator 30, a storage array 40, and a network 50.
  • the application server 20 may be a virtual server or a physical server, and is responsible for calculating application data and storing the data in a storage array.
  • the storage array 40 is composed of media such as solid-state drives, traditional mechanical disks, or tape libraries, and is used for persistent storage of application data.
  • the access protocol between the storage array and the application server can be Common Internet File System (CIFS), Network File System (NFS), Internet Small Computer System Interface (iSCSI), Fibre Channel (FC), or another protocol.
  • FIG. 2 is a schematic structural diagram of a storage array 40 provided by an embodiment of the application.
  • the storage array 40 includes an input device 401, an output device 402, and a central processing unit (CPU) 403. The CPU includes an arithmetic unit and a controller, and is connected to a memory 404 and an external memory 405.
  • the external memory 405 includes a high-speed storage medium and a low-speed storage medium. High-speed storage media are represented by solid-state drives (SSDs), and low-speed storage media are represented by hard disk drives (HDDs).
  • the storage architecture includes an all-flash architecture, a mechanical disk architecture, and a hierarchical storage architecture.
  • in the all-flash architecture, only SSDs are included; in the mechanical disk architecture, only HDDs are included.
  • in the hierarchical storage architecture, the storage path of the data to be stored is planned at the logical storage layer, so that the data can be stored in SSDs or HDDs according to different storage strategies.
  • the storage array 40 in the embodiment of the present application is a hierarchical storage architecture.
  • the data to be written may be deduplicated.
  • the deduplication operation proceeds as follows. First, when the storage array receives the data to be written, it cuts the data into data blocks of equal length (for convenience of description, the data blocks included in the data to be written are referred to as "data blocks to be written"). The length of a data block can be, for example, 4 KB, 8 KB, or 16 KB. The storage array then calculates the characteristic value, that is, the fingerprint, of each data block to be written.
  • the set of fingerprints of different data blocks constitutes the fingerprint table. The fingerprint table also includes the reference count corresponding to each fingerprint and the storage address of the data block represented by the fingerprint.
  • the reference count indicates the number of times the fingerprint is used by different data to be written. After the fingerprints of the data blocks to be written are calculated, the fingerprint of each data block to be written is compared with the fingerprints in the fingerprint table.
  • if the fingerprint of a data block to be written matches a fingerprint stored in the fingerprint table, the data block to be written is already stored. The data block is deleted from the data to be written, the position of the deleted data block in the data to be written is pointed to the matching fingerprint in the fingerprint table, and the reference count of the matching fingerprint is incremented. If the fingerprint of the data block to be written does not exist in the fingerprint table, the fingerprint is added to the fingerprint table, and after the data block to be written is stored in the memory, its storage address in the memory is recorded in the fingerprint table.
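  • The deduplication steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: SHA-256 stands in for the unspecified fingerprint function, a Python list stands in for the storage medium, and the names `Entry`, `fingerprint`, and `dedup_write` are hypothetical.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block length; the text mentions 4 KB, 8 KB, or 16 KB


@dataclass
class Entry:
    address: int    # storage address of the stored block (hypothetical: a list index)
    ref_count: int  # number of times the fingerprint has been used


def fingerprint(block: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified fingerprint function.
    return hashlib.sha256(block).hexdigest()


def dedup_write(data: bytes, table: dict, store: list) -> list:
    """Split data into equal-length blocks, deduplicate, and return block addresses."""
    layout = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        fp = fingerprint(block)
        entry = table.get(fp)
        if entry is not None:
            # Match: drop the block and point its position at the stored copy.
            entry.ref_count += 1
        else:
            # No match: store the block, then record its fingerprint and address.
            store.append(block)
            entry = Entry(address=len(store) - 1, ref_count=1)
            table[fp] = entry
        layout.append(entry.address)
    return layout


table, store = {}, []
layout = dedup_write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE, table, store)
```

  • In this sketch the third block repeats the first, so only two blocks reach `store`, and the returned layout points the first and third positions at the same address.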
  • FIG. 3A is a schematic diagram of a foreground deduplication solution based on an all-flash architecture provided by an embodiment of the application. Because the read and write speed of the SSDs in the all-flash architecture is relatively fast, the foreground deduplication solution is generally adopted. As shown in FIG. 3A, when the data to be written is written into the high-speed storage medium, it is deduplicated according to the fingerprint table in the memory, and the deduplicated data is stored in the high-speed storage medium. Because this deduplication is performed before the data block is stored in the physical medium, it is foreground deduplication.
  • FIG. 3B is a schematic diagram of a background deduplication solution based on a mechanical disk architecture provided by an embodiment of the present application.
  • the read and write speed of HDDs is relatively slow, so background deduplication is used to reduce the impact of deduplication on I/O.
  • as shown in FIG. 3B, the data to be written is first written into the low-speed storage medium, and the written data is then deduplicated according to the fingerprint table: any data block in the written data that is the same as a data block corresponding to a fingerprint in the fingerprint table is deleted. Because this deduplication is performed after the data is stored in the physical medium, it is background deduplication.
  • the storage array in the embodiment of the present application includes both SSD and HDD, and both deduplication technologies can be selected.
  • background deduplication deletes data in the background, so data deletion lags behind writing. Because the not-yet-deleted data is stored in the physical medium first, the physical medium must reserve additional storage space, which leads to space amplification.
  • in addition, deduplicating data after it has been written to the physical medium inevitably causes additional hard disk read and write operations, which eventually shortens the life of the hard disk. Therefore, considering the performance advantages of SSDs and the disadvantages of background deduplication, the hierarchical storage in this embodiment of the present application selects the foreground deduplication technology. Please refer to FIG. 4.
  • FIG. 4 is a schematic diagram of a hierarchical storage solution provided by an embodiment of this application. As shown in Figure 4, it is a hierarchical storage method based on foreground deduplication technology.
  • the storage media include high-speed storage media and low-speed storage media.
  • the data to be written is stored in a high-speed storage medium or a low-speed storage medium according to a write configuration strategy.
  • this hierarchical storage method still has a problem: the high-speed storage media and the low-speed storage media each store their own fingerprint table, and the two fingerprint tables cannot perceive each other.
  • different data to be written is written to the high-speed storage media and the low-speed storage media respectively according to the write configuration strategy, but different data to be written may include the same data blocks. The same data blocks are then stored in both the high-speed storage media and the low-speed storage media, causing repeated storage of data blocks. This reduces storage efficiency and increases storage space consumption.
  • FIG. 5 is a schematic flowchart of a data storage method provided by an embodiment of the application, which is applied to the storage array shown in FIG. 2.
  • the storage array includes a high-speed storage medium and a low-speed storage medium. As shown in FIG. 5, the method includes the following steps:
  • a storage array receives a data write request, where the data write request carries data to be written, and the data to be written includes at least one data block.
  • the storage array receives the data write request submitted by the application server, and then parses to obtain the data to be written carried in the write request, where the data to be written includes one or more data blocks.
  • LBA Logical Block Address
  • NAS Network Attached Storage
  • the storage array determines whether the data stored in these logical storage spaces is stored in a high-speed storage medium or a low-speed storage medium according to the write configuration strategy corresponding to the logical storage spaces.
  • the write configuration strategy can be determined according to the writing frequency of the LBA.
  • for example, if the data stored in the LBA is frequently modified and its writing frequency is greater than a preset frequency, then, to meet high-frequency write performance requirements, it is determined that the data in the LBA is stored in a high-speed storage medium.
  • the write configuration strategy can also be determined according to file attributes under the file system; for example, a file whose attribute is "archive file" can be stored in a low-speed storage medium.
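  • A write configuration strategy of this kind might be sketched as below. The threshold value, the function name `choose_medium`, the precedence of the file attribute over the writing frequency, and the low-speed default are all assumptions made for illustration; the application only states the two example criteria.

```python
from typing import Optional

HIGH_SPEED = "high-speed"
LOW_SPEED = "low-speed"
PRESET_WRITE_FREQUENCY = 100.0  # hypothetical threshold, e.g. writes per hour


def choose_medium(write_frequency: float, file_attribute: Optional[str] = None) -> str:
    """Pick a target medium from an LBA's writing frequency or a file attribute."""
    # Assumption: an archive attribute takes precedence over the writing frequency.
    if file_attribute == "archive file":
        return LOW_SPEED
    if write_frequency > PRESET_WRITE_FREQUENCY:
        return HIGH_SPEED
    # Assumption: data that is not written frequently defaults to the low-speed medium.
    return LOW_SPEED
```

  • For instance, `choose_medium(500.0)` selects the high-speed medium in this sketch, while `choose_medium(500.0, "archive file")` selects the low-speed medium.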
  • the storage array calculates a fingerprint of each data block, where the fingerprint is used to uniquely identify each data block.
  • the fingerprint of each data block is used to uniquely identify the data block. The fingerprint can therefore be the hash value obtained by applying a hash operation to the data block, or it can be calculated from the data in the data block by methods such as word count, probability distribution statistics, or numerical average calculation.
  • the storage array determines whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes the fingerprints corresponding to the data blocks stored in the high-speed storage medium and the fingerprints corresponding to the data blocks stored in the low-speed storage medium.
  • the fingerprint of the data block in the data to be written is compared with the existing fingerprint in the fingerprint table to determine whether the fingerprint of the data block exists in the fingerprint table.
  • FIG. 6 is a schematic diagram of a data storage process provided by an embodiment of the application.
  • as shown in FIG. 6, the fingerprint table generated in this embodiment is a global fingerprint table, which includes all fingerprints corresponding to the data blocks stored in the high-speed storage medium and the low-speed storage medium.
  • fingerprint matching is performed with the global fingerprint table to determine whether the data blocks in the data to be written have been stored in a high-speed storage medium or a low-speed storage medium.
  • the global fingerprint table is stored in a high-speed storage medium, so that when the processor reads the fingerprint table to the memory, high efficiency and low latency can be ensured, and data storage efficiency can be improved.
  • the deduplication operation is: point each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and delete from the data to be written the data blocks whose fingerprints exist in the fingerprint table.
  • FIG. 7 is a schematic diagram of a fingerprint table matching process provided by an embodiment of this application.
  • the global fingerprint table is a list of the existing fingerprints of the data blocks stored in the physical storage media. The fingerprint of each data block in the data to be written is matched in turn against the existing fingerprints in the list. There are two possible matching results: an existing fingerprint in the fingerprint table is identical to the fingerprint of the data block, or every existing fingerprint in the fingerprint table differs from the fingerprint of the data block.
  • if an existing fingerprint in the global fingerprint table is the same as the fingerprint of the data block, the data block corresponding to the fingerprint has already been stored in a high-speed storage medium or a low-speed storage medium. The data block does not need to be stored again and is deleted from the data to be written. Conversely, if no existing fingerprint in the global fingerprint table is the same as the fingerprint of the data block, the data block has not been stored in a high-speed storage medium or a low-speed storage medium, and it needs to be retained for subsequent storage in the physical storage media.
  • the global fingerprint table also includes the storage address of the data block corresponding to the fingerprint.
  • the storage address is the storage address of the data block corresponding to the fingerprint in the high-speed storage medium or the low-speed storage medium.
  • the storage address may be SSD or HDD in this embodiment of the application.
  • the global fingerprint table also includes a reference count corresponding to each fingerprint. The reference count records the number of times the fingerprint is used for data block deduplication. When the fingerprint is recorded in the fingerprint table for the first time, the reference count is 1; subsequently, each time the fingerprint is used for a deduplication operation, the reference count is incremented by one. For example, fingerprint a corresponds to data block A.
  • fingerprint a is added to the global fingerprint table, and the reference count corresponding to fingerprint a is recorded as 1.
  • when fingerprint a is used to perform the deduplication operation on the data to be written, data block A is deleted from the data to be written, and a pointer to the physical address corresponding to fingerprint a in the fingerprint table takes its place. The data to be written becomes, for example, 1BC, where 1 is the pointer corresponding to fingerprint a, and the reference count of fingerprint a is incremented to 2.
  • when fingerprint a is then used to deduplicate the data AEF to be written, the result is EF, and the reference count of fingerprint a is incremented to 3.
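  • The reference-count example can be traced with a small sketch of a global fingerprint table shared by both media. The sketch assumes single-character blocks, a toy fingerprint (the lowercase letter), an address format of medium plus index, and that the data deduplicated to 1BC was originally ABC; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FingerprintEntry:
    address: str    # storage address (hypothetical format: medium plus index)
    ref_count: int


def fp(block: str) -> str:
    # Toy fingerprint: block "A" gets fingerprint "a", as in the text's example.
    return block.lower()


def write(data: str, table: dict, medium: str) -> list:
    """Return the data after deduplication: pointers for known blocks, blocks otherwise."""
    result = []
    for block in data:
        key = fp(block)
        if key in table:
            table[key].ref_count += 1
            result.append(("ptr", key))      # pointer to the entry's storage address
        else:
            table[key] = FingerprintEntry(address=f"{medium}:{len(table)}", ref_count=1)
            result.append(("block", block))  # block still has to be stored on `medium`
    return result


table = {}
write("A", table, "SSD")              # first time: fingerprint a recorded, count 1
second = write("ABC", table, "HDD")   # A becomes a pointer; B and C are kept
third = write("AEF", table, "HDD")    # A becomes a pointer; E and F are kept
```

  • Because the table is shared, block A written first to the SSD is recognized when later data destined for the HDD contains it, and the reference count of fingerprint a ends at 3.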
  • the data to be written after the deduplication operation includes multiple deduplicated data blocks. According to the aforementioned write configuration strategy, it is determined whether to write the deduplicated data blocks to the high-speed storage medium or the low-speed storage medium, which completes the storage process of the data blocks.
  • a data compression technology may also be used to compress the data blocks in the data to be written, and then the compressed data blocks are stored in the physical medium.
  • Data compression eliminates redundant information in the original data through compression algorithms such as LZ4 to save storage space.
  • Data deduplication and data compression can be used in combination: deduplicating first and then compressing minimizes the space occupied by the data.
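  • The deduplicate-then-compress order can be illustrated with the sketch below. LZ4 is not part of the Python standard library, so `zlib` is swapped in as the compressor here; SHA-256 as the fingerprint and the function name `dedup_then_compress` are likewise assumptions.

```python
import hashlib
import zlib


def dedup_then_compress(blocks: list, seen: set) -> list:
    """Drop blocks whose fingerprint is already known, then compress the rest."""
    stored = []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            continue  # duplicate: nothing to compress or store
        seen.add(digest)
        stored.append(zlib.compress(block))  # zlib stands in for LZ4 here
    return stored


blocks = [b"x" * 4096, b"y" * 4096, b"x" * 4096]
stored = dedup_then_compress(blocks, set())
```

  • Deduplicating first means the compressor never spends time on a block that would be discarded anyway.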
  • FIG. 8 is a schematic diagram of a cross-system storage architecture provided by an embodiment of this application.
  • the high-speed storage medium and the low-speed storage medium are two different independent systems.
  • if fingerprint tables are generated separately in each storage system or storage array, the same data block generates a fingerprint in both fingerprint tables, resulting in low deduplication efficiency and duplicate storage.
  • a global fingerprint table can be generated in one of the storage systems or storage arrays, and all the data to be written in the other storage system or storage array is sent to the storage system or storage array that holds the global fingerprint table for deduplication to obtain deduplicated data blocks. Finally, the deduplicated data blocks are sent back to their corresponding storage system or storage array for storage.
  • for example, a global fingerprint table can be generated in the high-speed storage medium.
  • the global fingerprint table is used to deduplicate and store the data to be written in the high-speed storage medium.
  • the data to be written in the low-speed storage medium is sent to the high-speed storage medium for deduplication, and the corresponding deduplicated data blocks are then sent back to the low-speed storage medium for storage.
  • FIG. 9 is a schematic flowchart of a data migration method provided by an embodiment of this application. As shown in FIG. 9, after data has been deduplicated and stored through the global fingerprint table as described above, data migration can also be performed. The method includes the following steps:
  • 511. For data blocks whose fingerprints exist in the fingerprint table, the storage array increments the heat count of a data block's fingerprint according to the number of times the data block is read;
  • 512. For a fingerprint whose heat count within a preset time is greater than the first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, the storage array migrates that data block from the low-speed storage medium to the high-speed storage medium;
  • 513. The storage array modifies the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium;
  • 514. For a fingerprint whose heat count within a preset time is less than the second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, the storage array migrates that data block from the high-speed storage medium to the low-speed storage medium;
  • 515. The storage array modifies the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.
  • Based on the heat count, data blocks can be divided into cold data and hot data.
  • Cold data is a data block whose heat count is less than a preset value.
  • Hot data is a data block whose heat count is greater than or equal to the preset value.
  • Hot data should be stored in the high-speed storage medium to meet the demand for fast access, while cold data can be stored in the low-speed storage medium to relieve the storage pressure on the high-speed storage medium.
  • Currently, file storage systems and object storage systems store data at file or object granularity, and heat statistics are also collected at file or object granularity.
  • Files and objects are generally large, so when they are stored in a storage medium they must be divided into multiple data blocks for deduplication.
  • The deduplicated data may therefore point to multiple fingerprints.
  • When the heat of a file or object changes, for example from cold data to hot data, the file or object needs to be migrated to the high-speed storage medium, but its fingerprints may also point to other cold data.
  • FIG. 10 is a schematic diagram of a data migration process provided by an embodiment of this application. As shown in FIG. 10, there are two pieces of data to be written, file 1 and file 2.
  • Assume that file 1 includes data block AXXX and that file 2 also includes data block AXXX.
  • When file 1 and file 2 are stored, the write configuration policy determines that both go to the high-speed storage medium.
  • When file 1 is deduplicated against the fingerprint table in the high-speed storage medium, the fingerprint of data block AXXX is matched against the fingerprint table. If no existing fingerprint matches the fingerprint of AXXX, the fingerprint of AXXX is written to the fingerprint table and AXXX is saved to the high-speed storage medium.
  • If existing fingerprint 1 in the fingerprint table matches the fingerprint of AXXX, AXXX is already stored in the high-speed storage medium: AXXX is deleted from file 1, and the reference count of fingerprint 1 is incremented by 1.
  • Likewise, when file 2 is deduplicated against the fingerprint table in the high-speed storage medium, AXXX is deleted from file 2 because it is already stored there, and the reference count of fingerprint 1 corresponding to AXXX is incremented by 1.
  • Later, after heat statistics have been collected on file 1 for some time, the data in file 1 is determined to be cold while the data in file 2 is still hot, so the data of file 1 must be migrated from the high-speed storage medium to the low-speed storage medium.
  • To migrate file 1, it must first be restored: data 1, the deduplicated form in which file 1 was stored and which includes multiple data blocks, is restored to file 1. If data 1 is compressed, it must also be decompressed. The restored file 1 is then copied to the low-speed storage medium.
  • When file 1 is stored in the low-speed storage medium, it must be deduplicated again against the fingerprint table kept in the low-speed storage medium. Once it is determined that this fingerprint table does not contain the fingerprint of AXXX, AXXX must be stored in the low-speed storage medium.
  • As a result, the data AXXX shared by the two files needed only one copy, in the high-speed storage medium, before the migration.
  • After file 1 is migrated to the low-speed storage medium, AXXX must be stored in both the high-speed storage medium and the low-speed storage medium, which causes space amplification.
  • In addition, migrating file 1 to the low-speed storage medium requires another round of deduplication, which hurts migration efficiency.
  • In the embodiments of this application, with a global fingerprint table in use, heat is counted per fingerprint: the heat count of a fingerprint is computed from the number of reads of the data block corresponding to that fingerprint in the fingerprint table.
  • When the heat count of a fingerprint within the preset time is greater than the first threshold, the data block corresponding to the fingerprint is determined to be hot data.
  • When the heat count of a fingerprint within the preset time is less than the second threshold, the data block is determined to be cold data.
  • The first threshold is greater than or equal to the second threshold. When the two are equal, a data block is either hot data or cold data. When the first threshold is greater than the second threshold, a data block can be hot data, cold data, or in an intermediate state.
  • When a data block is in the intermediate state, it stays at its current storage location and is not migrated.
  • In this process, whether a data block is cold or hot is decided from the heat count of its fingerprint in the fingerprint table. Because each fingerprint in a fingerprint table is unique, the heat count for the fingerprint is also unique.
  • The data block corresponding to a fingerprint therefore has only one state: hot, cold, or intermediate. The data block is thus stored in only one storage medium, which avoids the space amplification caused by storing the same data block in both the high-speed storage medium and the low-speed storage medium.
  • Likewise, because the high-speed and low-speed storage media share the same global fingerprint table, data does not need to be deduplicated multiple times during migration. After a data block is migrated, only the physical address corresponding to its fingerprint is modified; the fingerprint and its reference count are left unchanged, which effectively improves migration efficiency.
  • In addition, the data blocks stored in the physical storage medium may be compressed data blocks.
  • In a conventional data migration process, once heat statistics collected per file or object show that a file must be migrated, the compressed data blocks have to be decompressed and restored to the original file, the original file is migrated, and after migration it is compressed and stored again. This repeated decompression and compression makes migration inefficient.
  • In the embodiments of this application, the data block corresponding to a fingerprint is determined to be hot or cold from the heat count of the fingerprint in the global fingerprint table; the compressed data block stored in the physical storage medium is then located by its fingerprint and migrated directly. No repeated decompression and recompression is needed, which further improves migration efficiency.
  • FIG. 11A is a schematic structural diagram of a data storage device according to an embodiment of the application.
  • the storage device includes a high-speed storage medium and a low-speed storage medium.
  • the device 600 specifically includes:
  • the receiving unit 601 is configured to receive a data write request, where the data write request carries data to be written, and the data to be written includes at least one data block;
  • the calculation unit 602 is configured to calculate a fingerprint of each data block, where the fingerprint is used to uniquely identify each data block;
  • the determining unit 603 is configured to determine whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes the fingerprints corresponding to the data blocks stored in the high-speed storage medium and the fingerprints corresponding to the data blocks stored in the low-speed storage medium;
  • the deduplication unit 604 is configured to perform a deduplication operation on the data to be written, where the deduplication operation is: pointing each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and deleting from the data to be written the data blocks whose fingerprints exist in the fingerprint table;
  • the storage unit 605 is configured to store the data to be written after the deduplication operation in the high-speed storage medium or the low-speed storage medium.
  • The device provided by the embodiment of this application generates a global fingerprint table, so that data blocks stored in the high-speed storage medium and the low-speed storage medium share the same fingerprint table for deduplication. This prevents the same data block from having fingerprints generated repeatedly, and from being stored repeatedly, in different storage media, which improves storage efficiency and saves storage space.
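  • In an optional example, the determining unit 603 is further configured to:
  • add to the fingerprint table the fingerprints of data blocks in the data to be written whose fingerprints do not exist in the fingerprint table;
  • after the deduplicated data to be written is stored in the high-speed storage medium or the low-speed storage medium, add to the fingerprint table the storage addresses of the data blocks in the data to be written whose fingerprints were not in the fingerprint table.
  • In an optional example, as shown in FIG. 11B, a schematic structural diagram of another data storage device, the device 600 further includes a migration unit 606, configured to:
  • for data blocks whose fingerprints exist in the fingerprint table, increment the heat count of a data block's fingerprint according to the number of times the data block is read;
  • for a fingerprint whose heat count is greater than the first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, migrate that data block from the low-speed storage medium to the high-speed storage medium;
  • modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium.
  • In an optional example, the migration unit 606 is further configured to:
  • for a fingerprint whose heat count is less than the second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, migrate that data block from the high-speed storage medium to the low-speed storage medium;
  • modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.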
  • the fingerprint table is stored in the high-speed storage medium.
  • the device 600 is presented in the form of a unit.
  • the "unit” here can refer to an application-specific integrated circuit (ASIC), a processor and memory that executes one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above-mentioned functions .
  • the above receiving unit 601, calculation unit 602, determination unit 603, deduplication unit 604, storage unit 605, and migration unit 606 may be implemented by the processor 701 of the apparatus 700 shown in FIG. 12.
  • the apparatus 700 may be implemented with the structure in FIG. 12.
  • the apparatus 700 includes at least one processor 701, at least one memory 702, and at least one communication interface 703.
  • the processor 701, the memory 702, and the communication interface 703 are connected through the communication bus and complete mutual communication.
  • the processor 701 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs in the above scheme.
  • the communication interface 703 can be used to communicate with other devices or communication networks, such as Ethernet, wireless access network (RAN), wireless local area network (Wireless Local Area Networks, WLAN), and so on.
  • The memory 702 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory 702 is used to store application program codes for executing the above solutions, and the processor 701 controls the execution.
  • the processor 701 is configured to execute application program codes stored in the memory 702.
  • The code stored in the memory 702 can be executed to perform the data storage method described above, for example: receiving a data write request, and determining the target storage medium for the data to be written according to the data write request and the write configuration policy; obtaining a fingerprint for each data block of the data to be written, where the fingerprint is used to uniquely identify the data block; matching the fingerprint against the existing fingerprints in the global fingerprint table, where the global fingerprint table is stored in the high-speed storage medium; deduplicating the data blocks according to the matching result to obtain deduplicated data blocks; and storing the deduplicated data blocks in the target storage medium.
  • An embodiment of the present application further provides a computer storage medium. The computer storage medium may store a program, and the program, when executed, performs part or all of the steps of any of the data storage methods recorded in the above method embodiments.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or may be Integrate into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product stored in a memory.
  • The software product includes a number of instructions that enable a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned memory includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media that can store program code.
  • A person of ordinary skill in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable memory, which can include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.


Abstract

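This application discloses a data storage method and apparatus. The method includes: receiving a data write request that carries data to be written, where the data to be written includes at least one data block; calculating a fingerprint of each data block, where the fingerprint uniquely identifies the data block; determining whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes fingerprints corresponding to data blocks stored in a high-speed storage medium and fingerprints corresponding to data blocks stored in a low-speed storage medium; performing a deduplication operation on the data to be written, in which each data block whose fingerprint exists in the fingerprint table is pointed to the storage address of the data block corresponding to that fingerprint in the fingerprint table and is deleted from the data to be written; and storing the deduplicated data to be written in the high-speed storage medium or the low-speed storage medium. By deduplicating data against a global fingerprint table before storing it, the solution of the embodiments of this application can effectively improve storage efficiency and save storage space.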

Description

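Data storage method and apparatus
Technical Field
This application relates to the storage field, and in particular to a data storage method and apparatus.
Background
In storage technology, the challenge comes from users' ever-growing capacity and performance requirements. As data volumes keep growing, more storage devices must be purchased, which raises storage cost. Each stage of the data life cycle also places different performance demands on the data, so data placement has to be adjusted continually. For example, data needs to be deployed on high-performance storage devices in the production stage and moved to low-performance, inexpensive storage devices in the archiving stage. How to simplify data management, improve storage performance, and reduce device cost is a current problem in storage technology.
To improve storage performance, reduce cost, and simplify management, a common approach today is tiered storage. High-performance solid-state drives (Solid State Disk or Solid State Drive, SSD) and low-performance, large-capacity mechanical disks are combined into a hybrid storage pool, and hot/cold data identification and migration techniques place hot data on the SSDs and cold data on the mechanical disks. This balances cost and performance. However, with tiered storage, data deduplication is performed separately in each storage medium, so deduplication is poor: the same data block may be stored in different storage media at the same time, wasting storage space.
Summary
The embodiments of this application provide a data storage method and apparatus. By generating a global fingerprint table in a storage array and using it to deduplicate data, the solution improves deduplication effectiveness and storage efficiency.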
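In a first aspect, an embodiment of this application provides a data storage method applied to a storage array, where the storage array includes a high-speed storage medium and a low-speed storage medium, and the method includes:
receiving a data write request, where the data write request carries data to be written, and the data to be written includes at least one data block;
calculating a fingerprint of each data block, where the fingerprint is used to uniquely identify each data block;
determining whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes fingerprints corresponding to data blocks stored in the high-speed storage medium and fingerprints corresponding to data blocks stored in the low-speed storage medium;
performing a deduplication operation on the data to be written, where the deduplication operation is: pointing each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and deleting from the data to be written the data blocks whose fingerprints exist in the fingerprint table; and
storing the data to be written, after the deduplication operation, in the high-speed storage medium or the low-speed storage medium.
In the embodiments of this application, a global fingerprint table is generated so that data blocks stored in the high-speed storage medium and in the low-speed storage medium share one fingerprint table for deduplication. This prevents the same data block from having fingerprints generated repeatedly, and from being stored repeatedly, in different storage media, which improves storage efficiency and saves storage space.
In a possible embodiment, the method further includes:
adding to the fingerprint table the fingerprints of data blocks in the data to be written whose fingerprints do not exist in the fingerprint table; and
after the deduplicated data to be written is stored in the high-speed storage medium or the low-speed storage medium, adding to the fingerprint table the storage addresses of the data blocks in the data to be written whose fingerprints were not in the fingerprint table.
In a possible embodiment, the method further includes:
for data blocks whose fingerprints exist in the fingerprint table, incrementing the heat count of a data block's fingerprint according to the number of times the data block is read;
for a fingerprint whose heat count is greater than a first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, migrating that data block from the low-speed storage medium to the high-speed storage medium; and
modifying the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium.
In a possible embodiment, the method further includes:
for a fingerprint whose heat count is less than a second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, migrating that data block from the high-speed storage medium to the low-speed storage medium; and
modifying the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.
In a possible embodiment, the fingerprint table is stored in the high-speed storage medium.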
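In a second aspect, an embodiment of this application provides an apparatus.
The storage apparatus includes a high-speed storage medium and a low-speed storage medium, and specifically includes:
a receiving unit, configured to receive a data write request, where the data write request carries data to be written, and the data to be written includes at least one data block;
a calculation unit, configured to calculate a fingerprint of each data block, where the fingerprint is used to uniquely identify each data block;
a determining unit, configured to determine whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes fingerprints corresponding to data blocks stored in the high-speed storage medium and fingerprints corresponding to data blocks stored in the low-speed storage medium;
a deduplication unit, configured to perform a deduplication operation on the data to be written, where the deduplication operation is: pointing each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and deleting from the data to be written the data blocks whose fingerprints exist in the fingerprint table; and
a storage unit, configured to store the data to be written, after the deduplication operation, in the high-speed storage medium or the low-speed storage medium.
In a possible embodiment, the determining unit is further configured to:
add to the fingerprint table the fingerprints of data blocks in the data to be written whose fingerprints do not exist in the fingerprint table; and
after the deduplicated data to be written is stored in the high-speed storage medium or the low-speed storage medium, add to the fingerprint table the storage addresses of the data blocks in the data to be written whose fingerprints were not in the fingerprint table.
In a possible embodiment, the apparatus further includes a migration unit, configured to:
for data blocks whose fingerprints exist in the fingerprint table, increment the heat count of a data block's fingerprint according to the number of times the data block is read;
for a fingerprint whose heat count is greater than a first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, migrate that data block from the low-speed storage medium to the high-speed storage medium; and
modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium.
In a possible embodiment, the migration unit is further configured to:
for a fingerprint whose heat count is less than a second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, migrate that data block from the high-speed storage medium to the low-speed storage medium; and
modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.
In a possible embodiment, the fingerprint table is stored in the high-speed storage medium.
In a third aspect, an embodiment of this application provides an apparatus, including:
a memory storing executable program code; and
a processor coupled to the memory;
where the processor invokes the executable program code stored in the memory, causing the apparatus to perform any of the methods according to the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium, where the computer storage medium includes program instructions that, when run on a computer, cause the computer to perform any of the methods according to the first aspect.
These and other aspects of this application will be clearer and easier to understand from the description of the following embodiments.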
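Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an IT system provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a storage array provided by an embodiment of this application;
FIG. 3A is a schematic diagram of a foreground deduplication solution based on an all-flash architecture provided by an embodiment of this application;
FIG. 3B is a schematic diagram of a background deduplication solution based on a mechanical-disk architecture provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a tiered storage solution provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of a data storage method provided by an embodiment of this application;
FIG. 6 is a schematic diagram of a data storage process provided by an embodiment of this application;
FIG. 7 is a schematic diagram of a fingerprint table matching process provided by an embodiment of this application;
FIG. 8 is a schematic diagram of a cross-system storage architecture provided by an embodiment of this application;
FIG. 9 is a schematic flowchart of a data migration method provided by an embodiment of this application;
FIG. 10 is a schematic diagram of a data migration process provided by an embodiment of this application;
FIG. 11A is a schematic structural diagram of a data storage apparatus provided by an embodiment of this application;
FIG. 11B is a schematic structural diagram of another data storage apparatus provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of an apparatus provided by an embodiment of this application.
Detailed Description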
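To help a person skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of this application are described below with reference to the drawings.
The application scenario of the embodiments is an Internet Technology (IT) system. FIG. 1 is a schematic diagram of an IT system provided by an embodiment of this application. As shown in FIG. 1, the system consists of a user terminal 10, an application server 20, a storage array 40, an administrator 30, and a network 50. The application server 20 may be a virtual server or a physical server; it computes application data and saves the data to the storage array. The storage array 40 is built from media such as solid-state drives, conventional mechanical disks, or tape libraries, and persistently stores application data. The access protocol between the storage array and the application server may be Common Internet File System (CIFS), Network File System (NFS), Internet Small Computer System Interface (iSCSI), Fibre Channel (FC), or a similar protocol.
FIG. 2 is a schematic structural diagram of the storage array 40 provided by an embodiment of this application. As shown in FIG. 2, the storage array 40 includes an input device 401, an output device 402, and a central processing unit (CPU) 403. The CPU includes an arithmetic unit and a controller and is connected to memory 404 and external storage 405. The external storage 405 includes a high-speed storage medium and a low-speed storage medium; the high-speed storage medium is typified by SSDs, and the low-speed storage medium by hard disk drives (HDDs).
Storage architectures include all-flash, mechanical-disk, and tiered storage architectures. An all-flash architecture admits only SSDs, and a mechanical-disk architecture includes only HDDs. In a tiered storage architecture, storage paths for the data are planned at the logical storage layer, so that data can be stored on SSD or HDD according to different storage policies. The storage array 40 in the embodiments of this application is a tiered storage architecture.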
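With tiered storage in place, the utilization of the storage array 40 can be further improved by deduplicating the data to be written. The deduplication process is as follows. First, when the storage array receives data to be written, it splits the data into equal-length data blocks (for ease of description, the data blocks in the data to be written are referred to below as "to-be-written data blocks"); the block length may be, for example, 4 KB, 8 KB, or 16 KB. It then calculates a characteristic value, that is, a fingerprint, for each to-be-written data block. The fingerprints of different data blocks make up the fingerprint table, which also records, for each fingerprint, a reference count and the storage address of the data block the fingerprint represents. The reference count indicates how many times the fingerprint has been used by different data to be written. After the fingerprints are calculated, the fingerprint of each to-be-written data block is compared with the fingerprints in the fingerprint table. If it matches a stored fingerprint, the block is already in storage: the block is deleted from the data to be written, its position in the data to be written is pointed to the matching fingerprint in the fingerprint table, and the reference count of the matching fingerprint is incremented by one. If the fingerprint table does not contain the fingerprint, the fingerprint is added to the fingerprint table, and after the to-be-written data block has been stored, its storage address is recorded in the fingerprint table.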
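Deduplication can be done in one of two ways: foreground deduplication based on an all-flash architecture, or background deduplication based on a mechanical-disk architecture. FIG. 3A is a schematic diagram of a foreground deduplication solution based on an all-flash architecture provided by an embodiment of this application. Because SSDs in an all-flash architecture read and write quickly, foreground deduplication is generally used. As shown in FIG. 3A, when data is written to the high-speed storage medium, it is deduplicated against the fingerprint table in memory, and the deduplicated data is then stored in the high-speed storage medium. Because deduplication happens before the data blocks reach the physical medium, this is called foreground deduplication.
FIG. 3B is a schematic diagram of a background deduplication solution based on a mechanical-disk architecture provided by an embodiment of this application. Because HDDs in a mechanical-disk architecture read and write slowly, background deduplication is used to reduce the impact of deduplication on I/O. As shown in FIG. 3B, the data to be written is first written to the low-speed storage medium; the written data is then deduplicated against the fingerprint table, deleting written data blocks that are identical to data blocks corresponding to fingerprints in the fingerprint table. Because deduplication happens after the data reaches the physical medium, this is called background deduplication.
The storage array in the embodiments of this application includes both SSDs and HDDs, so either deduplication technique could be chosen. However, background deduplication deletes data in the background, so deletion lags; and because undeduplicated data is first written to the physical medium, the medium must reserve extra storage space, which causes space amplification. In addition, deduplicating after the data is written to the physical medium inevitably adds disk reads and writes, which ultimately shortens disk life. Considering the performance advantage of SSDs and the drawbacks of background deduplication, the tiered storage in the embodiments uses foreground deduplication. FIG. 4 is a schematic diagram of a tiered storage solution provided by an embodiment of this application. As shown in FIG. 4, it is a tiered storage method based on foreground deduplication: the storage media include a high-speed storage medium and a low-speed storage medium, and data to be written is stored in one or the other according to a write configuration policy. This tiered storage method still has a problem: the high-speed storage medium and the low-speed storage medium each keep their own fingerprint table, and the two tables are unaware of each other. Different files to be written are placed on the high-speed or the low-speed storage medium according to the write configuration policy, but different data to be written may contain the same data block, and that block is then stored on both media. This duplicate storage lowers storage efficiency and increases storage space consumption.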
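To address this problem, FIG. 5 is a schematic flowchart of a data storage method provided by an embodiment of this application. The method is applied to the storage array shown in FIG. 2, which includes a high-speed storage medium and a low-speed storage medium. As shown in FIG. 5, the method includes the following steps:
501. The storage array receives a data write request, where the data write request carries data to be written, and the data to be written includes at least one data block.
The storage array receives a data write request submitted by the application server and parses it to obtain the data to be written, which includes one or more data blocks. The data to be written received by the storage array is written to logical block addresses (LBAs) in a logical unit number (LUN), or to addresses indicated by multi-level file directories in a file system created by network attached storage (NAS). The storage array decides, according to the write configuration policy of these logical storage spaces, whether the data they hold is stored on the high-speed or the low-speed storage medium. The write configuration policy may be based on LBA write frequency: the higher the write frequency, the more often the LBA is modified, so its data is frequently modified data; when the write frequency exceeds a preset frequency, the data in the LBA is stored on the high-speed storage medium to meet the performance needs of frequent writes. Alternatively, the policy may be based on the attributes of files in the file system; for example, a file whose attribute is "archived file" may be stored on the low-speed storage medium.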
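The write configuration policy described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the `LogicalSpace` descriptor, the `choose_medium` function, the default `preset_frequency` of 100 writes per hour, and the rule that a file attribute takes precedence over write frequency are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_SPEED = "high-speed"  # e.g. SSD
LOW_SPEED = "low-speed"    # e.g. HDD


@dataclass
class LogicalSpace:
    """Hypothetical descriptor of an LBA range or a file-system entry."""
    writes_per_hour: float = 0.0          # observed write frequency
    file_attribute: Optional[str] = None  # e.g. "archive"


def choose_medium(space: LogicalSpace, preset_frequency: float = 100.0) -> str:
    """Pick the target medium for data written to this logical space."""
    # Assumption: an archive attribute wins over write frequency.
    if space.file_attribute == "archive":
        return LOW_SPEED
    # Frequently rewritten LBAs go to the high-speed medium.
    if space.writes_per_hour > preset_frequency:
        return HIGH_SPEED
    # Assumption: everything else defaults to the low-speed medium.
    return LOW_SPEED
```

A real array would likely measure write frequency over a sliding window and expose the thresholds as configuration; the fixed default here only keeps the sketch short.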
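502. The storage array calculates a fingerprint of each data block, where the fingerprint is used to uniquely identify each data block.
As described above, the fingerprint uniquely identifies a data block. The fingerprint may therefore be a hash value obtained by hashing the data block, or it may be computed from the data in the block by methods such as word counting, probability distribution statistics, or averaging of numerical values.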
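As an illustration of step 502, the sketch below splits a payload into equal-length blocks and fingerprints each block with a hash. The text only says the fingerprint may be a hash value; the choice of SHA-256, the 4 KB block size, and the function names `split_blocks` and `fingerprint` are assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KB, one of the block lengths mentioned in the text


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data to be written into equal-length blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Return a hex digest that stands in for the block's fingerprint."""
    return hashlib.sha256(block).hexdigest()
```

Identical blocks always yield the same digest, which is what lets the fingerprint table detect duplicates. A hash is unique only up to collisions, so a production system might compare block contents on a match; that check is omitted here.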
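503. The storage array determines whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes fingerprints corresponding to data blocks stored in the high-speed storage medium and fingerprints corresponding to data blocks stored in the low-speed storage medium.
Before the data blocks in the data to be written are stored in the physical storage medium, they must be deduplicated to reduce the storage space consumed by storing the same data block repeatedly. As described above, deduplication compares the fingerprints of the data blocks in the data to be written with the existing fingerprints in the fingerprint table to determine whether each fingerprint is already in the table.
Unlike FIG. 4, where the high-speed storage medium and the low-speed storage medium each generate their own fingerprint table, FIG. 6 is a schematic diagram of a data storage process provided by an embodiment of this application. As shown in FIG. 6, the fingerprint table generated in the embodiments of this application is a global fingerprint table, which includes the fingerprints of all data blocks stored in the high-speed storage medium and in the low-speed storage medium. When the data blocks in the data to be written are deduplicated, their fingerprints are all matched against this global fingerprint table to determine whether a block has already been stored in either medium.
Optionally, the global fingerprint table is stored in the high-speed storage medium, so that the processor can load it into memory efficiently and with low latency, improving data storage efficiency.
504. A deduplication operation is performed on the data to be written, where the deduplication operation is: pointing each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and deleting from the data to be written the data blocks whose fingerprints exist in the fingerprint table.
Specifically, FIG. 7 is a schematic diagram of a fingerprint table matching process provided by an embodiment of this application. As shown in FIG. 7, the global fingerprint table is a list of the existing fingerprints of the data blocks stored in the physical storage media. The fingerprint of each data block in the data to be written is matched against the existing fingerprints in turn, with two possible outcomes: an existing fingerprint in the table is identical to the fingerprint of the data block, or none is.
When the fingerprint of a data block is matched against the global fingerprint table, if the table contains an identical fingerprint, the corresponding data block has already been stored in the high-speed or low-speed storage medium and does not need to be stored again, so it is deleted from the data to be written. Conversely, if the table contains no identical fingerprint, the data block has not yet been stored in either medium and must be kept so that it can later be stored in the physical storage medium.
Besides the fingerprints of the data blocks stored in the physical storage media, the global fingerprint table also includes the storage address of the data block corresponding to each fingerprint, that is, its address in the high-speed storage medium or the low-speed storage medium, which in the embodiments of this application may be an SSD or an HDD. The global fingerprint table further includes a reference count for each fingerprint, which records how many times the fingerprint has been used to deduplicate data blocks. When a fingerprint is first recorded in the fingerprint table, its reference count is 1; thereafter, each time the fingerprint is used in a deduplication operation, the reference count is incremented by one. For example, fingerprint a corresponds to data block A. When data block A is stored in the storage medium, fingerprint a is added to the global fingerprint table with a reference count of 1. When data to be written ABC is later received, the fingerprint of data block A equals fingerprint a in the table, so fingerprint a is used to deduplicate the data: data block A is deleted from the data to be written and replaced with a pointer to the physical address corresponding to fingerprint a in the fingerprint table, giving, for example, ①BC, where ① is the pointer corresponding to fingerprint a, and the reference count of fingerprint a is incremented to 2. Similarly, when data to be written AEF is later received, fingerprint a is used to deduplicate it, the result is ①EF, and the reference count of fingerprint a is incremented to 3.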
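The A, ABC, AEF example can be replayed with a small in-memory model of a global fingerprint table. This is a sketch under stated assumptions rather than the patent's data layout: the `GlobalFingerprintTable` and `Entry` classes, the `"ssd:0"` address format, the use of SHA-256, and the Python lists standing in for the two media are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    address: str        # where the block lives, e.g. "ssd:0" (hypothetical format)
    ref_count: int = 1  # 1 when the fingerprint is first recorded


class GlobalFingerprintTable:
    """One table shared by the high-speed ("ssd") and low-speed ("hdd") media."""

    def __init__(self):
        self.entries = {}                    # fingerprint -> Entry
        self.media = {"ssd": [], "hdd": []}  # stand-ins for the physical media

    def write(self, blocks, medium="ssd"):
        """Deduplicate `blocks`; return one fingerprint reference per block."""
        refs = []
        for block in blocks:
            fp = hashlib.sha256(block).hexdigest()
            entry = self.entries.get(fp)
            if entry is not None:
                # Already stored on either medium: drop the block, bump the count.
                entry.ref_count += 1
            else:
                # New block: store it, then record its fingerprint and address.
                self.media[medium].append(block)
                address = f"{medium}:{len(self.media[medium]) - 1}"
                self.entries[fp] = Entry(address)
            refs.append(fp)
        return refs
```

Writing `[b"A"]`, then `[b"A", b"B", b"C"]` to the low-speed medium, then `[b"A", b"E", b"F"]` leaves a single copy of `A` on the high-speed medium with a reference count of 3, matching the example. Because the table is global, the second write does not store `A` again even though it targets the other medium.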
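505. The data to be written, after the deduplication operation, is stored in the high-speed storage medium or the low-speed storage medium.
After deduplication, the data to be written includes multiple deduplicated data blocks. According to the write configuration policy described earlier, the deduplicated data blocks are written to the high-speed storage medium or the low-speed storage medium, which completes the storage of the data blocks.
Alternatively, before the data to be written is stored in the physical medium, a data compression technique may be used to compress its data blocks, and the compressed data blocks are then stored in the physical medium. Data compression removes redundant information from the original data with a compression algorithm such as LZ4, saving storage space. Deduplication and compression can be combined: deduplicating first and then compressing minimizes the space the data occupies.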
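The "deduplicate first, then compress" order can be sketched as follows. The text names LZ4 as an example algorithm; this sketch substitutes `zlib` from the Python standard library so that it runs without third-party packages. The function names and the dictionary-plus-list model of the table and medium are assumptions.

```python
import hashlib
import zlib


def store_deduped_compressed(blocks, table, medium):
    """Deduplicate, then compress only the blocks that are actually stored.

    table:  dict mapping fingerprint -> index into `medium`
    medium: list of compressed blocks (stand-in for a physical medium)
    """
    for block in blocks:
        # The fingerprint is taken over the uncompressed block.
        fp = hashlib.sha256(block).hexdigest()
        if fp in table:
            continue  # duplicate: nothing to compress or store
        medium.append(zlib.compress(block))
        table[fp] = len(medium) - 1


def read_block(fp, table, medium):
    """Locate a block by fingerprint and decompress it."""
    return zlib.decompress(medium[table[fp]])
```

Deduplicating before compressing avoids spending compression work on blocks that will be discarded anyway.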
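It can be seen that, in the embodiments of this application, a global fingerprint table is generated so that data blocks stored in the high-speed storage medium and in the low-speed storage medium share one fingerprint table for deduplication. This prevents the same data block from having fingerprints generated repeatedly, and from being stored repeatedly, in different storage media, which improves storage efficiency and saves storage space.
In addition, in one possible case, the high-speed storage medium and the low-speed storage medium may reside in different storage systems or storage arrays that are connected to each other. FIG. 8 is a schematic diagram of a cross-system storage architecture provided by an embodiment of this application. As shown in FIG. 8, the high-speed storage medium and the low-speed storage medium are two different, independent systems. If a fingerprint table were generated separately in each storage system or storage array, the same data block would have a fingerprint in both tables, resulting in poor deduplication and duplicate storage. To avoid this, a single global fingerprint table can be generated in one of the storage systems or storage arrays. All data to be written to the other storage system or storage array is sent to the one that holds the global fingerprint table for deduplication, and the resulting deduplicated data blocks are sent back to their own storage system or storage array for storage.
To improve deduplication efficiency, the global fingerprint table can be generated in the high-speed storage medium. While the global fingerprint table is used to deduplicate and store the data to be written to the high-speed storage medium, the data to be written to the low-speed storage medium is sent to the high-speed storage medium for deduplication, and the corresponding deduplicated data blocks are then sent back to the low-speed storage medium for storage. This embodiment can effectively improve deduplication and storage efficiency for storage media that span systems.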
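FIG. 9 is a schematic flowchart of a data migration method provided by an embodiment of this application. As shown in FIG. 9, after data has been deduplicated and stored through the global fingerprint table as described above, data migration can also be performed. The method includes the following steps:
511. For data blocks whose fingerprints exist in the fingerprint table, the storage array increments the heat count of a data block's fingerprint according to the number of times the data block is read;
512. For a fingerprint whose heat count within a preset time is greater than the first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, the storage array migrates that data block from the low-speed storage medium to the high-speed storage medium;
513. The storage array modifies the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium;
514. For a fingerprint whose heat count within a preset time is less than the second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, the storage array migrates that data block from the high-speed storage medium to the low-speed storage medium;
515. The storage array modifies the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.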
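Steps 511 to 515 can be sketched as one pass over the fingerprint table at the end of each preset time window. The sketch makes several assumptions that the text does not specify: each medium is modeled as a dictionary keyed by fingerprint, the address is reduced to a `medium` label of `"high"` or `"low"`, heat counts are reset after every pass, and the names `Entry`, `record_read`, and `migrate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    medium: str    # "high" or "low": where the block currently lives
    heat: int = 0  # reads counted within the current preset window


def record_read(table, fp):
    """Step 511: each read of a block bumps the heat count of its fingerprint."""
    table[fp].heat += 1


def migrate(table, media, first_threshold, second_threshold):
    """Steps 512-515, run once per preset window. Returns migrated fingerprints."""
    moved = []
    for fp, entry in table.items():
        if entry.heat > first_threshold and entry.medium == "low":
            target = "high"  # step 512: hot block on the low-speed medium
        elif entry.heat < second_threshold and entry.medium == "high":
            target = "low"   # step 514: cold block on the high-speed medium
        else:
            target = None    # already in the right place, or intermediate
        if target is not None:
            # Move the stored bytes as they are, then update only the address
            # (steps 513 and 515); no second deduplication pass is needed.
            media[target][fp] = media[entry.medium].pop(fp)
            entry.medium = target
            moved.append(fp)
        entry.heat = 0  # assumption: start a fresh window after each pass
    return moved
```

Because the block is moved as stored bytes, the same loop also works when the stored form is compressed, which is the case the description returns to later.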
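After the data to be written has been stored in the high-speed and low-speed storage media according to steps 501 to 505, the data blocks continue to be read and used. The heat count is incremented according to the number of reads of a data block, and based on the heat count, data blocks can be divided into cold data and hot data: cold data is a data block whose heat count is less than a preset value, and hot data is a data block whose heat count is greater than or equal to the preset value. Hot data should be stored in the high-speed storage medium to meet the demand for fast access, while cold data can be stored in the low-speed storage medium to relieve the storage pressure on the high-speed storage medium.
Currently, file storage systems and object storage systems store data at file or object granularity, and heat statistics are also collected at file or object granularity. Files and objects are generally large, so when they are stored in a storage medium they must be divided into multiple data blocks for deduplication, and the deduplicated data may point to multiple fingerprints. When the heat of a file or object changes, for example from cold data to hot data, the file or object needs to be migrated to the high-speed storage medium, but its fingerprints may also point to other cold data. Migrating the file or object therefore requires restoring the deduplicated file, and decompressing it first if it was compressed before storage. The restored file or object is then migrated to the high-speed storage medium, where it must be deduplicated and compressed once more before being stored. This not only repeats deduplication and compression, consuming computing resources, but also means that data originally stored only in the low-speed storage medium must, after migration, be stored in both the high-speed and the low-speed media, which amplifies storage space.
FIG. 10 is a schematic diagram of a data migration process provided by an embodiment of this application. As shown in FIG. 10, there are two pieces of data to be written, file 1 and file 2. Assume that file 1 includes data block AXXX and that file 2 also includes data block AXXX. When file 1 and file 2 are stored, the write configuration policy determines that both go to the high-speed storage medium. When file 1 is deduplicated against the fingerprint table in the high-speed storage medium, the fingerprint of data block AXXX is matched against the fingerprint table. If no existing fingerprint matches the fingerprint of AXXX, the fingerprint of AXXX is written to the fingerprint table and AXXX is saved to the high-speed storage medium. If existing fingerprint 1 in the fingerprint table matches the fingerprint of AXXX, AXXX is already stored in the high-speed storage medium: AXXX is deleted from file 1, and the reference count of fingerprint 1 is incremented by 1. Likewise, when file 2 is deduplicated against the fingerprint table in the high-speed storage medium, AXXX is deleted from file 2 because it is already stored there, and the reference count of fingerprint 1 corresponding to AXXX is incremented by 1. This completes the storage of file 1 and file 2.
Later, after heat statistics have been collected on file 1 for some time, the data in file 1 is determined to be cold while the data in file 2 is still hot, so the data of file 1 must be migrated from the high-speed storage medium to the low-speed storage medium. To migrate file 1, it must first be restored: data 1, the deduplicated form in which file 1 was stored and which includes multiple data blocks, is restored to file 1, and if data 1 is compressed, it must also be decompressed. The restored file 1 is then copied to the low-speed storage medium. When file 1 is stored there, it must be deduplicated again against the fingerprint table kept in the low-speed storage medium; once it is determined that this fingerprint table does not contain the fingerprint of AXXX, AXXX must be stored in the low-speed storage medium. As a result, the data AXXX shared by the two files needed only one copy, in the high-speed storage medium, before the migration, but after file 1 is migrated it must be stored in both the high-speed and the low-speed storage media, which causes space amplification. In addition, migrating file 1 to the low-speed storage medium requires another round of deduplication, which hurts migration efficiency.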
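In the embodiments of this application, with a global fingerprint table in use, heat is counted per fingerprint: the heat count of a fingerprint is computed from the number of reads of the data block corresponding to that fingerprint in the fingerprint table. When the heat count of a fingerprint within the preset time is greater than the first threshold, the data block corresponding to the fingerprint is determined to be hot data; when it is less than the second threshold, the data block is determined to be cold data. The first threshold is greater than or equal to the second threshold. When the two are equal, a data block is either hot data or cold data. When the first threshold is greater than the second threshold, a data block can be hot data, cold data, or in an intermediate state, and a data block in the intermediate state stays at its current storage location and is not migrated.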
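The two-threshold rule can be written as a small classification function. The function name `classify` and the string labels are assumptions. One boundary case is not covered by the text: when both thresholds are equal and the heat count equals them exactly, the count is neither greater than the first nor less than the second. The sketch treats that case as hot, following the earlier statement that hot data has a count greater than or equal to the preset value; this is an assumption, not something the source specifies.

```python
def classify(heat: int, first_threshold: int, second_threshold: int) -> str:
    """Return "hot", "cold", or "intermediate" for a fingerprint's heat count."""
    if first_threshold < second_threshold:
        raise ValueError("first threshold must be >= second threshold")
    if heat > first_threshold:
        return "hot"
    if heat < second_threshold:
        return "cold"
    if first_threshold == second_threshold:
        # Assumption: with a single threshold, a count equal to it is hot.
        return "hot"
    return "intermediate"  # stays where it is; no migration
```

Keeping a gap between the two thresholds gives hysteresis: a block whose heat hovers around one value is not moved back and forth between media on every window.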
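In this process, whether a data block is cold or hot is decided from the heat count of its fingerprint in the fingerprint table. Because each fingerprint in a fingerprint table is unique, the heat count for the fingerprint is also unique, and the data block corresponding to the fingerprint has only one state: hot, cold, or intermediate. The data block is therefore stored in only one storage medium, which avoids the space amplification caused by storing the same data block in both the high-speed storage medium and the low-speed storage medium. Likewise, because the two media share the same global fingerprint table, data does not need to be deduplicated multiple times during migration. After a data block is migrated, only the physical address corresponding to its fingerprint is modified; the fingerprint and its reference count are left unchanged, which effectively improves migration efficiency.
In addition, the data blocks stored in the physical storage medium may be compressed data blocks. In a conventional data migration process, once heat statistics collected per file or object show that a file must be migrated, the compressed data blocks have to be decompressed and restored to the original file, the original file is migrated, and after migration it is compressed and stored again. This repeated decompression and compression makes migration inefficient.
In the embodiments of this application, the data block corresponding to a fingerprint is determined to be hot or cold from the heat count of the fingerprint in the global fingerprint table; the compressed data block stored in the physical storage medium is then located by its fingerprint and migrated directly. No repeated decompression and recompression is needed, which further improves migration efficiency.
It can be seen that, in the embodiments of this application, counting heat per fingerprint in the global fingerprint table and deciding from the heat count whether to migrate the corresponding data block effectively avoids the space amplification caused by storing the same data block repeatedly. It also avoids the redundant deduplication, compression, and decompression that repeated storage would require, improving data migration efficiency.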
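FIG. 11A is a schematic structural diagram of a data storage apparatus provided by an embodiment of this application. The storage apparatus includes a high-speed storage medium and a low-speed storage medium. As shown in FIG. 11A, the apparatus 600 specifically includes:
a receiving unit 601, configured to receive a data write request, where the data write request carries data to be written, and the data to be written includes at least one data block;
a calculation unit 602, configured to calculate a fingerprint of each data block, where the fingerprint is used to uniquely identify each data block;
a determining unit 603, configured to determine whether the fingerprint of each data block exists in a fingerprint table, where the fingerprint table includes fingerprints corresponding to data blocks stored in the high-speed storage medium and fingerprints corresponding to data blocks stored in the low-speed storage medium;
a deduplication unit 604, configured to perform a deduplication operation on the data to be written, where the deduplication operation is: pointing each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to that fingerprint in the fingerprint table, and deleting from the data to be written the data blocks whose fingerprints exist in the fingerprint table; and
a storage unit 605, configured to store the data to be written, after the deduplication operation, in the high-speed storage medium or the low-speed storage medium.
It can be seen that the apparatus provided by the embodiment of this application generates a global fingerprint table so that data blocks stored in the high-speed storage medium and in the low-speed storage medium share one fingerprint table for deduplication. This prevents the same data block from having fingerprints generated repeatedly, and from being stored repeatedly, in different storage media, which improves storage efficiency and saves storage space.
In an optional example, the determining unit 603 is further configured to:
add to the fingerprint table the fingerprints of data blocks in the data to be written whose fingerprints do not exist in the fingerprint table; and
after the deduplicated data to be written is stored in the high-speed storage medium or the low-speed storage medium, add to the fingerprint table the storage addresses of the data blocks in the data to be written whose fingerprints were not in the fingerprint table.
In an optional example, as shown in FIG. 11B, a schematic structural diagram of another data storage apparatus, the apparatus 600 further includes a migration unit 606, configured to:
for data blocks whose fingerprints exist in the fingerprint table, increment the heat count of a data block's fingerprint according to the number of times the data block is read;
for a fingerprint whose heat count is greater than the first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, migrate that data block from the low-speed storage medium to the high-speed storage medium; and
modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium.
In an optional example, the migration unit 606 is further configured to:
for a fingerprint whose heat count is less than the second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, migrate that data block from the high-speed storage medium to the low-speed storage medium; and
modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.
In an optional example, the fingerprint table is stored in the high-speed storage medium.
It should be noted that the above units (the receiving unit 601, the calculation unit 602, the determining unit 603, the deduplication unit 604, the storage unit 605, and the migration unit 606) are used to perform the relevant steps of the above method.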
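In this embodiment, the apparatus 600 is presented in the form of units. A "unit" here may be an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another component that can provide the functions described above. In addition, the receiving unit 601, the calculation unit 602, the determining unit 603, the deduplication unit 604, the storage unit 605, and the migration unit 606 may be implemented by the processor 701 of the apparatus 700 shown in FIG. 12.
As shown in FIG. 12, the apparatus 700 may be implemented with the structure in FIG. 12. The apparatus 700 includes at least one processor 701, at least one memory 702, and at least one communication interface 703. The processor 701, the memory 702, and the communication interface 703 are connected through a communication bus and communicate with one another.
The processor 701 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the above solutions.
The communication interface 703 is used to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 702 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these. The memory may exist independently and be connected to the processor through a bus, or it may be integrated with the processor.
The memory 702 is used to store the application program code for executing the above solutions, and execution is controlled by the processor 701. The processor 701 is configured to execute the application program code stored in the memory 702.
The code stored in the memory 702 can be executed to perform the data storage method described above, for example: receiving a data write request, and determining the target storage medium for the data to be written according to the data write request and the write configuration policy; obtaining a fingerprint for each data block of the data to be written, where the fingerprint is used to uniquely identify the data block; matching the fingerprint against the existing fingerprints in the global fingerprint table, where the global fingerprint table is stored in the high-speed storage medium; deduplicating the data blocks according to the matching result to obtain deduplicated data blocks; and storing the deduplicated data blocks in the target storage medium.
An embodiment of this application further provides a computer storage medium. The computer storage medium may store a program, and the program, when executed, performs part or all of the steps of any of the data storage methods recorded in the above method embodiments.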
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
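In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.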
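If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art can understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, which may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.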
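The embodiments of this application are described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the foregoing embodiments is intended only to help understand the method of this application and its core idea. Meanwhile, a person of ordinary skill in the art may, based on the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation on this application.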

Claims (11)

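  1. A data storage method, applied to a storage array, wherein the storage array comprises a high-speed storage medium and a low-speed storage medium, and the method comprises:
    receiving a data write request, wherein the data write request carries data to be written, and the data to be written comprises at least one data block;
    calculating a fingerprint of each data block, wherein the fingerprint uniquely identifies the data block;
    determining whether the fingerprint of each data block exists in a fingerprint table, wherein the fingerprint table comprises fingerprints corresponding to data blocks stored in the high-speed storage medium and fingerprints corresponding to data blocks stored in the low-speed storage medium;
    performing a deduplication operation on the data to be written, wherein the deduplication operation is: pointing each data block whose fingerprint exists in the fingerprint table to the fingerprint in the fingerprint table, and deleting, from the data to be written, the data blocks whose fingerprints exist in the fingerprint table; and
    storing the data to be written, after the deduplication operation, in the high-speed storage medium or the low-speed storage medium.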
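  2. The method according to claim 1, wherein the method further comprises:
    adding, to the fingerprint table, the fingerprints of the data blocks in the data to be written whose fingerprints do not exist in the fingerprint table; and
    after the data to be written, having undergone the deduplication operation, is stored in the high-speed storage medium or the low-speed storage medium, adding, to the fingerprint table, the storage addresses of the data blocks whose fingerprints were not in the fingerprint table.
  3. The method according to claim 1 or 2, wherein the method further comprises:
    for a data block whose fingerprint exists in the fingerprint table, incrementing a heat count of the fingerprint of the data block according to the number of reads of the data block;
    for a fingerprint whose heat count within a preset time is greater than a first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, migrating the data block corresponding to the fingerprint from the low-speed storage medium to the high-speed storage medium; and
    modifying the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium.
  4. The method according to claim 3, wherein the method further comprises:
    for a fingerprint whose heat count within the preset time is less than a second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, migrating the data block corresponding to the fingerprint from the high-speed storage medium to the low-speed storage medium; and
    modifying the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.
  5. The method according to claim 1, wherein the fingerprint table is stored in the high-speed storage medium.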
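  6. A data storage apparatus, wherein the storage apparatus comprises a high-speed storage medium and a low-speed storage medium, and specifically comprises:
    a receiving unit, configured to receive a data write request, wherein the data write request carries data to be written, and the data to be written comprises at least one data block;
    a calculation unit, configured to calculate a fingerprint of each data block, wherein the fingerprint uniquely identifies the data block;
    a determining unit, configured to determine whether the fingerprint of each data block exists in a fingerprint table, wherein the fingerprint table comprises fingerprints corresponding to data blocks stored in the high-speed storage medium and fingerprints corresponding to data blocks stored in the low-speed storage medium;
    a deduplication unit, configured to perform a deduplication operation on the data to be written, wherein the deduplication operation is: pointing each data block whose fingerprint exists in the fingerprint table to the storage address of the data block corresponding to the fingerprint in the fingerprint table, and deleting, from the data to be written, the data blocks whose fingerprints exist in the fingerprint table; and
    a storage unit, configured to store the data to be written, after the deduplication operation, in the high-speed storage medium or the low-speed storage medium.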
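  7. The apparatus according to claim 6, wherein the determining unit is further configured to:
    add, to the fingerprint table, the fingerprints of the data blocks in the data to be written whose fingerprints do not exist in the fingerprint table; and
    after the data to be written, having undergone the deduplication operation, is stored in the high-speed storage medium or the low-speed storage medium, add, to the fingerprint table, the storage addresses of the data blocks whose fingerprints were not in the fingerprint table.
  8. The apparatus according to claim 6 or 7, wherein the apparatus further comprises a migration unit, configured to:
    for a data block whose fingerprint exists in the fingerprint table, increment a heat count of the fingerprint of the data block according to the number of reads of the data block;
    for a fingerprint whose heat count is greater than a first threshold, if the storage address of the data block corresponding to the fingerprint is in the low-speed storage medium, migrate the data block corresponding to the fingerprint from the low-speed storage medium to the high-speed storage medium; and
    modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the high-speed storage medium.
  9. The apparatus according to claim 8, wherein the migration unit is further configured to:
    for a fingerprint whose heat count is less than a second threshold, if the storage address of the data block corresponding to the fingerprint is in the high-speed storage medium, migrate the data block corresponding to the fingerprint from the high-speed storage medium to the low-speed storage medium; and
    modify the address corresponding to the fingerprint of the migrated data block in the fingerprint table to the address of the data block in the low-speed storage medium.
  10. The apparatus according to claim 6, wherein the fingerprint table is stored in the high-speed storage medium.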
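  11. An apparatus, comprising:
    a memory storing executable program code; and
    a processor coupled to the memory;
    wherein the processor invokes the executable program code stored in the memory, so that the apparatus performs the method according to any one of claims 1 to 5.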
PCT/CN2020/121843 2019-10-17 2020-10-19 Data storage method and apparatus Ceased WO2021073635A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP20877442.2A EP4030273B1 (en) 2019-10-17 2020-10-19 Data storage method and device
US17/720,479 US11886729B2 (en) 2019-10-17 2022-04-14 Data storage method and apparatus
US18/534,230 US12159046B2 (en) 2019-10-17 2023-12-08 Data storage method and apparatus
US18/964,996 US20250094070A1 (en) 2019-10-17 2024-12-02 Data Storage Method and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910999337.0 2019-10-17
CN201910999337.0A CN112684975B (zh) 2019-10-17 2019-10-17 Data storage method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/720,479 Continuation US11886729B2 (en) 2019-10-17 2022-04-14 Data storage method and apparatus

Publications (1)

Publication Number Publication Date
WO2021073635A1 (zh) 2021-04-22

Family

ID=75445336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121843 Ceased WO2021073635A1 (zh) 2019-10-17 2020-10-19 Data storage method and apparatus

Country Status (4)

Country Link
US (3) US11886729B2 (zh)
EP (1) EP4030273B1 (zh)
CN (2) CN112684975B (zh)
WO (1) WO2021073635A1 (zh)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
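CN113076068B (zh) * 2021-04-27 2022-10-21 哈尔滨工业大学(深圳) Data storage method and apparatus, electronic device, and readable storage medium
CN113407114B (zh) * 2021-05-26 2022-03-04 青海师范大学 Online capacity-expansion I/O scheduling method based on hot data and deduplication operations
CN113721836A (zh) * 2021-06-15 2021-11-30 荣耀终端有限公司 Data deduplication method and apparatus
CN114115734B (zh) * 2021-11-18 2024-10-25 新华三大数据技术有限公司 Data deduplication method, apparatus, device, and storage medium
CN116204512A (zh) * 2021-12-01 2023-06-02 戴尔产品有限公司 Data management system and method
CN114442931A (zh) * 2021-12-23 2022-05-06 天翼云科技有限公司 Data deduplication method and system, electronic device, and storage medium
CN114518845B (zh) * 2022-01-06 2024-09-10 中汽创智科技有限公司 Data storage method, apparatus, medium, and device
KR20230106915A (ko) * 2022-01-07 2023-07-14 삼성전자주식회사 Method of operating a storage device, and storage device
US11777519B2 (en) 2022-02-10 2023-10-03 International Business Machines Corporation Partitional data compression
CN115048064B (zh) * 2022-07-29 2024-10-15 苏州浪潮智能科技有限公司 Data management method, apparatus, device, and storage medium
US12056093B2 (en) * 2022-10-25 2024-08-06 Dell Products L.P. Deduplication for cloud storage operations
CN115729471A (zh) * 2022-11-24 2023-03-03 郑州云海信息技术有限公司 Deduplication query method, apparatus, device, and storage medium
CN118244969A (zh) * 2022-12-22 2024-06-25 戴尔产品有限公司 Method, device, and computer program product for transferring data
CN117573026B (zh) * 2023-11-10 2025-05-20 中电云计算技术有限公司 Method, apparatus, and device for automatically switching a deduplication service on and off, and readable storage medium
CN119045747B (zh) * 2024-10-30 2025-05-16 苏州元脑智能科技有限公司 Data processing method, computer device, storage medium, and program product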

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040215637A1 (en) * 2003-04-11 2004-10-28 Kenichi Kitamura Method and data processing system with data replication
CN103959259A (zh) * 2012-11-20 2014-07-30 华为技术有限公司 Data storage method, data storage apparatus, and data storage system
CN104462388A (zh) * 2014-12-10 2015-03-25 上海爱数软件有限公司 Redundant data cleanup method based on cascaded storage media
CN104793901A (zh) * 2015-04-09 2015-07-22 北京鲸鲨软件科技有限公司 Storage apparatus and storage method
CN109918018A (zh) * 2017-12-13 2019-06-21 华为技术有限公司 Data storage method and storage device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7478096B2 (en) * 2003-02-26 2009-01-13 Burnside Acquisition, Llc History preservation in a computer storage system
US7908436B1 (en) * 2008-04-25 2011-03-15 Netapp, Inc. Deduplication of data on disk devices using low-latency random read memory
US8452932B2 (en) * 2010-01-06 2013-05-28 Storsimple, Inc. System and method for efficiently creating off-site data volume back-ups
CN105786401A (zh) * 2014-12-25 2016-07-20 中国移动通信集团公司 Data management method and apparatus in a server cluster system
US10346297B1 (en) * 2016-03-31 2019-07-09 EMC IP Holding Company LLC Method and system for cloud based distributed garbage collection of a deduplicated datasets
EP3321792B1 (en) * 2016-09-28 2020-07-29 Huawei Technologies Co., Ltd. Method for deleting duplicated data in storage system, storage system and controller
CN109947731A (zh) * 2017-07-31 2019-06-28 星辰天合(北京)数据科技有限公司 Method and apparatus for deleting duplicate data
US11467967B2 (en) * 2018-08-25 2022-10-11 Panzura, Llc Managing a distributed cache in a cloud-based distributed computing environment

Also Published As

Publication number Publication date
US20220236901A1 (en) 2022-07-28
CN115237347A (zh) 2022-10-25
EP4030273A1 (en) 2022-07-20
EP4030273A4 (en) 2022-11-23
US20240103747A1 (en) 2024-03-28
CN112684975B (zh) 2022-08-09
US11886729B2 (en) 2024-01-30
US12159046B2 (en) 2024-12-03
EP4030273B1 (en) 2025-05-21
CN112684975A (zh) 2021-04-20
US20250094070A1 (en) 2025-03-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20877442

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020877442

Country of ref document: EP

Effective date: 20220413

WWG Wipo information: grant in national office

Ref document number: 2020877442

Country of ref document: EP