CN102968356A - Data processing method of cloud storage system - Google Patents
Data processing method of cloud storage system
- Publication number
- CN102968356A (application number CN2011104569412A)
- Authority
- CN
- China
- Prior art keywords
- data
- frame
- error correction
- data block
- storage system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Detection And Correction Of Errors (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
Description
Technical field
The invention relates to cloud storage systems, and in particular to a data processing method for a cloud storage system based on Reed-Solomon coding.
Background
In today's fast-changing era, cloud storage, as both the infrastructure of the cloud and its most widespread application, has received great attention. In a cloud storage system, user data is stored in the system's cloud, and the storage nodes that make up the cloud are beyond the user's control. User data may therefore be mined and compared, or maliciously tampered with, by unauthorized third parties.
At the same time, when one or more storage nodes in the cloud are missing or fail (and the probability of node failure grows as the cloud expands), the risk of losing user data is very high. These conditions show that the development of cloud storage urgently needs a security mechanism that fully guarantees the integrity, privacy, and reliability of user data.
At present, existing cloud storage technologies all adopt a scheme similar to HDFS (Hadoop Distributed File System) in Hadoop, an open-source cloud computing framework. This scheme splits a data file into blocks of a configured size and improves reliability by keeping complete copies of every block (for example, HDFS in Hadoop keeps 3 identical replicas). Its drawback is that it wastes storage space.
The Reed-Solomon error-correcting coding method works as follows: it computes the remainder left after dividing the information-symbol polynomial by the check-code generator polynomial. The formula is:
$$F \bmod D = C$$
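where $F$ is the original data, $D$ is the generator polynomial, $C$ is the generated redundant error-correction data, and mod denotes the remainder operation. (In the usual systematic form, $F$ is first multiplied by $x^r$, where $r$ is the number of check symbols, so that $C$ can be appended directly after the data.)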
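The remainder formula can be sketched as a systematic encoder. This is a minimal sketch, assuming byte symbols over GF(2^8) with the primitive polynomial 0x11D and a generator polynomial D(x) whose roots are a^0, ..., a^(r-1); the patent specifies none of these, and the function names are hypothetical.

```python
# Minimal systematic Reed-Solomon encoder over GF(2^8): C = (F * x^nsym) mod D.
# Assumptions (not from the patent): byte symbols, primitive polynomial 0x11D,
# generator polynomial D(x) with roots a^0 .. a^(nsym-1).
PRIM = 0x11D
EXP = [0] * 512  # antilog table, doubled so LOG[a] + LOG[b] needs no reduction
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(nsym: int) -> list[int]:
    """D(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)), highest degree first."""
    g = [1]
    for i in range(nsym):
        nxt = g + [0]                           # g(x) * x
        for j, coef in enumerate(g):
            nxt[j + 1] ^= gf_mul(coef, EXP[i])  # + g(x) * a^i (addition is XOR)
        g = nxt
    return g


def poly_eval(poly: list[int], x: int) -> int:
    """Evaluate a polynomial (highest degree first) at x with Horner's rule."""
    y = 0
    for coef in poly:
        y = gf_mul(y, x) ^ coef
    return y


def rs_encode(msg: list[int], nsym: int) -> list[int]:
    """Return the codeword: message symbols followed by nsym check symbols C."""
    gen = generator_poly(nsym)
    rem = list(msg) + [0] * nsym                # F(x) * x^nsym
    for i in range(len(msg)):                   # long division by the monic D(x)
        coef = rem[i]
        if coef:
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + rem[len(msg):]           # the tail is the remainder C


codeword = rs_encode([1, 2, 3, 4], 2)           # m3..m0 followed by Q1, Q0
```

Because the codeword equals F(x)·x^r plus its remainder modulo D(x), it is a multiple of D(x) and therefore evaluates to zero at every root of D(x); this is exactly what the syndrome check during decoding tests.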
For decoding, assume for simplicity that the stored information symbols are $m_3, m_2, m_1, m_0$ with resulting check symbols $Q_1, Q_0$, and that the symbols read back are $m_3', m_2', m_1', m_0', Q_1', Q_0'$. If the syndromes $s_0$ and $s_1$ computed from the read symbols are not both 0, an error has occurred; the error is then corrected by computing the error-locator polynomial and the error values.
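For this two-check-symbol example, a single symbol error can be located and corrected directly from the syndromes: an error of value Y at polynomial degree k gives s0 = Y and s1 = Y·a^k, so the error location follows from s1/s0. The sketch assumes GF(2^8) with primitive polynomial 0x11D and generator (x + 1)(x + a) = x^2 + 3x + 2; these choices and the function names are hypothetical. With only two check symbols, two or more errors may be detected or silently miscorrected.

```python
# Syndrome check and single-error correction for the m3..m0, Q1, Q0 example.
# Assumptions (not from the patent): GF(2^8) with primitive polynomial 0x11D and
# generator roots a^0, a^1, i.e. two check symbols that can fix one symbol error.
PRIM = 0x11D
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(poly, x):
    y = 0
    for coef in poly:  # Horner's rule, highest degree first
        y = gf_mul(y, x) ^ coef
    return y


def rs_encode2(msg):
    """Append Q1, Q0 = (F(x) * x^2) mod (x^2 + 3x + 2)."""
    gen = [1, 3, 2]
    rem = list(msg) + [0, 0]
    for i in range(len(msg)):
        if rem[i]:
            c = rem[i]
            rem[i + 1] ^= gf_mul(gen[1], c)
            rem[i + 2] ^= gf_mul(gen[2], c)
    return list(msg) + rem[-2:]


def syndromes(word):
    """s0 = r(a^0), s1 = r(a^1); both are zero for an error-free word."""
    return poly_eval(word, EXP[0]), poly_eval(word, EXP[1])


def correct_single_error(word):
    s0, s1 = syndromes(word)
    if s0 == 0 and s1 == 0:
        return list(word)                 # no detectable error
    if s0 == 0 or s1 == 0:
        raise ValueError("uncorrectable: more than one symbol error")
    # One error of value Y at degree k gives s0 = Y and s1 = Y * a^k.
    k = (LOG[s1] - LOG[s0]) % 255         # error location (degree)
    pos = len(word) - 1 - k               # list index of the degree-k symbol
    if pos < 0:
        raise ValueError("uncorrectable: locator outside the word")
    fixed = list(word)
    fixed[pos] ^= s0                      # subtract (XOR) the error value Y
    return fixed
```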
This method is widely used to process DVD data. It greatly improves the error-correction capability for the original data and can reduce the random error rate from $2\times10^{-2}$ to $1\times10^{-15}$. In the present invention, this coding method is applied to a data block arranged as an array, with Reed-Solomon encoding performed separately along the rows and along the columns to obtain row and column redundancy. The data thus receives double error-correction protection, which improves error-correction capability, while this redundant data amounts to only 13% of the original data volume.
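Row-and-column encoding of an array (a product code) can be sketched by encoding every row and then every column, including the columns of row parity. Because the encoding is linear, the resulting parity-on-parity rows are also valid row codewords, so every row and every column of the output is a codeword. This sketch assumes GF(2^8) with primitive polynomial 0x11D and a tiny 3×4 array; the names and sizes are hypothetical, not the DVD or patent parameters.

```python
# Product-code sketch: RS-encode every row, then every column (parity included).
# Assumptions: GF(2^8), primitive polynomial 0x11D, generator roots a^0..a^(r-1).
PRIM = 0x11D
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(poly, x):
    y = 0
    for coef in poly:  # Horner's rule, highest degree first
        y = gf_mul(y, x) ^ coef
    return y


def rs_encode(msg, nsym):
    """Systematic RS encoding: msg followed by (msg * x^nsym) mod D(x)."""
    gen = [1]
    for i in range(nsym):  # D(x) = product of (x + a^i)
        nxt = gen + [0]
        for j, coef in enumerate(gen):
            nxt[j + 1] ^= gf_mul(coef, EXP[i])
        gen = nxt
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):
        if rem[i]:
            c = rem[i]
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], c)
    return list(msg) + rem[len(msg):]


def product_encode(block, row_nsym, col_nsym):
    """Return a (rows + col_nsym) x (cols + row_nsym) array of RS codewords."""
    rows = [rs_encode(row, row_nsym) for row in block]
    cols = [rs_encode(list(col), col_nsym) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row order


def is_codeword(word, nsym):
    return all(poly_eval(word, EXP[i]) == 0 for i in range(nsym))


data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
coded = product_encode(data, row_nsym=2, col_nsym=2)  # 5 x 6 array
```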
Although this method corrects errors so efficiently with little redundancy, typical cloud storage systems do not use it and rely solely on data replication for recovery. They generally must keep 3 or more copies, which clearly wastes a great deal of storage space and raises costs.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data processing method for a cloud storage system that improves the security and recoverability of the data in the system, reduces the number of data copies, greatly saves storage space, and lowers costs.
The technical solution of the present invention is as follows:
A data processing method for a cloud storage system, characterized in that storing data in the cloud storage system uses Reed-Solomon error-correction encoding and extracting data from it uses Reed-Solomon error-correction decoding.
The data storage method comprises the following steps:
① Divide the original data to be stored into K data frames, each containing the same fixed length of N bits of data. If the last original data frame is shorter than N, pad it with '0' to reach length N. Here K is a positive integer greater than 1 and 200 < N < 2000;
② Add a number, i.e. an ID, to each data frame to obtain a new data frame. The ID is 4 bytes long and increments from 0001, so each new data frame has length (N+4);
③ Regroup the K new data frames into W data blocks, each containing M data frames and forming an M×(N+4) data matrix. If the last data block contains fewer than M data frames, pad it with all-'0' data frames so that it reaches the fixed count of M frames. Here 200 < M < 2000 and W = ⌈K/M⌉;
④ Encode each data block with the Reed-Solomon product-code error-correction method: add $P_O$ and $P_I$ redundant error-correction symbols to the rows and the columns of the $i$-th data block respectively, converting it into a Reed-Solomon data block of $(M+P_O)\times(N+4+P_I)$ symbols, where $P_O$ and $P_I$ are the numbers of redundant error-correction symbols added to one row and one column of the data block respectively, $0<P_O<M/2$, $0<P_I<M/2$, and $1<i\le M$;
⑤ Split each Reed-Solomon data block by column into $M+P_O$ data slices and store the $M+P_O$ slices of the same data block on multiple storage devices of the cloud storage system, with no more than $P_I/2$ slices of the same data block on any one storage device;
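Steps ① to ③ can be sketched as follows. This is a minimal illustration only: it assumes frame lengths are counted in bytes (the claim gives N in bits but the ID in bytes), uses big-endian IDs, and ignores the claimed ranges for N and M so the example stays small. The function names are hypothetical, and steps ④ and ⑤ are not shown here.

```python
import math

ID_LEN = 4  # step 2: 4-byte frame number


def split_into_frames(data: bytes, n: int) -> list[bytes]:
    """Step 1: cut data into K frames of n bytes, zero-padding the last frame."""
    k = max(1, math.ceil(len(data) / n))
    frames = [data[i * n:(i + 1) * n] for i in range(k)]
    frames[-1] = frames[-1].ljust(n, b"\x00")
    return frames


def add_ids(frames: list[bytes]) -> list[bytes]:
    """Step 2: prefix each frame with a big-endian ID starting at 1."""
    return [(i + 1).to_bytes(ID_LEN, "big") + f for i, f in enumerate(frames)]


def group_into_blocks(frames: list[bytes], m: int) -> list[list[bytes]]:
    """Step 3: W = ceil(K / M) blocks of M frames; pad the last with zero frames."""
    w = math.ceil(len(frames) / m)
    zero_frame = bytes(len(frames[0]))
    blocks = []
    for b in range(w):
        block = frames[b * m:(b + 1) * m]
        blocks.append(block + [zero_frame] * (m - len(block)))
    return blocks


frames = add_ids(split_into_frames(b"cloud storage example", n=8))
blocks = group_into_blocks(frames, m=2)
```

Since real IDs start at 1, an all-zero padding frame carries ID 0 and can be told apart from data frames when the blocks are reassembled.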
The data extraction method comprises the following steps:
① Read the data slices belonging to the same data block. If up to $P_O/2$ read errors occur in the $i$-th slice, correct that slice's data with the Reed-Solomon product-code decoding algorithm and recover the original data;
② After all data slices of the same data block have been read, if fewer than $P_I/2$ slices were lost or could not be read, the computer corrects them with the Reed-Solomon product-code decoding algorithm and recovers the original data;
③ Repeat steps ① and ② until all data blocks belonging to the same original data have been read and processed; remove the error-correction redundancy, arrange the new data frames of all data blocks in the order of their original IDs, then strip the IDs to obtain the originally stored data.
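The reassembly in step ③ can be sketched as below, assuming the blocks have already been decoded and stripped of redundancy. Because zero padding cannot be told apart from genuine trailing zero bytes, the sketch assumes the original length is recorded separately; the patent does not say how this is handled. It also assumes padding frames are all-zero and therefore carry ID 0. `reassemble` is a hypothetical name.

```python
def reassemble(blocks, original_len, id_len=4):
    """Sort decoded frames by ID, drop padding frames, strip IDs, trim padding."""
    frames = [f for block in blocks for f in block]
    # ID 0 marks an all-zero padding frame added when a block was filled up.
    real = [f for f in frames if int.from_bytes(f[:id_len], "big") != 0]
    real.sort(key=lambda f: int.from_bytes(f[:id_len], "big"))
    payload = b"".join(f[id_len:] for f in real)
    return payload[:original_len]  # drop the zero padding of the last frame
```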
Technical effects of the present invention:
1. The most notable effect of the present invention is that both the rows and the columns of each original data block are Reed-Solomon encoded, giving double error-correction capability, and the encoded data blocks are re-split into slices stored on different storage devices in the cloud system. Errors in the slice data on each device can therefore be corrected, and even when a certain number of storage devices in the cloud system fail so that some slices cannot be read, the data can still be fully recovered. This greatly improves system reliability.
2. Typical cloud storage systems must replicate the original data several times to keep it safe (generally 3 or more copies). The present invention reduces the redundant storage needed to secure the stored data (only two copies, or even one, may be needed), which greatly saves storage space, lowers costs, and makes fuller use of the cloud storage system's capacity.
3. Another notable feature of the present invention is that, because the data is spread across multiple storage devices, an outsider who illegally breaks into any single storage device in the cloud system obtains only incomplete data. This improves data security against external intrusion.
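To put effect 2 in numbers, the sketch below compares stored symbols per original symbol for full replication and for the RS product code with the embodiment's parameters (1024×1024 blocks, 100 check symbols per row and per column). It treats the whole 1024×1024 matrix, IDs included, as payload, so it slightly understates the true overhead; the helper names are hypothetical.

```python
def replication_factor(copies: int) -> float:
    """Stored bytes per original byte when every block is kept as full copies."""
    return float(copies)


def rs_pc_factor(m: int, row_len: int, row_nsym: int, col_nsym: int, copies: int = 1) -> float:
    """Stored symbols per payload symbol for an m x row_len block after product encoding."""
    return copies * (m + col_nsym) * (row_len + row_nsym) / (m * row_len)


for label, factor in [
    ("3 full replicas", replication_factor(3)),
    ("RS-PC, 1 copy", rs_pc_factor(1024, 1024, 100, 100)),
    ("RS-PC, 2 copies", rs_pc_factor(1024, 1024, 100, 100, copies=2)),
]:
    print(f"{label}: {factor:.2f}x")
```

With these parameters, one encoded copy costs about 1.20 times the original and two encoded copies about 2.41 times, compared with 3 times for triple replication.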
Detailed description of embodiments
The present invention is further described below with reference to an example, which should not be taken to limit its scope of protection.
This example stores 100 MB of original data in the cloud. The implementation steps are as follows:
Step 1: Divide 100 MB of original data to be stored into 102401 data frames, each a fixed-length block of 1020 bits.
The data to be stored is divided into 102401 frames of fixed length 1020; if the last original data frame is shorter than 1020, it is padded with '0' to reach length 1020.
Step 2: Add a number, i.e. an ID, to each data frame to obtain a new data frame.
Each frame's ID is 4 bytes long and increments from 1; in this example the IDs run from 1 to 102401.
Step 3: Regroup the data frames into data blocks of 1024 data frames each, giving 100 such data blocks in total.
The data frames of each block form a 1024×1024 data matrix. If the last data block has fewer than 1024 data frames, it is padded with '0' data so that it reaches the fixed count of 1024 frames.
Step 4: Using Reed-Solomon error-correction coding, encode the rows and the columns of the $i$-th data block of 1024×1024 symbols separately, converting it into a data block of $(1024+P_O)\times(1024+P_I)$ symbols, where $0<i\le M$ and $P_O = P_I = 100$ are the numbers of redundant error-correction symbols added to one row and one column of the data block. The formula is:
$$F \bmod D = C$$
In this embodiment, $F$ is 1024 data symbols, $D$ is the generator polynomial, and $C$ is the 100 generated redundant error-correction symbols; mod denotes the remainder operation.
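The patent does not state the symbol size. A Reed-Solomon codeword over GF(2^m) can hold at most 2^m - 1 symbols, so the 1024 + 100 = 1124-symbol rows and columns of this embodiment cannot use the byte-oriented GF(2^8) of DVD-style codes; the helper below (a hypothetical name) finds the smallest usable m. It also shows the standard capability of r check symbols: up to floor(r/2) symbol errors at unknown positions, or up to r erasures at known positions (more generally, any mix of e errors and f erasures with 2e + f <= r).

```python
def min_field_bits(codeword_len: int) -> int:
    """Smallest m such that an RS code over GF(2^m) allows codeword_len symbols (n <= 2^m - 1)."""
    m = 1
    while (1 << m) - 1 < codeword_len:
        m += 1
    return m


def correction_capability(r: int) -> tuple[int, int]:
    """(max symbol errors, max erasures) correctable with r check symbols."""
    return r // 2, r


n = 1024 + 100  # row or column length after encoding
print(min_field_bits(n), correction_capability(100))
```

For this embodiment it gives m = 11 and a capability of 50 errors or 100 erasures per row or column; the figure of 50 matches the per-slice and per-device limits used in the steps that follow.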
Step 5: Split each data block obtained above by column into 1024+100 slices, and store the slices from the same data block on several storage devices of the cloud storage system, with no more than 50 slices of the same data block on any one storage device.
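A sketch of Step 5, assuming each slice is a whole column of the encoded block and that slices are spread round-robin over devices. The patent only states the 50-slice cap, so the placement policy and the function names are hypothetical.

```python
import math


def split_columns(block):
    """Each column of the encoded block becomes one slice."""
    return [list(col) for col in zip(*block)]


def place_slices(num_slices, num_devices, max_per_device=50):
    """Round-robin slices over devices, never exceeding max_per_device per device."""
    if num_devices * max_per_device < num_slices:
        raise ValueError("too few devices for the per-device slice limit")
    placement = {d: [] for d in range(num_devices)}
    for s in range(num_slices):
        placement[s % num_devices].append(s)
    return placement


num_slices = 1024 + 100
min_devices = math.ceil(num_slices / 50)  # devices needed under the 50-slice cap
placement = place_slices(num_slices, min_devices)
```

With 1124 slices and the 50-slice cap, at least 23 devices are needed; round-robin over 23 devices puts at most 49 slices of a block on any one device.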
Step 6: In this example, when data is read from the cloud storage system, the data to be extracted is decoded with the decoding algorithm of the RS-PC (Reed-Solomon product code).
Reading data from the cloud storage system proceeds as follows:
1) Read the different slices belonging to the same data block. If fewer than 50 read errors occur in the $i$-th slice, the Reed-Solomon decoding algorithm can correct the whole slice and restore the original data;
2) After the different slices of the same data block have been read, if fewer than 50 slices were lost or unreadable, the Reed-Solomon decoding algorithm can correct this segment of data and recover it;
3) After all data blocks belonging to the same original data have been read, remove the error-correction redundancy, arrange the data frames in order of their numbers (1 to 102401), strip the numbers, and finally obtain the originally stored data.
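Step 2) can be illustrated with erasure decoding: a lost or unreadable column slice removes exactly one symbol from every row, and because its position is known, a row codeword with r check symbols can fill in up to r such erasures. The sketch below recovers them by solving the syndrome equations (a Vandermonde system) over GF(2^8). It assumes primitive polynomial 0x11D, generator roots a^0..a^(r-1), small sizes, and hypothetical function names; it is not the patent's RS-PC decoder, which the source does not specify.

```python
# Erasure recovery of lost column slices, one row at a time.
# Assumptions: GF(2^8), primitive polynomial 0x11D, generator roots a^0..a^(r-1).
PRIM = 0x11D
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be nonzero


def poly_eval(poly, x):
    y = 0
    for coef in poly:  # Horner's rule, highest degree first
        y = gf_mul(y, x) ^ coef
    return y


def rs_encode(msg, nsym):
    """Systematic RS encoding: msg followed by (msg * x^nsym) mod D(x)."""
    gen = [1]
    for i in range(nsym):  # D(x) = product of (x + a^i)
        nxt = gen + [0]
        for j, coef in enumerate(gen):
            nxt[j + 1] ^= gf_mul(coef, EXP[i])
        gen = nxt
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):
        if rem[i]:
            c = rem[i]
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], c)
    return list(msg) + rem[len(msg):]


def fill_erasures(word, erased, nsym):
    """Recover symbols at known list positions `erased` (at most nsym of them)."""
    if len(erased) > nsym:
        raise ValueError("more erasures than check symbols")
    r = list(word)
    for p in erased:
        r[p] = 0  # unknown values are treated as 0
    f, n = len(erased), len(r)
    degs = [n - 1 - p for p in erased]  # polynomial degree of each erased symbol
    # Syndrome s_i = r(a^i) = sum_j Y_j * a^(i * deg_j), Y_j being the missing value.
    a = [[EXP[(i * d) % 255] for d in degs] + [poly_eval(r, EXP[i])] for i in range(f)]
    for col in range(f):  # Gauss-Jordan elimination over GF(2^8)
        piv = next(row for row in range(col, f) if a[row][col])
        a[col], a[piv] = a[piv], a[col]
        inv = gf_inv(a[col][col])
        a[col] = [gf_mul(v, inv) for v in a[col]]
        for row in range(f):
            if row != col and a[row][col]:
                k = a[row][col]
                a[row] = [u ^ gf_mul(k, v) for u, v in zip(a[row], a[col])]
    for j, p in enumerate(erased):
        r[p] = a[j][f]
    return r


def recover_lost_slices(block, lost_cols, row_nsym):
    """A lost column slice erases one symbol per row; each row code fills it in."""
    return [fill_erasures(row, lost_cols, row_nsym) for row in block]


rows = [rs_encode([1, 2, 3, 4], 2), rs_encode([5, 6, 7, 8], 2)]
damaged = [[0 if j in (1, 4) else v for j, v in enumerate(row)] for row in rows]
recovered = recover_lost_slices(damaged, [1, 4], row_nsym=2)
```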
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2011104569412A CN102968356A (en) | 2011-12-30 | 2011-12-30 | Data processing method of cloud storage system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2011104569412A CN102968356A (en) | 2011-12-30 | 2011-12-30 | Data processing method of cloud storage system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN102968356A | 2013-03-13 |
Family
ID=47798509
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2011104569412A Pending CN102968356A (en) | 2011-12-30 | 2011-12-30 | Data processing method of cloud storage system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102968356A (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2005069775A2 (en) * | 2004-01-15 | 2005-08-04 | Sandbridge Technologies, Inc. | A method of reed-solomon encoding and decoding |
| CN101840377A (en) * | 2010-05-13 | 2010-09-22 | 上海交通大学 | Data storage method based on RS (Reed-Solomon) erasure codes |
| CN102006088A (en) * | 2010-10-08 | 2011-04-06 | 清华大学 | Interleaving and error-correcting method for reducing bit error rate of volume hologram storage system |
- 2011-12-30: application CN2011104569412A filed in CN; CN102968356A active, pending
Non-Patent Citations (1)
| Title |
|---|
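| Liu Xiaocheng et al., "图像交织RS码设计及其C语言实现" (Design of image-interleaved RS codes and their C-language implementation), 《微计算机信息》 (Microcomputer Information) * |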
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104579571A (en) * | 2015-01-15 | 2015-04-29 | 山东超越数控电子有限公司 | Data storage method based on LDPC encoding |
| CN110061802A (en) * | 2018-01-17 | 2019-07-26 | 中兴通讯股份有限公司 | Multi-user data transfer control method, device and data transmission set |
| CN108880620A (en) * | 2018-08-20 | 2018-11-23 | 广东石油化工学院 | Electric-power wire communication signal reconstructing method |
| CN108880620B (en) * | 2018-08-20 | 2021-06-11 | 广东石油化工学院 | Power line communication signal reconstruction method |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9600365B2 (en) | Local erasure codes for data storage | |
| CN105260259B (en) | A kind of locality based on system minimum memory regeneration code repairs coding method | |
| Sima et al. | Optimal codes for the q-ary deletion channel | |
| CN100539444C (en) | Method and apparatus for embedding an additional error correction layer into an error correction code | |
| Mitzenmacher et al. | Biff (Bloom filter) codes: Fast error correction for large data sets | |
| CN105356968B (en) | The method and system of network code based on cyclic permutation matrices | |
| CN108347306B (en) | Similar local reconstruction code encoding and node fault repairing method in distributed storage system | |
| CN110895497B (en) | Method and device for reducing erasure code repair in distributed storage | |
| CN107844272A (en) | A kind of cross-packet coding and decoding method for improving error correcting capability | |
| CN101840366A (en) | Storage method of loop chain type n+1 bit parity check code | |
| CN112000512B (en) | A data restoration method and related device | |
| CN110532126A (en) | Data fast recovery method, device and storage medium of erasure code storage system | |
| CN116501553A (en) | Data recovery method, device, system, electronic equipment and storage medium | |
| CN103703446B (en) | Data reconstruction that network storage Zhong Kang Byzantium lost efficacy, failure-data recovery method and device | |
| CN105356892B (en) | The method and system of network code | |
| CN115793984B (en) | Data storage method, device, computer equipment and storage medium | |
| CN117271199A (en) | Code generation, encoding and decoding methods and devices | |
| CN112181707B (en) | Distributed storage data recovery scheduling method, system, device and storage medium | |
| CN102968356A (en) | Data processing method of cloud storage system | |
| WO2017041232A1 (en) | Encoding and decoding framework for binary cyclic code | |
| CN115269258A (en) | Method and system for data recovery | |
| Huang et al. | An improved decoding algorithm for generalized RDP codes | |
| CN108614749A (en) | A kind of data processing method and device | |
| CN105245314B (en) | Hybrid redundant fault-tolerant encoding and decoding method and system in distributed storage system | |
| CN117194095B (en) | Changing error correction configuration |
Legal Events
| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C05 | Deemed withdrawal (patent law before 1993) | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20130313 |