WO2016074370A1 - Method for updating a data table of a KeyValue database and table data updating apparatus - Google Patents

Method for updating a data table of a KeyValue database and table data updating apparatus

Info

Publication number
WO2016074370A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
full
row
file
deletion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2015/073211
Other languages
English (en)
French (fr)
Inventor
郭益君
毕杰山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201580000911.7A priority Critical patent/CN105900093B/zh
Priority to AU2015316450A priority patent/AU2015316450B2/en
Priority to JP2016519954A priority patent/JP6251388B2/ja
Priority to CA2922388A priority patent/CA2922388C/en
Priority to EP15832861.7A priority patent/EP3051440B1/en
Priority to US15/054,475 priority patent/US10467192B2/en
Publication of WO2016074370A1 publication Critical patent/WO2016074370A1/zh
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating

Definitions

  • The present invention relates to the field of information technology, and in particular, to a method for updating a data table of a KeyValue database and a table data updating apparatus.
  • A KeyValue-type database (hereinafter referred to as a KeyValue database) is a non-relational (NoSQL) distributed storage database with high scalability and high reliability, and it is used in more and more systems.
  • Many KeyValue databases store data in units of tables. Each table includes multiple rows of data. Each row of data is uniquely identified by a row identifier (RowKey) and includes multiple columns of attributes. Each column attribute corresponds to one piece of KeyValue data and has a data type and a timestamp.
  • The data types include Put, Delete, and other types. The Put type indicates that the attribute is a new attribute, and the Delete type indicates that the attribute is used to delete an old attribute.
  • The timestamp indicates when each attribute was generated.
  • The KeyValue database uses timestamps to keep multiple versions of data. That is, data with the same RowKey value is distinguished as new or old by timestamp. When the KeyValue database holds multiple versions of data, the new version overrides the old version, and a user reading the data directly reads the new version.
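  • As a rough illustration of the multi-version behaviour described above, the following minimal Python sketch models each column attribute as a timestamped cell and resolves a read to the newest Put version. The names `Cell` and `read_latest` are hypothetical and do not belong to any particular KeyValue database API; Delete handling is deliberately left out of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cell:
    """One column attribute of one row (hypothetical model)."""
    row_key: str
    column: str
    timestamp: int
    kind: str                 # "Put" or "Delete"
    value: Optional[str] = None


def read_latest(cells: Iterable[Cell], row_key: str, column: str) -> Optional[str]:
    """Return the value of the newest Put version, or None if there is none."""
    versions = [
        c for c in cells
        if c.row_key == row_key and c.column == column and c.kind == "Put"
    ]
    if not versions:
        return None
    # The version with the largest timestamp is the one the user reads.
    return max(versions, key=lambda c: c.timestamp).value


cells = [
    Cell("r1", "name", 10, "Put", "old"),
    Cell("r1", "name", 20, "Put", "new"),
]
```

Under this model, reading `("r1", "name")` resolves to the version written at timestamp 20, while the version written at timestamp 10 stays stored but is no longer returned.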
  • When updating data, the KeyValue database uses a marker deletion technique, which deletes or imports KeyValue data one piece at a time. In practical applications, it is often necessary to perform a full update of the table data, that is, to clear all existing data in the table and import new data, but existing KeyValue databases do not support a one-time full update of table data. If the table data is deleted and imported piece by piece, updating all the data in the table takes a long time and the update process lacks atomicity, which degrades the quality of the data reading service provided by the table. Therefore, providing a method for fully updating the table data of a KeyValue database is of great significance.
  • the embodiment of the invention provides a method for updating the data table of the KeyValue database, which can implement full update of the table data in the KeyValue database.
  • a first aspect of the embodiments of the present invention provides a method for updating a data table of a KeyValue database, including:
  • the full data update file includes P rows of new data, each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp;
  • each row of old data includes a RowKey and N columns of old attributes, and each old attribute is set with an original timestamp;
  • the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, and each row of deletion data is set with a deletion timestamp;
  • the deletion timestamp of the R-th row of deletion data is greater than the maximum original timestamp of the R-th row of old data and smaller than the minimum update timestamp of the S-th row of new data, where the R-th row of deletion data, the R-th row of old data, and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P;
  • the step of generating the full data deletion file is performed first, and then the step of generating the full data update file is performed;
  • the update timestamp of each new attribute is the time at which the full data update file is generated, and the deletion timestamp of each row of deletion data is the time at which the full data deletion file is generated.
  • the step of importing the full data update file into the data table is performed first, and then the step of importing the full data deletion file into the data table is performed.
  • importing the full data update file into the data table includes:
  • changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
  • importing the full data deletion file into the data table includes:
  • changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.
  • a second aspect of the embodiments of the present invention provides a table data updating apparatus, including:
  • a receiving module, configured to receive a full data update instruction;
  • a file generating module, configured to acquire data to be imported according to the full data update instruction, and generate a full data update file according to the data to be imported, where the full data update file includes P rows of new data, each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp;
  • an acquiring module, configured to acquire M rows of old data of the data table corresponding to the full data update instruction, where each row of old data includes a RowKey and N columns of old attributes, and each old attribute is set with an original timestamp;
  • the file generating module is further configured to generate a full data deletion file according to the M rows of old data, where the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, and the deletion timestamp of the R-th row of deletion data is greater than the maximum original timestamp of the R-th row of old data and smaller than the minimum update timestamp of the S-th row of new data, where the R-th row of deletion data, the R-th row of old data, and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P;
  • an import module, configured to import the full data update file into the data table, and import the full data deletion file into the data table.
  • the file generating module is configured to generate the full data deletion file first, and then generate the full data update file.
  • the time at which the full data update file is generated is set as the update timestamp of each new attribute, and the time at which the full data deletion file is generated is set as the deletion timestamp of each row of deletion data.
  • the import module first imports the full data update file into the data table, and then imports the full data deletion file into the data table.
  • with reference to the first or second implementation manner of the second aspect, in a third implementation manner of the second aspect, the import module is specifically configured to:
  • change the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
  • the second import module is specifically configured to:
  • change the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.
  • a third aspect of the embodiments of the present invention provides a table data updating apparatus, including an input device, an output device, a processor, and a memory, wherein the processor is configured to perform the following steps by calling an operation instruction stored in a memory:
  • the full data update file includes P rows of new data, each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp;
  • the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, and each row of deletion data is set with a deletion timestamp;
  • the deletion timestamp of the R-th row of deletion data is greater than the maximum original timestamp of the R-th row of old data and smaller than the minimum update timestamp of the S-th row of new data, where the R-th row of deletion data, the R-th row of old data, and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P;
  • the processor is configured to generate the full data deletion file first, and then generate the full data update file;
  • the time at which the full data update file is generated is set as the update timestamp of each new attribute, and the time at which the full data deletion file is generated is set as the deletion timestamp of each row of deletion data.
  • the processor first imports the full data update file into the data table, and then imports the full data deletion file into the data table.
  • the processor is further configured to perform:
  • changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
  • the processor is further configured to perform:
  • changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.
  • a fourth aspect of the embodiments of the present invention provides a method for updating a data table of a KeyValue database, including:
  • the full data update file includes P rows of new data, each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp;
  • each row of old data includes a RowKey and N columns of old attributes, and each old attribute is set with an original timestamp;
  • the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, and each row of deletion data is set with a deletion timestamp;
  • the deletion timestamp of the R-th row of deletion data is greater than the maximum original timestamp of the R-th row of old data and smaller than the minimum update timestamp of the S-th row of new data, where the R-th row of deletion data, the R-th row of old data, and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P;
  • the table data update file is imported into the data table.
  • importing the table data update file into the data table includes:
  • changing the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.
  • a fifth aspect of the embodiments of the present invention provides a table data updating apparatus, including:
  • an instruction receiving module, configured to receive a full data update instruction;
  • a file generating module, configured to acquire data to be imported according to the full data update instruction, and generate a full data update file according to the data to be imported, where the full data update file includes P rows of new data, each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp;
  • a data obtaining module, configured to acquire M rows of old data of the data table corresponding to the full data update instruction, where each row of old data includes a RowKey and N columns of old attributes, and each old attribute is set with an original timestamp;
  • the file generating module is further configured to generate a full data deletion file according to the M rows of old data, where the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, and the deletion timestamp of the R-th row of deletion data is greater than the maximum original timestamp of the R-th row of old data and smaller than the minimum update timestamp of the S-th row of new data, where the R-th row of deletion data, the R-th row of old data, and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P;
  • a file merging module, configured to merge the full data update file and the full data deletion file into a table data update file;
  • a file import module, configured to import the table data update file into the data table.
  • the file import module is specifically configured to:
  • change the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.
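  • The merging step performed by the file merging module could be sketched as follows. This is only an illustration under stated assumptions: records are plain dictionaries, and the merged file is ordered by RowKey and then by descending timestamp, which is a choice made for this sketch and is not specified in the text. `merge_files` is a hypothetical helper name.

```python
def merge_files(update_records, deletion_records):
    """Merge Put records and Delete records into one table data update file.

    Assumed ordering: by RowKey, then newest timestamp first.
    """
    merged = list(update_records) + list(deletion_records)
    merged.sort(key=lambda r: (r["row_key"], -r["timestamp"]))
    return merged


update_file = [
    {"row_key": "r2", "column": "name", "type": "Put", "timestamp": 8, "value": "b"},
    {"row_key": "r1", "column": "name", "type": "Put", "timestamp": 8, "value": "a"},
]
deletion_file = [
    {"row_key": "r1", "type": "Delete", "timestamp": 7},
]
table_data_update_file = merge_files(update_file, deletion_file)
```

Because the timestamps already encode which record wins, a single merged file can then be imported in one operation instead of two.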
  • a sixth aspect of the embodiments of the present invention provides a table data updating apparatus, including an input device, an output device, a processor, and a memory, wherein the processor is configured to perform the following steps by calling an operation instruction stored in a memory:
  • the full data update file includes P rows of new data, each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp;
  • each row of old data includes a RowKey and N columns of old attributes, and each old attribute is set with an original timestamp;
  • the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, and each row of deletion data is set with a deletion timestamp;
  • the deletion timestamp of the R-th row of deletion data is greater than the maximum original timestamp of the R-th row of old data and smaller than the minimum update timestamp of the S-th row of new data, where the R-th row of deletion data, the R-th row of old data, and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P;
  • the table data update file is imported into the data table.
  • the processor is further configured to perform:
  • changing the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.
  • An embodiment of the present invention provides a method for updating a data table of a KeyValue database, including: receiving a full data update instruction; acquiring data to be imported according to the full data update instruction, and generating a full data update file according to the data to be imported; acquiring M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M rows of old data; importing the full data update file into the data table; and importing the full data deletion file into the data table.
  • Compared with the prior-art approach of deleting and importing the KeyValue data in the data table piece by piece, the method provided by the embodiment of the present invention is faster and has better atomicity.
  • FIG. 1 is a flowchart of an embodiment of a method for updating a data table of a KeyValue database according to an embodiment of the present invention
  • FIG. 2 is a flow chart of another embodiment of a method for updating a data table of a KeyValue database according to an embodiment of the present invention
  • FIG. 3 is a structural diagram of an embodiment of a table data updating apparatus according to an embodiment of the present invention.
  • FIG. 4 is a structural diagram of another embodiment of a table data updating apparatus according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of another embodiment of a method for updating a data table of a KeyValue database according to an embodiment of the present invention
  • FIG. 6 is a structural diagram of another embodiment of a table data updating apparatus according to an embodiment of the present invention.
  • the embodiment of the invention provides a method for updating the data table of the KeyValue database, which can implement full update of the table data in the KeyValue database.
  • the present invention also proposes related table data updating means, which will be separately described below.
  • For the basic process of the method for updating a data table of a KeyValue database provided by the embodiment of the present invention, refer to FIG. 1. The process mainly includes:
  • the user sends a full data update instruction through the client, and the full data update instruction is used to indicate that the table data of the specified data table in the KeyValue database is updated in full.
  • The table data updating apparatus receives the full data update instruction.
  • the table data updating device acquires data to be imported according to the full data update instruction, and generates a full data update file according to the data to be imported.
  • the full data update instruction may include a table name of the data table, and specify a save path of the data to be imported and a save path of the generated full data update file.
  • The table data updating apparatus acquires the data to be imported from the save path of the data to be imported specified by the full data update instruction, generates a full data update file according to the data to be imported, and then saves the full data update file under the save path specified by the full data update instruction.
  • The full data update file includes P rows of new data. Each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp.
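  • A minimal sketch of how such a full data update file might be assembled is shown below. It assumes the data to be imported is already available as a nested dictionary and that every new attribute receives the same update timestamp; both are assumptions made for illustration, and `build_update_file` is a hypothetical name.

```python
def build_update_file(rows_to_import, update_ts):
    """Turn {RowKey: {column: value}} into a list of Put records.

    Every record carries the same update timestamp (an assumption of this sketch).
    """
    records = []
    for row_key in sorted(rows_to_import):
        for column, value in sorted(rows_to_import[row_key].items()):
            records.append({
                "row_key": row_key,
                "column": column,
                "type": "Put",          # every new attribute is of the Put type
                "timestamp": update_ts,
                "value": value,
            })
    return records


update_file = build_update_file(
    {"r2": {"name": "b"}, "r1": {"name": "a", "age": "30"}},
    update_ts=100,
)
```

Here P is 2 (rows `r1` and `r2`), and the file holds one Put record per new attribute.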
  • The table data updating apparatus acquires M rows of old data of the data table corresponding to the full data update instruction, where each row of old data includes a RowKey and N columns of old attributes, and each old attribute is set with an original timestamp.
  • The table data updating apparatus generates a full data deletion file based on the M rows of old data.
  • The full data update instruction may include the table name of the data table and specify the save path of the generated full data deletion file.
  • The table data updating apparatus determines the data table based on its table name, and then stores the generated full data deletion file under the save path specified by the full data update instruction.
  • The full data deletion file includes M rows of deletion data, and the rows of deletion data correspond one-to-one to the rows of old data of the data table. The data type of each row of deletion data is the Delete type, and each row of deletion data is used to delete the old data that has the same RowKey. Each row of deletion data is set with a deletion timestamp.
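  • The generation of the full data deletion file can be sketched in the same style. The sketch assumes the old data is given as a dictionary keyed by RowKey and emits exactly one Delete record per old row; `build_deletion_file` is a hypothetical name and the record layout is an assumption.

```python
def build_deletion_file(old_rows, delete_ts):
    """Emit one Delete record per old row, all sharing one deletion timestamp.

    old_rows maps RowKey -> {column: (value, original_timestamp)}.
    """
    return [
        {"row_key": row_key, "type": "Delete", "timestamp": delete_ts}
        for row_key in sorted(old_rows)
    ]


old_rows = {
    "r1": {"name": ("old-a", 5), "age": ("29", 6)},
    "r3": {"name": ("old-c", 4)},
}
deletion_file = build_deletion_file(old_rows, delete_ts=7)
```

With M = 2 old rows, the deletion file contains two Delete records, one for `r1` and one for `r3`, regardless of how many columns each old row has.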
  • Step 103 may also be performed before step 102, which is not limited in this embodiment. However, regardless of the order of step 103 and step 102, it is necessary to ensure that the deletion timestamp of the R-th row of deletion data is greater than the maximum original timestamp of the R-th row of old data, and that the deletion timestamp of the R-th row of deletion data is smaller than the minimum update timestamp in the S-th row of new data.
  • Here, the R-th row of deletion data, the R-th row of old data, and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P.
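  • The timestamp constraint above can be expressed as a small check. In this sketch the three inputs are dictionaries keyed by RowKey; the function name `timestamps_are_valid` and the input layout are assumptions made for illustration.

```python
def timestamps_are_valid(original_ts, deletion_ts, update_ts):
    """Check: max(original) < deletion < min(update) for rows sharing a RowKey.

    original_ts: {RowKey: [original timestamps of the old attributes]}
    deletion_ts: {RowKey: deletion timestamp}
    update_ts:   {RowKey: [update timestamps of the new attributes]}
    """
    for row_key, delete_at in deletion_ts.items():
        # The deletion marker must be newer than every old attribute of the row.
        if delete_at <= max(original_ts[row_key]):
            return False
        # If new data exists for the same RowKey, it must be newer than the marker.
        if row_key in update_ts and delete_at >= min(update_ts[row_key]):
            return False
    return True
```

For example, with original timestamps 5 and 7, a deletion timestamp of 8 and update timestamps of 9 satisfy the constraint, whereas a deletion timestamp of 7 or 9 would not.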
  • The table data updating apparatus imports the full data update file into the data table corresponding to the full data update instruction. The full data update file is imported into the data table in a single operation, which ensures the atomicity of the data import.
  • The minimum update timestamp in the S-th row of new data of the full data update file is greater than the maximum original timestamp in the R-th row of old data that has the same RowKey. Therefore, after the full data update file is imported into the data table, if the R-th row of old data is present, the S-th row of new data overrides it: the user can read the S-th row of new data but cannot read the R-th row of old data.
  • The table data updating apparatus imports the full data deletion file into the data table corresponding to the full data update instruction. The full data deletion file is imported into the data table in a single operation, which ensures the atomicity of the data deletion.
  • The R-th row of deletion data deletes the R-th row of old data that has the same RowKey.
  • Because the full data deletion file includes M rows of deletion data whose RowKeys correspond one-to-one to those of the M rows of old data of the data table, after the full data deletion file is imported into the data table, all M rows of old data in the data table are deleted, and the user can no longer read the old data in the data table.
  • Because the deletion timestamp of the R-th row of deletion data is smaller than the minimum update timestamp in the S-th row of new data that has the same RowKey, the S-th row of new data, if present, remains valid. That is, after the full data deletion file is imported into the data table, the data of the full data update file can still be read normally by the user.
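  • To make the effect of the two imports concrete, the following sketch simulates a read over a table that holds old data, the deletion file, and the update file. The read rule used here (a Put is visible only if its timestamp is greater than the newest Delete marker of the same RowKey) is a simplified assumption about marker deletion, and `read_row` is a hypothetical helper.

```python
def read_row(table, row_key):
    """Return {column: value} visible for row_key under simplified marker deletion."""
    records = [r for r in table if r["row_key"] == row_key]
    delete_ts = max(
        (r["timestamp"] for r in records if r["type"] == "Delete"),
        default=None,
    )
    puts = sorted(
        (r for r in records if r["type"] == "Put"),
        key=lambda r: r["timestamp"],
    )
    visible = {}
    for put in puts:
        # Puts at or before the Delete marker are masked; newer Puts win per column.
        if delete_ts is None or put["timestamp"] > delete_ts:
            visible[put["column"]] = put["value"]
    return visible


old_data = [
    {"row_key": "r1", "column": "name", "type": "Put", "timestamp": 5, "value": "old"},
    {"row_key": "r1", "column": "age", "type": "Put", "timestamp": 6, "value": "30"},
]
deletion_file = [{"row_key": "r1", "type": "Delete", "timestamp": 7}]
update_file = [
    {"row_key": "r1", "column": "name", "type": "Put", "timestamp": 8, "value": "new"},
]

update_then_delete = old_data + update_file + deletion_file
delete_then_update = old_data + deletion_file + update_file
```

In both import orders the final read of `r1` returns only the new `name` attribute: the old `age` attribute is masked by the deletion marker, and the new data survives because its timestamp is greater than the deletion timestamp.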
  • Step 105 may also be performed before step 104, which is not limited in this embodiment.
  • This embodiment provides a method for updating a data table of a KeyValue database, including: receiving a full data update instruction; acquiring data to be imported according to the full data update instruction, and generating a full data update file according to the data to be imported; acquiring M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M rows of old data; importing the full data update file into the data table; and importing the full data deletion file into the data table.
  • the order of steps 102 and 103 is not limited, and the order of steps 104 and 105 is not limited. However, in an actual application, changing the order of the steps may further produce a beneficial effect.
  • The flow of another method for updating a data table of a KeyValue database provided by an embodiment of the present invention includes:
  • the user sends a full data update instruction through the client, and the full data update instruction is used to indicate that the table data of the data table in the KeyValue database is updated in full.
  • The table data updating apparatus receives the full data update instruction.
  • the full data update instruction is used to indicate that the old data of the data table in the KeyValue database is deleted and new data is imported into the data table. Therefore, the full data update instruction may include a delete instruction and an update instruction.
  • the delete instruction is used to indicate the deletion of the old data of the data table in the KeyValue database, and the update instruction is used to instruct the import of new data into the data table.
  • The table data updating apparatus acquires, according to the full data update instruction, M rows of old data of the data table corresponding to the full data update instruction, where each row of old data includes a RowKey and N columns of old attributes, and each old attribute is set with an original timestamp.
  • The table data updating apparatus generates a full data deletion file based on the M rows of old data.
  • The full data update instruction may include the table name of the data table and specify the save path of the generated full data deletion file.
  • The table data updating apparatus determines the data table based on its table name, and then stores the generated full data deletion file under the save path specified by the full data update instruction.
  • The full data deletion file includes M rows of deletion data, and the rows of deletion data correspond one-to-one to the rows of old data of the data table. The data type of each row of deletion data is the Delete type, and each row of deletion data is used to delete the old data that has the same RowKey.
  • Each row of deletion data is set with a deletion timestamp. The deletion timestamp of each row of deletion data is the time at which the full data deletion file is generated, and it is greater than the maximum original timestamp.
  • the table data update means performs the operation in this step according to the delete instruction.
  • the table data update device acquires the data to be imported according to the full data update instruction, and generates a full data update file according to the data to be imported.
  • the full data update instruction may include a table name of the data table, and specify a save path of the data to be imported and a save path of the generated full data update file.
  • The table data updating apparatus acquires the data to be imported from the save path of the data to be imported specified by the full data update instruction, generates a full data update file according to the data to be imported, and then saves the full data update file under the save path specified by the full data update instruction.
  • The full data update file includes P rows of new data. Each row of new data includes a row identifier (RowKey) and Q columns of new attributes, the data type of each new attribute is the Put type, and each new attribute is set with an update timestamp. The update timestamp of each new attribute is the time at which the full data update file is generated, and it is greater than the maximum original timestamp.
  • the table data update apparatus performs the operation in this step according to the update instruction.
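  • Because both timestamps are taken from the file generation times, generating the deletion file before the update file naturally orders them. The sketch below shows one cautious way to assign the two timestamps so that the ordering stays strict even if the clock returns the same value twice. Millisecond resolution, the `clock` parameter, and the name `assign_timestamps` are assumptions of this sketch.

```python
import time


def assign_timestamps(max_original_ts, clock=time.time_ns):
    """Return (deletion_ts, update_ts) with max_original_ts < deletion_ts < update_ts.

    Timestamps are in milliseconds (an assumption); clock returns nanoseconds.
    """
    # The deletion file is generated first, so its timestamp is taken first.
    deletion_ts = max(clock() // 1_000_000, max_original_ts + 1)
    # The update file is generated afterwards; force it to be strictly newer.
    update_ts = max(clock() // 1_000_000, deletion_ts + 1)
    return deletion_ts, update_ts
```

The `max(...)` guards are a defensive choice for the sketch: they keep the required ordering even when two clock readings fall within the same millisecond or an old attribute carries a timestamp from the future.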
  • The table data updating apparatus imports the full data update file into the data table corresponding to the full data update instruction. The full data update file is imported into the data table in a single operation, which ensures the atomicity of the data import.
  • The table data updating apparatus may change the save path of the full data update file to the directory of the data table, thereby importing the full data update file into the data table.
  • Changing the save path takes only seconds.
  • Because the update timestamp is greater than the maximum original timestamp, after the full data update file is imported into the data table, the S-th row of new data overrides the R-th row of old data: the user can read the S-th row of new data but cannot read the R-th row of old data.
  • Here, the R-th row of old data and the S-th row of new data have the same RowKey, 1 ≤ R ≤ M, and 1 ≤ S ≤ P.
  • the table data updating device imports the full data deletion file into the data table corresponding to the full data update instruction, wherein the full data deletion file is once imported into the data table, and the atomicity of the data deletion can be ensured.
  • the table data updating device may change the save path of the full data deletion file to the directory of the data table to implement importing the full data deletion file into the data table.
  • the operation of changing the save path takes only the second level.
  • the time of deleting the timestamp of deleting data in each row is the time of generating the full data deletion file, and the deletion timestamp is greater than the maximum value of the original timestamp, so after the full data update file is imported into the data table, the Rth row Deleting data can delete the same old R-th data as its RowKey.
  • Because the full data deletion file includes M rows of deletion data whose RowKeys correspond one-to-one to the RowKeys of the M rows of old data of the data table, all M rows of old data in the data table are deleted once the full data deletion file is imported, and the user can no longer read the old data in the data table.
  • Because each deletion timestamp is smaller than each update timestamp, the full data deletion file does not delete the data of the full data update file, which the user can still read normally.
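  • The timestamp rules above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `KeyValue` class, the `read_row` function, the use of `column=None` for a row-level Delete, and the values of T0, T1 and T2 are all hypothetical, and the masking rule (a Delete hides Puts whose timestamp is not greater than its own) is assumed rather than taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyValue:
    row_key: str
    column: Optional[str]  # None marks a row-level Delete (assumed encoding)
    kind: str              # "Put" or "Delete"
    timestamp: int
    value: Optional[str] = None


def read_row(cells, row_key):
    """Return {column: value} for the Puts of row_key that are still visible.

    Assumed rule: a Delete hides every Put of the same row (and column, if
    the Delete names one) whose timestamp is not greater than the Delete's;
    among the surviving Puts of a column, the newest one wins.
    """
    row = [kv for kv in cells if kv.row_key == row_key]
    deletes = [kv for kv in row if kv.kind == "Delete"]
    newest = {}
    for put in (kv for kv in row if kv.kind == "Put"):
        hidden = any(
            d.column in (None, put.column) and d.timestamp >= put.timestamp
            for d in deletes
        )
        if hidden:
            continue
        current = newest.get(put.column)
        if current is None or put.timestamp > current.timestamp:
            newest[put.column] = put
    return {column: kv.value for column, kv in newest.items()}


T0, T1, T2 = 100, 200, 300  # original < deletion < update
table = [
    KeyValue("row1", "attribute1", "Put", T0, "old-a1"),
    KeyValue("row1", "attribute3", "Put", T0, "old-a3"),
    KeyValue("row1", None, "Delete", T1),
    KeyValue("row1", "attribute1", "Put", T2, "new-a1"),
    KeyValue("row1", "attribute4", "Put", T2, "new-a4"),
]
print(read_row(table, "row1"))  # {'attribute1': 'new-a1', 'attribute4': 'new-a4'}
```

  • In this model the old attribute 3 disappears because its timestamp T0 is below the deletion timestamp T1, while the new attributes survive because T2 is above T1.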
  • The order of steps 204 and 205 is not limited: step 205 may be performed first, so that the full data deletion file is imported into the data table before the full data update file is imported.
  • Preferably, step 204 is performed first and then step 205 is performed, so that while the data table of the KeyValue database is being updated there is always data in the data table that the user can read, and the service does not need to be interrupted.
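  • The effect of the import order can be checked with a toy simulation. This is a sketch only: cells are modelled as hypothetical `(row_key, kind, timestamp)` tuples, `visible_rows` and `visible_after_each_import` are illustrative names, and the visibility rule (a Put is readable when it is newer than every Delete for its row) is an assumption.

```python
def visible_rows(cells):
    """Row keys that have at least one Put newer than every Delete of that row."""
    newest_delete = {}
    for row_key, kind, ts in cells:
        if kind == "Delete":
            newest_delete[row_key] = max(ts, newest_delete.get(row_key, ts))
    return {
        row_key
        for row_key, kind, ts in cells
        if kind == "Put" and ts > newest_delete.get(row_key, -1)
    }


T0, T1, T2 = 100, 200, 300  # original < deletion < update
old_data = [(f"row{i}", "Put", T0) for i in range(3)]          # row0..row2
delete_file = [(row_key, "Delete", T1) for row_key, _, _ in old_data]
update_file = [(f"row{i}", "Put", T2) for i in range(1, 5)]    # row1..row4


def visible_after_each_import(first, second):
    """Count readable rows before any import and after each of the two imports."""
    table = list(old_data)
    counts = [len(visible_rows(table))]
    for imported_file in (first, second):
        table.extend(imported_file)
        counts.append(len(visible_rows(table)))
    return counts


print(visible_after_each_import(update_file, delete_file))  # [3, 5, 4]
print(visible_after_each_import(delete_file, update_file))  # [3, 0, 4]
```

  • Both orders end in the same state, but only the update-first order keeps some rows readable at every intermediate point; the deletion-first order passes through a state with no readable rows.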
  • the KeyValue database in the embodiment of the present invention may use a distributed file system as a medium for storing the underlying data.
  • a distributed file system is a file management system in which physical storage resources are not necessarily connected to a local node, but are connected to multiple nodes through a computer network. In a distributed file system, large data blocks are divided into multiple small data blocks and stored on multiple nodes, so that the distributed file system has high fault tolerance and throughput.
  • There are many kinds of distributed file systems, including the Hadoop Distributed File System (HDFS). The embodiment of the present invention does not limit which one is used, and HDFS is used only as an example.
  • After generating the full data update file and the full data deletion file, the table data update device stores both files at a location outside the directory of the KeyValue database in HDFS.
  • In step 204, the save path of the full data update file is changed to the directory of the data table in the KeyValue database in HDFS; in step 205, the save path of the full data deletion file is changed to the directory of the data table in the KeyValue database in HDFS.
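  • Importing by changing the save path can be sketched on a local file system. This is not the HDFS API: `os.replace` from the Python standard library stands in for the path change, and the directory layout, the file names `full_update.kv` and `full_delete.kv`, and the helper names `import_by_rename` and `demo` are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def import_by_rename(staged_file: Path, table_dir: Path) -> Path:
    """Move a fully written file into the table directory in one step."""
    target = table_dir / staged_file.name
    os.replace(staged_file, target)  # a rename, not a copy of the contents
    return target


def demo() -> list:
    with tempfile.TemporaryDirectory() as root:
        staging = Path(root) / "staging"            # outside the database directory
        table_dir = Path(root) / "kvdb" / "Table1"  # directory of the data table
        staging.mkdir()
        table_dir.mkdir(parents=True)
        names = ["full_update.kv", "full_delete.kv"]
        for name in names:
            (staging / name).write_text("placeholder contents")
        for name in names:  # update file first, then deletion file
            import_by_rename(staging / name, table_dir)
        return sorted(p.name for p in table_dir.iterdir())


print(demo())  # ['full_delete.kv', 'full_update.kv']
```

  • Because the file is written completely before it is moved, a reader of the table directory sees either no file or the whole file, which is the property the path change is meant to provide.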
  • This embodiment provides a method for updating a data table of a KeyValue database, comprising: receiving a full data update instruction; acquiring data to be imported according to the full data update instruction, and generating a full data update file according to the data to be imported; acquiring the M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M rows of old data; importing the full data update file into the data table; and importing the full data deletion file into the data table.
  • The method provided by the embodiment of the present invention has better atomicity and updates faster. Moreover, because the method first imports the full data update file and then imports the full data deletion file, there is never a moment at which the old data has been deleted but the new data has not yet been imported, so the data read service does not need to be interrupted and the user experience is better.
  • the KeyValue database uses HDFS as the medium for the underlying data storage.
  • The user issues a full data update instruction through the client, specifying that the data in the data table Table1 is to be updated.
  • the form of the data table Table1 is as shown in Table 1.
  • Table 1 only represents a logical structure of the data table Table1, the underlying file is in the form of KeyValue, each attribute of each row corresponds to a KeyValue, and each KeyValue includes a RowKey value, a data type and an original timestamp, and may also include other information.
  • the data table Table1 includes 100 rows of old data, and each of the 100 rows of old data includes a RowKey and three columns of attributes (attribute 1, attribute 2, attribute 3), and each column attribute is set with an original timestamp T0. Among them, the data type of each column attribute is Put.
  • The table data updating device acquires the 100 rows of old data of the data table Table1 according to the full data update instruction.
  • The table data updating device generates a full data deletion file based on the 100 rows of old data.
  • The full data deletion file includes 100 rows of deletion data corresponding to the 100 rows of old data of the data table Table1. The RowKeys of the 100 rows of deletion data correspond one-to-one to the RowKeys of the 100 rows of old data, and the 100 rows of deletion data are used to delete the 100 rows of old data.
  • the form of 100 rows of deleted data of the full data deletion file may be as shown in Table 2(a).
  • Table 2 (a) only represents a logical structure of the full data deletion file, the underlying file is in the form of KeyValue, each row corresponds to a KeyValue, and each KeyValue includes a RowKey value, a data type, and a deletion timestamp, and may also include other information.
  • The data type of each row of deletion data is Delete, and each row is set with a deletion timestamp. The deletion timestamp of each row is the time T1 at which the full data deletion file is generated, and T1 is greater than T0.
  • the form of 100 rows of deleted data of the full data deletion file may also be as shown in Table 2(b).
  • Table 2(b) only represents a logical structure of the full data deletion file. The underlying file is in the form of KeyValue: each attribute of each row corresponds to a KeyValue, and each KeyValue includes a RowKey value, a data type, and a deletion timestamp, and may also include other information.
  • Each row of deletion data has three attributes corresponding to Table1. Each attribute has the data type Delete and is set with a deletion timestamp, which is the time T1 at which the full data deletion file is generated. T1 is greater than T0.
  • In the following, only the full data deletion file shown in Table 2(a) is described as an example.
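  • The two layouts of the deletion file (one Delete per row as in Table 2(a), or one Delete per attribute as in Table 2(b)) can be sketched as follows. The function `build_delete_file`, the dictionary layout of a marker, and the concrete values of T0 and T1 are assumptions made for illustration; only the counts (100 rows, 3 attributes) come from the example above.

```python
def build_delete_file(old_rows, delete_ts, per_column=False):
    """Build deletion data for every old row.

    old_rows maps RowKey -> {column: original_timestamp}.
    per_column=False gives one Delete per row (as in Table 2(a));
    per_column=True gives one Delete per attribute (as in Table 2(b)).
    """
    markers = []
    for row_key, columns in old_rows.items():
        # The deletion timestamp must exceed every original timestamp of the row.
        if delete_ts <= max(columns.values()):
            raise ValueError(f"deletion timestamp too small for {row_key}")
        targets = list(columns) if per_column else [None]
        for column in targets:
            markers.append(
                {
                    "row_key": row_key,
                    "column": column,  # None means the whole row
                    "type": "Delete",
                    "timestamp": delete_ts,
                }
            )
    return markers


T0, T1 = 100, 200
table1 = {
    f"row{i:03d}": {"attribute1": T0, "attribute2": T0, "attribute3": T0}
    for i in range(100)
}
print(len(build_delete_file(table1, T1)))                   # 100
print(len(build_delete_file(table1, T1, per_column=True)))  # 300
```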
  • the table data updating device acquires the data to be imported according to the full data update instruction, and generates a full data update file according to the data to be imported, as shown in Table 3.
  • Table 3 only represents a logical structure of the full data update file. The underlying file is in the form of KeyValue: each attribute of each row corresponds to a KeyValue, and each KeyValue includes a RowKey value, a data type, and an update timestamp, and may also include other information.
  • The full data update file includes 200 rows of new data. Each row of new data includes a row identifier RowKey and 3 columns of new attributes (attribute 1, attribute 2, attribute 4). The data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp, which is the time T2 at which the full data update file is generated. T2 is greater than T1.
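  • A sketch of how the logical rows of the update file expand into one Put KeyValue per attribute. The function `build_update_file`, the dictionary layout, the attribute values, and the concrete values of T1 and T2 are hypothetical; the row count and attribute names follow the example above.

```python
def build_update_file(new_rows, update_ts):
    """Expand logical rows into KeyValues, one Put per attribute.

    new_rows maps RowKey -> {column: value}; every KeyValue carries the
    same update timestamp, the time at which the file is generated.
    """
    return [
        {
            "row_key": row_key,
            "column": column,
            "type": "Put",
            "timestamp": update_ts,
            "value": value,
        }
        for row_key, columns in new_rows.items()
        for column, value in columns.items()
    ]


T1, T2 = 200, 300  # deletion timestamp < update timestamp
new_rows = {
    f"row{i:03d}": {
        "attribute1": f"a1-{i}",
        "attribute2": f"a2-{i}",
        "attribute4": f"a4-{i}",
    }
    for i in range(200)
}
update_file = build_update_file(new_rows, T2)
print(len(update_file))  # 600
```

  • The 200 logical rows with 3 attributes each become 600 KeyValues, all of type Put and all stamped with T2.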
  • The table data updating device changes the save path of the full data update file to the directory of the data table Table1, so as to import the full data update file into the data table.
  • The table data update device then changes the save path of the full data deletion file to the directory of the data table Table1, so as to import the full data deletion file into the data table. As a result, the data of the full data update file remains valid, while the old data in the data table is completely deleted.
  • T4 is greater than T3.
  • the embodiment of the present invention further provides a related table data updating device.
  • the basic structure of the device is as shown in FIG. 3, which mainly includes:
  • the receiving module 301 is configured to receive a full data update instruction
  • The user sends a full data update instruction through the client. The full data update instruction is used to instruct a full update of the table data of a data table in the KeyValue database.
  • the receiving module 301 receives the full data update instruction.
  • the file generating module 302 is configured to acquire data to be imported according to the full data update instruction, and generate a full data update file according to the data to be imported;
  • the file generation module 302 acquires data to be imported according to the full data update instruction, and generates a full data update file according to the data to be imported.
  • the full data update instruction may include a table name of the data table, and specify a save path of the data to be imported and a save path of the generated full data update file.
  • The file generation module 302 acquires the data to be imported from the save path of the data to be imported specified by the full data update instruction, generates a full data update file according to the data to be imported, and then stores the full data update file under the save path specified by the full data update instruction.
  • the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q column new attributes, the data type of each column new attribute is an input Put type, and each column new attribute is set with an update timestamp.
  • the obtaining module 303 is configured to acquire M rows of old data of the data table corresponding to the full data update instruction;
  • the old data of each line includes a RowKey and N column old attributes, and the old attribute of each column is set with the original timestamp.
  • the file generating module 302 is further configured to generate a full data deletion file according to the M line old data.
  • the full data update instruction may include a table name of the data table, and specify a save path of the generated full data deletion file.
  • The file generation module 302 determines the data table based on its table name, generates the full data deletion file, and then stores the file under the save path of the full data deletion file specified by the full data update instruction.
  • The full data deletion file includes M rows of deletion data in one-to-one correspondence with the rows of old data of the data table. The data type of each row of deletion data is the Delete type, and each row of deletion data is used to delete the old data that has the same RowKey. Each row of deletion data is set with a deletion timestamp.
  • The deletion timestamp of the Rth row of deletion data is greater than the maximum value of the original timestamp of the Rth row of old data, and smaller than the minimum value of the update timestamp of the Sth row of new data.
  • Here the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P.
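  • The ordering constraint on the three kinds of timestamps can be expressed as a small check. The function name `timestamp_violations` and the dictionary-based inputs are hypothetical; the sketch only tests the inequality stated above, applying the upper bound where a row of new data with the same RowKey exists.

```python
def timestamp_violations(original_ts, delete_ts, update_ts):
    """Return RowKeys that break max(original) < deletion < min(update).

    original_ts: RowKey -> list of original timestamps (one per old attribute)
    delete_ts:   RowKey -> deletion timestamp
    update_ts:   RowKey -> list of update timestamps (one per new attribute)
    The upper bound is only checked for RowKeys that also have new data.
    """
    bad = []
    for row_key, t_del in delete_ts.items():
        too_early = t_del <= max(original_ts[row_key])
        too_late = row_key in update_ts and t_del >= min(update_ts[row_key])
        if too_early or too_late:
            bad.append(row_key)
    return bad


original = {"a": [100, 100, 100], "b": [100, 100, 100]}
deletion = {"a": 200, "b": 200}
print(timestamp_violations(original, deletion, {"a": [300, 300, 300]}))  # []
print(timestamp_violations(original, deletion, {"a": [150, 300, 300]}))  # ['a']
```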
  • The import module 304 is configured to import the full data update file into the data table, and to import the full data deletion file into the data table;
  • The import module 304 imports the full data update file into the data table corresponding to the full data update instruction. Because the full data update file is imported into the data table in a single operation, the atomicity of the data import is guaranteed.
  • The minimum value of the update timestamp in the Sth row of new data of the full data update file is greater than the maximum value of the original timestamp in the Rth row of old data with the same RowKey. Therefore, after the full data update file is imported into the data table, if the Rth row of old data exists, the Sth row of new data overrides it: the user can read the Sth row of new data but not the Rth row of old data.
  • The import module 304 imports the full data deletion file into the data table corresponding to the full data update instruction. Because the full data deletion file is imported into the data table in a single operation, the atomicity of the data deletion is guaranteed.
  • The Rth row of deletion data deletes the Rth row of old data, which has the same RowKey.
  • Because the full data deletion file includes M rows of deletion data whose RowKeys correspond one-to-one to those of the M rows of old data of the data table, all M rows of old data are deleted once the full data deletion file is imported into the data table, and the user can no longer read the old data in the data table.
  • Because the deletion timestamp of the Rth row of deletion data is smaller than the minimum value of the update timestamp in the Sth row of new data with the same RowKey, the Sth row of new data, if it exists, remains valid; that is, after the full data deletion file is imported into the data table, the data of the full data update file can still be read by the user.
  • The embodiment of the present invention provides a table data updating apparatus, including: a receiving module 301, configured to receive a full data update instruction; a file generating module 302, configured to acquire data to be imported according to the full data update instruction and to generate a full data update file according to the data to be imported; an obtaining module 303, configured to obtain the M rows of old data of the data table corresponding to the full data update instruction, the file generating module 302 being further configured to generate a full data deletion file according to the M rows of old data; and an import module 304, configured to import the full data update file into the data table and to import the full data deletion file into the data table.
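  • The division of labour between the modules can be sketched as a single in-memory class. This is a hypothetical illustration, not the apparatus itself: the class `TableDataUpdater`, its method names, the `(row_key, kind, timestamp)` cell tuples, the instruction dictionary, and the counter used as a clock are all assumptions, and the clock is assumed to return values larger than every original timestamp.

```python
import itertools


class TableDataUpdater:
    """In-memory sketch; each method plays the role of one module."""

    def __init__(self, tables, clock):
        self.tables = tables  # table name -> list of (row_key, kind, timestamp)
        self.clock = clock    # callable returning strictly increasing integers

    def receive(self, instruction):  # role of receiving module 301
        return instruction["table"], instruction["new_row_keys"]

    def obtain_old_row_keys(self, table):  # role of obtaining module 303
        return sorted({rk for rk, kind, _ in self.tables[table] if kind == "Put"})

    def generate_files(self, old_keys, new_keys):  # role of file generating module 302
        t_delete = self.clock()  # deletion file is generated first ...
        t_update = self.clock()  # ... so its timestamp is the smaller one
        delete_file = [(rk, "Delete", t_delete) for rk in old_keys]
        update_file = [(rk, "Put", t_update) for rk in new_keys]
        return update_file, delete_file

    def import_files(self, table, update_file, delete_file):  # role of import module 304
        self.tables[table].extend(update_file)  # update file first
        self.tables[table].extend(delete_file)  # then deletion file

    def full_update(self, instruction):
        table, new_keys = self.receive(instruction)
        old_keys = self.obtain_old_row_keys(table)
        update_file, delete_file = self.generate_files(old_keys, new_keys)
        self.import_files(table, update_file, delete_file)

    def readable_rows(self, table):
        """Rows with a Put newer than every Delete of the same RowKey (assumed rule)."""
        cells = self.tables[table]
        newest_delete = {}
        for rk, kind, ts in cells:
            if kind == "Delete":
                newest_delete[rk] = max(ts, newest_delete.get(rk, ts))
        return {rk for rk, kind, ts in cells
                if kind == "Put" and ts > newest_delete.get(rk, -1)}


tables = {"Table1": [("r1", "Put", 100), ("r2", "Put", 100)]}
updater = TableDataUpdater(tables, itertools.count(200).__next__)
updater.full_update({"table": "Table1", "new_row_keys": ["r2", "r3"]})
print(sorted(updater.readable_rows("Table1")))  # ['r2', 'r3']
```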
  • The table data updating apparatus provided in this embodiment can update all data of a data table of the KeyValue database. At the same time, because the full data update file and the full data deletion file are not imported into the KeyValue database one KeyValue at a time, the apparatus updates faster and with better atomicity than the prior-art approach of updating the KeyValue data in the data table one by one.
  • the file generation module 302 can create a full data deletion file and then generate the full data update file. This ensures that when the data table of the KeyValue database is updated, there is always data in the data table that can be read by the user without interrupting the service. More preferably, the file generation module 302 may set the time at which the full data update file is generated to the update time stamp of each column of new attributes, and set the time at which the full data deletion file is generated to the deletion time stamp of each line of deleted data.
  • the import module 304 may first import the full data update file into the data table, and then import the full data deletion file into the data table.
  • The import module 304 is specifically configured to change the save path of the full data update file to the directory of the data table, and to change the save path of the full data deletion file to the directory of the data table. Changing the directory takes only seconds. More preferably, the KeyValue database in the embodiment of the present invention may use a distributed file system as the medium for storing the underlying data, in which case the import module 304 is specifically configured to change the save path of the full data update file to the directory of the data table of the KeyValue database in HDFS, and to change the save path of the full data deletion file to that same directory.
  • the KeyValue database uses HDFS as the medium for the underlying data storage.
  • The user issues a full data update instruction through the client, specifying that the data in the data table Table1 is to be updated.
  • the form of the data table Table1 is as shown in Table 1.
  • the data table Table1 includes 100 rows of old data, and each of the 100 rows of old data includes a RowKey and three columns of attributes (attribute 1, attribute 2, attribute 3), and each column attribute is set with an original timestamp T0. Among them, the data type of each column attribute is Put.
  • The receiving module 301 receives the full data update instruction, and the obtaining module 303 acquires the 100 rows of old data of the data table Table1 according to the instruction.
  • The file generation module 302 generates a full data deletion file based on the 100 rows of old data.
  • The full data deletion file includes 100 rows of deletion data corresponding to the 100 rows of old data of the data table Table1. The RowKeys of the 100 rows of deletion data correspond one-to-one to the RowKeys of the 100 rows of old data, and the 100 rows of deletion data are used to delete the 100 rows of old data.
  • The 100 rows of deletion data of the full data deletion file may take the form shown in Table 2(a).
  • The data type of each row of deletion data is Delete, and each row is set with a deletion timestamp, which is the time T1 at which the full data deletion file is generated. T1 is greater than T0.
  • the file generating module 302 acquires the data to be imported according to the full data update instruction, and generates a full data update file according to the data to be imported, as shown in Table 3.
  • The full data update file includes 200 rows of new data. Each row of new data includes a row identifier RowKey and 3 columns of new attributes (attribute 1, attribute 2, attribute 4). The data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp, which is the time T2 at which the full data update file is generated. T2 is greater than T1.
  • The import module 304 changes the save path of the full data update file to the directory of the data table, so as to import the full data update file into the data table.
  • The import module 304 then changes the save path of the full data deletion file to the directory of the data table, so as to import the full data deletion file into the data table. As a result, the data of the full data update file remains valid, while the old data in the data table is completely deleted.
  • T4 is greater than T3.
  • the table data updating device in the embodiment of the present invention is described above from the perspective of the unitized functional entity.
  • The table data updating device in the embodiment of the present invention is described below from the perspective of hardware processing. Referring to FIG. 4, another embodiment of the table data updating apparatus 400 in the embodiment of the present invention includes:
  • the input device 401, the output device 402, the processor 403, and the memory 404 (wherein the number of processors 403 in the table data updating device 400 may be one or more, and one processor 403 is taken as an example in FIG. 4).
  • The input device 401, the output device 402, the processor 403, and the memory 404 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 4.
  • the processor 403 is configured to perform the following steps by calling an operation instruction stored in the memory 404:
  • The data to be imported is obtained according to the full data update instruction, and a full data update file is generated according to the data to be imported. The full data update file includes P rows of new data; each row of new data includes a row identifier RowKey and Q columns of new attributes; the data type of each column of new attributes is the Put type; and each column of new attributes is set with an update timestamp;
  • The full data deletion file includes M rows of deletion data in one-to-one correspondence with the rows of old data. The data type of each row of deletion data is the Delete type, and each row of deletion data is set with a deletion timestamp. The deletion timestamp of the Rth row of deletion data is greater than the maximum value of the original timestamp of the Rth row of old data, and smaller than the minimum value of the update timestamp of the Sth row of new data, wherein the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
  • The processor 403 first generates the full data deletion file and then generates the full data update file, sets the time at which the full data update file is generated as the update timestamp of each column of new attributes, and sets the time at which the full data deletion file is generated as the deletion timestamp of each row of deletion data.
  • the processor 403 first imports the full amount of data update file into the data table, and then imports the full amount of data deletion file into the data table. In some embodiments of the present invention, the processor 403 is further configured to perform the following steps:
  • the processor 403 is further configured to perform the following steps:
  • the embodiment of the present invention further provides another method for updating the data table of the KeyValue database.
  • the method specifically includes:
  • the steps 501 to 503 are basically the same as the steps 101 to 103, and are not described herein.
  • the table data update device After the table data update device generates the full data update file and the full data deletion file, the full data update file and the full data deletion file are combined into a table data update file.
  • the merged table data update file includes all data of the full data update file and all data of the full data deletion file.
  • The table data update device imports the table data update file into the data table corresponding to the full data update instruction. Because the table data update file is imported into the data table in a single operation, the atomicity of the data import is ensured.
  • The merged table data update file includes all the data of the full data update file, and the minimum value of the update timestamp in the Sth row of new data is greater than the maximum value of the original timestamp in the Rth row of old data with the same RowKey. Therefore, after the table data update file is imported into the data table, if the Rth row of old data exists, the Sth row of new data overrides it: the user can read the Sth row of new data but not the Rth row of old data.
  • The Rth row of deletion data deletes the Rth row of old data, which has the same RowKey.
  • Because the full data deletion file includes M rows of deletion data whose RowKeys correspond one-to-one to those of the M rows of old data of the data table, all M rows of old data are deleted once the table data update file is imported into the data table, and the user can no longer read the old data in the data table.
  • Because the deletion timestamp of the Rth row of deletion data is smaller than the minimum value of the update timestamp in the Sth row of new data with the same RowKey, the Sth row of new data, if it exists, remains valid; that is, the new data in the table data update file can still be read by the user.
  • the embodiment provides a method for updating a data table of a KeyValue database, comprising: receiving a full data update instruction, acquiring data to be imported according to the full data update instruction, and generating a full data update file according to the data to be imported; acquiring full data. Update the M line old data of the data table corresponding to the instruction, and generate a full data deletion file according to the M line old data; merge the full data update file and the full data deletion file into the table data update file; import the table data update file into the data table .
  • In this way, a full update of all the data of the data table of the KeyValue database is realized.
  • The method provided in this embodiment is faster and has better atomicity than the prior-art approach of updating the KeyValue data in the data table one by one.
  • the embodiment first merges the full data update file and the full data deletion file into a table data update file, and then imports the table data update file into the data table. This only needs to import the file once into the data table to achieve full update of the table data.
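  • The merged variant can be sketched in the same toy model. The names `merge_files` and `visible_rows`, the `(row_key, kind, timestamp)` tuples, the sort order of the merged file, and the timestamp values are assumptions made for illustration only.

```python
def merge_files(update_file, delete_file):
    """Combine both files into one table data update file.

    Cells are sorted by RowKey, newest timestamp first; the sort order is an
    assumption and does not affect which rows are readable in this model.
    """
    return sorted(update_file + delete_file, key=lambda cell: (cell[0], -cell[2]))


def visible_rows(cells):
    """Row keys that have at least one Put newer than every Delete of that row."""
    newest_delete = {}
    for row_key, kind, ts in cells:
        if kind == "Delete":
            newest_delete[row_key] = max(ts, newest_delete.get(row_key, ts))
    return {
        row_key
        for row_key, kind, ts in cells
        if kind == "Put" and ts > newest_delete.get(row_key, -1)
    }


T0, T1, T2 = 100, 200, 300  # original < deletion < update
old_data = [(f"row{i}", "Put", T0) for i in range(3)]
delete_file = [(row_key, "Delete", T1) for row_key, _, _ in old_data]
update_file = [(f"row{i}", "Put", T2) for i in range(1, 5)]

table_data_update_file = merge_files(update_file, delete_file)
table = old_data + table_data_update_file  # a single import
print(sorted(visible_rows(table)))  # ['row1', 'row2', 'row3', 'row4']
```

  • With a single import there is no intermediate state at all: readers see either the old rows or the new rows, and the import order question of the two-file variant does not arise.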
  • Alternatively, the table data updating device may skip generating the full data deletion file and the full data update file, directly generate the table data update file according to the full data update instruction, and then import the table data update file into the data table.
  • The table data updating device may change the save path of the table data update file to the directory of the data table in order to import the table data update file into the data table.
  • Changing the save path takes only seconds.
  • the embodiment of the present invention further provides a related table data updating device.
  • the basic structure of the device is as shown in FIG. 6 , which mainly includes:
  • the instruction receiving module 601 is configured to receive a full data update instruction
  • the generating file module 602 is configured to obtain data to be imported according to the full data update instruction, and generate a full data update file according to the data to be imported;
  • the data obtaining module 603 is configured to acquire M rows of old data of the data table corresponding to the full data update instruction;
  • the generating file module 602 is further configured to generate a full data deletion file according to the M line old data;
  • the modules 601 to 603 are substantially the same as the modules 301 to 303, and are not described herein.
  • the file merge module 604 is configured to merge the full data update file and the full data deletion file into a table data update file.
  • the file merge module 604 merges the full data update file and the full data deletion file into a table data update file.
  • the merged table data update file includes all data of the full data update file and all data of the full data deletion file.
  • the file importing module 605 is configured to import the table data update file into the data table.
  • The file import module 605 imports the table data update file into the data table corresponding to the full data update instruction. Because the table data update file is imported into the data table in a single operation, the atomicity of the data import is guaranteed.
  • The merged table data update file includes all the data of the full data update file, and the minimum value of the update timestamp in the Sth row of new data is greater than the maximum value of the original timestamp in the Rth row of old data with the same RowKey. Therefore, after the table data update file is imported into the data table, if the Rth row of old data exists, the Sth row of new data overrides it: the user can read the Sth row of new data but not the Rth row of old data.
  • The Rth row of deletion data deletes the Rth row of old data, which has the same RowKey.
  • Because the full data deletion file includes M rows of deletion data whose RowKeys correspond one-to-one to those of the M rows of old data of the data table, all M rows of old data are deleted once the table data update file is imported into the data table, and the user can no longer read the old data in the data table.
  • Because the deletion timestamp of the Rth row of deletion data is smaller than the minimum value of the update timestamp in the Sth row of new data with the same RowKey, the Sth row of new data, if it exists, remains valid; that is, the new data in the table data update file can still be read by the user.
  • This embodiment provides a table data updating device, wherein the instruction receiving module 601 receives the full data update instruction; the generating file module 602 obtains the data to be imported according to the full data update instruction and generates a full data update file according to the data to be imported; the data obtaining module 603 acquires the M rows of old data of the data table corresponding to the full data update instruction, and the generating file module 602 generates the full data deletion file according to the M rows of old data; the file merge module 604 merges the full data update file and the full data deletion file into a table data update file; and the file import module 605 imports the table data update file into the data table.
  • The file importing module 605 can change the save path of the table data update file to the directory of the data table in order to import the table data update file into the data table.
  • Changing the save path takes only seconds.
  • this embodiment first merges the full data update file and the full data deletion file into a table data update file, and then imports the table data update file into the data table. This only needs to import the file once into the data table to achieve full update of the table data.
  • Alternatively, the generating file module 602 may generate the table data update file directly according to the full data update instruction, without generating the full data deletion file and the full data update file, and the file import module 605 then imports the table data update file into the data table.
  • the table data updating device in the embodiment of the present invention is described above from the perspective of the unitized functional entity.
  • The table data updating device in the embodiment of the present invention is described below from the perspective of hardware processing. Referring to FIG. 4 again, another embodiment of the table data updating apparatus 400 in the embodiment of the present invention includes:
  • the input device 401, the output device 402, the processor 403, and the memory 404 (wherein the number of processors 403 in the table data updating device 400 may be one or more, and one processor 403 is taken as an example in FIG. 4).
  • The input device 401, the output device 402, the processor 403, and the memory 404 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 4.
  • the processor 403 is configured to perform the following steps by calling an operation instruction stored in the memory 404:
  • The data to be imported is obtained according to the full data update instruction, and a full data update file is generated according to the data to be imported. The full data update file includes P rows of new data; each row of new data includes a row identifier RowKey and Q columns of new attributes; the data type of each column of new attributes is the Put type; and each column of new attributes is set with an update timestamp;
  • Each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
  • The full data deletion file includes M rows of deletion data in one-to-one correspondence with the rows of old data. The data type of each row of deletion data is the Delete type, and each row of deletion data is set with a deletion timestamp. The deletion timestamp of the Rth row of deletion data is greater than the maximum value of the original timestamp of the Rth row of old data, and smaller than the minimum value of the update timestamp of the Sth row of new data, wherein the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
  • the processor 403 is further configured to perform the following steps:
  • the save path of the table data update file is changed to a directory of the data table corresponding to the full data update command.
  • the disclosed systems and methods can be implemented in other ways.
  • the system embodiment described above is merely illustrative.
  • the division into units is only a division by logical function, and an actual implementation may divide them differently; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, module or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

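An embodiment of the present invention provides a method for updating a data table of a KeyValue database, including: receiving a full data update instruction; obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data; obtaining M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M RowKeys of the M rows of old data; importing the full data update file into the data table; and importing the full data deletion file into the data table. The method can perform a full update of a data table of a KeyValue database with relatively high update speed and relatively good atomicity. An embodiment of the present invention further provides a related table data updating device.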

Description

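Method for updating a data table of a KeyValue database and table data updating device
This application claims priority to International Patent Application No. PCT/CN2014/090934, filed with the international patent office on November 12, 2014 and entitled "Method for updating a data table of a KeyValue database and table data updating device", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of information technology, and in particular to a method for updating a data table of a KeyValue database and a table data updating device.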
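Background
Key-value (KeyValue) databases (hereinafter, KeyValue databases) are non-relational (NoSQL) distributed storage databases that offer high scalability and high reliability, and they are widely used in a growing number of systems. In many KeyValue databases, data is stored in tables. Each table includes multiple rows of data, each row is uniquely identified by a row identifier (RowKey), and each row includes multiple columns of attributes. Each column attribute corresponds to one piece of KeyValue data and has a data type and a timestamp. Data types include Put (add) and Delete (delete): the Put type indicates that the attribute is a newly added attribute, and the Delete type indicates that the attribute is used to delete an old attribute. The timestamp indicates the time at which the attribute was generated. A KeyValue database uses timestamps to store multiple versions of data; that is, timestamps distinguish new data from old data that has the same RowKey value. When the KeyValue database holds multiple versions of data, the old version is overwritten by the new version, and a user reading the data directly reads the new version.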
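The versioning rule described above can be illustrated with a minimal sketch. The `KeyValue` class and `read_cell` function are hypothetical names introduced here for illustration, and the tie-breaking rule (a Delete masks a Put with the same timestamp) is an assumption, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyValue:
    row_key: str
    column: str
    kv_type: str            # "Put" or "Delete"
    timestamp: int
    value: Optional[str] = None


def read_cell(cells, row_key, column):
    """Return the visible value of one cell, or None if it is deleted or absent."""
    versions = [kv for kv in cells if kv.row_key == row_key and kv.column == column]
    if not versions:
        return None
    # The newest version wins; on equal timestamps we assume Delete masks Put.
    newest = max(versions, key=lambda kv: (kv.timestamp, kv.kv_type == "Delete"))
    return newest.value if newest.kv_type == "Put" else None
```

A reader therefore never sees the older Put once a newer Put or Delete exists for the same RowKey and column, which is the property the update method relies on.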
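When updating data, a KeyValue database uses a mark-deletion technique to delete or import KeyValue data entry by entry. In practice, however, it is often necessary to update all the data in a table as a whole, that is, to clear all existing data from the table and import new data, and existing KeyValue databases do not support such a one-time full update of table data. If the table data is deleted and imported entry by entry, updating all the data in the table takes a long time, and the update process lacks atomicity, which degrades the data reading service provided by the table. A full update method for table data that is suitable for KeyValue databases is therefore of considerable significance.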
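Summary
Embodiments of the present invention provide a method for updating a data table of a KeyValue database, which can perform a full update of table data in a KeyValue database.
A first aspect of the embodiments of the present invention provides a method for updating a data table of a KeyValue database, including:
receiving a full data update instruction;
obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
importing the full data update file into the data table; and
importing the full data deletion file into the data table.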
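With reference to the first aspect of the embodiments of the present invention, in a first implementation of the first aspect, the step of generating the full data deletion file is performed first, and then the step of generating the full data update file is performed;
the update timestamp of each column of new attributes is the moment at which the full data update file is generated, and the deletion timestamp of each row of deletion data is the moment at which the full data deletion file is generated.
With reference to the first aspect or the first implementation of the first aspect, in a second implementation of the first aspect, the step of importing the full data update file into the data table is performed first, and then the step of importing the full data deletion file into the data table is performed.
With reference to the first aspect, or the first or second implementation of the first aspect, in a third implementation of the first aspect, importing the full data update file into the data table includes:
changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
With reference to the first aspect, or the first, second, or third implementation of the first aspect, in a fourth implementation of the first aspect, importing the full data deletion file into the data table includes:
changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.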
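A second aspect of the embodiments of the present invention provides a table data updating device, including:
a receiving module, configured to receive a full data update instruction;
a file generation module, configured to obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
an obtaining module, configured to obtain M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
wherein the file generation module is further configured to generate a full data deletion file according to the M rows of old data, the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P; and
an import module, configured to import the full data update file into the data table and import the full data deletion file into the data table.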
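With reference to the second aspect of the embodiments of the present invention, in a first implementation of the second aspect, the file generation module generates the full data deletion file first and then generates the full data update file, sets the moment at which the full data update file is generated as the update timestamp of each column of new attributes, and sets the moment at which the full data deletion file is generated as the deletion timestamp of each row of deletion data.
With reference to the second aspect or the first implementation of the second aspect, in a second implementation of the second aspect, the import module imports the full data update file into the data table first and then imports the full data deletion file into the data table.
With reference to the second aspect, or the first or second implementation of the second aspect, in a third implementation of the second aspect, the import module is specifically configured to:
change the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
With reference to the second aspect, or the first, second, or third implementation of the second aspect, in a fourth implementation of the second aspect, the import module is specifically configured to:
change the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.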
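A third aspect of the embodiments of the present invention provides a table data updating device, including an input device, an output device, a processor, and a memory, wherein, by invoking operation instructions stored in the memory, the processor is configured to perform the following steps:
receiving a full data update instruction;
obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
importing the full data update file into the data table; and
importing the full data deletion file into the data table.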
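With reference to the third aspect of the embodiments of the present invention, in a first implementation of the third aspect, the processor generates the full data deletion file first and then generates the full data update file, sets the moment at which the full data update file is generated as the update timestamp of each column of new attributes, and sets the moment at which the full data deletion file is generated as the deletion timestamp of each row of deletion data.
With reference to the third aspect or the first implementation of the third aspect, in a second implementation of the third aspect, the processor imports the full data update file into the data table first and then imports the full data deletion file into the data table.
With reference to the third aspect, or the first or second implementation of the third aspect, in a third implementation of the third aspect, the processor is further configured to perform:
changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
With reference to the third aspect, or the first, second, or third implementation of the third aspect, in a fourth implementation of the third aspect, the processor is further configured to perform:
changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.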
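A fourth aspect of the embodiments of the present invention provides a method for updating a data table of a KeyValue database, including:
receiving a full data update instruction;
obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
merging the full data update file and the full data deletion file into a table data update file; and
importing the table data update file into the data table.
With reference to the fourth aspect of the embodiments of the present invention, in a first implementation of the fourth aspect, importing the table data update file into the data table includes:
changing the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.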
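A fifth aspect of the embodiments of the present invention provides a table data updating device, including:
an instruction receiving module, configured to receive a full data update instruction;
a file generating module, configured to obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
a data obtaining module, configured to obtain M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
wherein the file generating module is further configured to generate a full data deletion file according to the M rows of old data, the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
a file merging module, configured to merge the full data update file and the full data deletion file into a table data update file; and
a file import module, configured to import the table data update file into the data table.
With reference to the fifth aspect of the embodiments of the present invention, in a first implementation of the fifth aspect, the file import module is specifically configured to:
change the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.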
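A sixth aspect of the embodiments of the present invention provides a table data updating device, including an input device, an output device, a processor, and a memory, wherein, by invoking operation instructions stored in the memory, the processor is configured to perform the following steps:
receiving a full data update instruction;
obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
merging the full data update file and the full data deletion file into a table data update file; and
importing the table data update file into the data table.
With reference to the sixth aspect of the embodiments of the present invention, in a first implementation of the sixth aspect, the processor is further configured to perform:
changing the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.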
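The embodiments of the present invention provide a method for updating a data table of a KeyValue database, including: receiving a full data update instruction; obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data; obtaining M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M rows of old data; importing the full data update file into the data table; and importing the full data deletion file into the data table. This method achieves a full update of all the data in a data table of a KeyValue database. Moreover, because the full data update file and the full data deletion file are not imported into the KeyValue database entry by entry, the method updates faster and with better atomicity than the prior-art approach of updating the KeyValue data in the data table entry by entry.
Brief Description of the Drawings
FIG. 1 is a flowchart of one embodiment of the method for updating a data table of a KeyValue database according to the embodiments of the present invention;
FIG. 2 is a flowchart of another embodiment of the method for updating a data table of a KeyValue database according to the embodiments of the present invention;
FIG. 3 is a structural diagram of one embodiment of the table data updating device according to the embodiments of the present invention;
FIG. 4 is a structural diagram of another embodiment of the table data updating device according to the embodiments of the present invention;
FIG. 5 is a flowchart of another embodiment of the method for updating a data table of a KeyValue database according to the embodiments of the present invention;
FIG. 6 is a structural diagram of another embodiment of the table data updating device according to the embodiments of the present invention.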
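Detailed Description
Embodiments of the present invention provide a method for updating a data table of a KeyValue database, which can perform a full update of table data in a KeyValue database. The present invention also provides a related table data updating device. Both are described below.
For the basic procedure of the method for updating a data table of a KeyValue database provided by the embodiments of the present invention, refer to FIG. 1. The procedure mainly includes:
101. Receive a full data update instruction.
A user delivers a full data update instruction through a client. The full data update instruction instructs a full update of the table data of a specified data table in the KeyValue database. The table data updating device receives the full data update instruction.
102. Obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data.
The table data updating device obtains the to-be-imported data according to the full data update instruction and generates the full data update file according to the to-be-imported data. Preferably, the full data update instruction may include the table name of the data table and may specify the save path of the to-be-imported data and the save path of the generated full data update file. The table data updating device obtains the to-be-imported data from the save path specified for it in the full data update instruction, generates the full data update file from that data, and saves the file under the save path specified for the full data update file.
The full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp.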
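103. Obtain M rows of old data of the data table corresponding to the full data update instruction, and generate a full data deletion file according to the M rows of old data.
The table data updating device obtains the M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp. The table data updating device generates the full data deletion file according to the M rows of old data.
Preferably, the full data update instruction may include the table name of the data table and may specify the save path of the generated full data deletion file. The table data updating device determines the data table according to its table name, generates the full data deletion file, and saves it under the save path specified in the full data update instruction.
The full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data of the data table, the data type of each row of deletion data is the Delete type, and each row of deletion data is used to delete the old data that has the same RowKey. Each row of deletion data is set with a deletion timestamp.
Step 103 may also precede step 102; this is not limited in this embodiment. Regardless of the order of steps 103 and 102, however, the deletion timestamp of the Rth row of deletion data must be greater than the maximum original timestamp in the Rth row of old data and less than the minimum update timestamp in the Sth row of new data, where the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P.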
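The timestamp condition above can be written as a small check. This is a sketch under the assumption that timestamps are comparable integers; the function name `valid_deletion_timestamp` is hypothetical. An empty list of new timestamps models a RowKey that appears in the old data but not in the new data, in which case only the lower bound applies.

```python
def valid_deletion_timestamp(delete_ts, old_timestamps, new_timestamps):
    """True if max(old) < delete_ts < min(new) for rows that share one RowKey."""
    after_old = all(delete_ts > ts for ts in old_timestamps)
    before_new = all(delete_ts < ts for ts in new_timestamps)
    return after_old and before_new
```

The check is independent of which file is generated first, which matches the statement that the order of steps 102 and 103 is not limited as long as the condition holds.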
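104. Import the full data update file into the data table.
The table data updating device imports the full data update file into the data table corresponding to the full data update instruction. The full data update file is imported into the data table in a single operation, which ensures the atomicity of the data import.
Because the minimum update timestamp in the Sth row of new data in the full data update file is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey, after the full data update file is imported into the data table, if the Rth row of old data exists, the Sth row of new data overwrites it: a user can read the Sth row of new data but cannot read the Rth row of old data.
105. Import the full data deletion file into the data table.
The table data updating device imports the full data deletion file into the data table corresponding to the full data update instruction. The full data deletion file is imported into the data table in a single operation, which ensures the atomicity of the data deletion.
Because the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey, the Rth row of deletion data can delete the Rth row of old data. Because the full data deletion file includes M rows of deletion data that correspond one-to-one to the RowKeys of the M rows of old data in the data table, after the full data deletion file is imported, all M rows of old data are deleted and a user cannot read the old data in the data table. Because the deletion timestamp of the Rth row of deletion data is less than the minimum update timestamp in the Sth row of new data that has the same RowKey, if the Sth row of new data exists, it remains valid; that is, after the full data deletion file is imported into the data table, the data of the full data update file can still be read normally by users.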
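The following sketch illustrates why the final table contents do not depend on the order in which the two files are imported. It assumes a simplified model in which a row-level Delete masks every Put of the same RowKey whose timestamp is not greater than the deletion timestamp; the tuple layout and the name `visible_rows` are illustrative only.

```python
def visible_rows(files):
    """files: list of lists of (row_key, kind, ts, value); returns {row_key: value}."""
    entries = [e for f in files for e in f]
    result = {}
    for row_key in {e[0] for e in entries}:
        row = [e for e in entries if e[0] == row_key]
        # Newest Delete marker for this row, if any.
        delete_ts = max((ts for _, kind, ts, _ in row if kind == "Delete"), default=None)
        # Keep only Puts that are newer than the Delete marker.
        puts = [(ts, v) for _, kind, ts, v in row
                if kind == "Put" and (delete_ts is None or ts > delete_ts)]
        if puts:
            result[row_key] = max(puts)[1]
    return result


old = [("1", "Put", 10, "old1"), ("2", "Put", 10, "old2")]
delete = [("1", "Delete", 15, None), ("2", "Delete", 15, None)]
new = [("2", "Put", 20, "new2"), ("3", "Put", 20, "new3")]
```

With these sample files, reading after `[old, new, delete]` and after `[old, delete, new]` yields the same visible rows, because visibility is decided by timestamps rather than by arrival order.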
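Step 105 may also precede step 104; this is not limited in this embodiment.
This embodiment provides a method for updating a data table of a KeyValue database, including: receiving a full data update instruction; obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data; obtaining M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M rows of old data; importing the full data update file into the data table; and importing the full data deletion file into the data table. This method achieves a full update of all the data in a data table of a KeyValue database. Moreover, because the full data update file and the full data deletion file are not imported into the KeyValue database entry by entry, the method provided by this embodiment updates faster and with better atomicity than the prior-art approach of updating the KeyValue data in the data table entry by entry.
In the embodiment shown in FIG. 1, the order of steps 102 and 103 is not limited, and neither is the order of steps 104 and 105. In practice, however, fixing the order of these steps can yield further benefits. Referring to FIG. 2, the procedure of another method for updating a data table of a KeyValue database provided by the embodiments of the present invention includes: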
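201. Receive a full data update instruction.
A user delivers a full data update instruction through a client. The full data update instruction instructs a full update of the table data of a data table in the KeyValue database. The table data updating device receives the full data update instruction.
Specifically, the full data update instruction instructs deleting the old data of the data table in the KeyValue database and importing new data into the data table. Preferably, therefore, the full data update instruction may include a deletion instruction and an update instruction. The deletion instruction instructs deleting the old data of the data table in the KeyValue database, and the update instruction instructs importing new data into the data table.
202. Obtain M rows of old data of the data table corresponding to the full data update instruction, and generate a full data deletion file according to the M rows of old data.
According to the full data update instruction, the table data updating device obtains the M rows of old data of the corresponding data table, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp. The table data updating device generates the full data deletion file according to the M rows of old data.
Preferably, the full data update instruction may include the table name of the data table and may specify the save path of the generated full data deletion file. The table data updating device determines the data table according to its table name, generates the full data deletion file, and saves it under the save path specified in the full data update instruction.
The full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data of the data table, the data type of each row of deletion data is the Delete type, and each row of deletion data is used to delete the old data that has the same RowKey. Each row of deletion data is set with a deletion timestamp, the deletion timestamp of each row is the moment at which the full data deletion file is generated, and this timestamp is greater than the maximum original timestamp.
If the full data update instruction includes a deletion instruction and an update instruction, the table data updating device performs the operations in this step according to the deletion instruction.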
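203. Obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data.
After generating the full data deletion file, the table data updating device obtains the to-be-imported data according to the full data update instruction and generates the full data update file according to the to-be-imported data. Preferably, the full data update instruction may include the table name of the data table and may specify the save path of the to-be-imported data and the save path of the generated full data update file. The table data updating device obtains the to-be-imported data from the save path specified for it in the full data update instruction, generates the full data update file from that data, and saves the file under the save path specified for the full data update file.
The full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp. The update timestamp of each column of new attributes is the moment at which the full data update file is generated, and this update timestamp is greater than the maximum original timestamp.
If the full data update instruction includes a deletion instruction and an update instruction, the table data updating device performs the operations in this step according to the update instruction.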
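204. Import the full data update file into the data table.
The table data updating device imports the full data update file into the data table corresponding to the full data update instruction. The full data update file is imported into the data table in a single operation, which ensures the atomicity of the data import.
Preferably, the table data updating device may change the save path of the full data update file to the directory of the data table, thereby importing the full data update file into the data table. Changing the save path takes only on the order of seconds.
Because the update timestamp of each column of new attributes is the moment at which the full data update file is generated, and this update timestamp is greater than the maximum original timestamp, after the full data update file is imported into the data table, the Sth row of new data overwrites the Rth row of old data: a user can read the Sth row of new data but cannot read the Rth row of old data. Here the Rth row of old data and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P.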
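205. Import the full data deletion file into the data table.
The table data updating device imports the full data deletion file into the data table corresponding to the full data update instruction. The full data deletion file is imported into the data table in a single operation, which ensures the atomicity of the data deletion.
Preferably, the table data updating device may change the save path of the full data deletion file to the directory of the data table, thereby importing the full data deletion file into the data table. Changing the save path takes only on the order of seconds.
Because the deletion timestamp of each row of deletion data is the moment at which the full data deletion file is generated, and this deletion timestamp is greater than the maximum original timestamp, after the full data deletion file is imported into the data table, the Rth row of deletion data can delete the Rth row of old data that has the same RowKey. Because the full data deletion file includes M rows of deletion data that correspond one-to-one to the RowKeys of the M rows of old data in the data table, after the full data deletion file is imported, all M rows of old data are deleted and a user cannot read the old data in the data table. Furthermore, the update timestamp of each column of new attributes is the moment at which the full data update file is generated in step 203, the deletion timestamp of each row of deletion data is the moment at which the full data deletion file is generated in step 202, and step 202 is performed before step 203. Every deletion timestamp is therefore less than every update timestamp, the full data deletion file does not delete the data in the full data update file, and the data of the full data update file can still be read normally by users.
The order of steps 204 and 205 is not limited in this embodiment. However, if step 205 is performed first, then after the full data deletion file has been imported and before the full data update file is imported, the data table contains no data that a user can read, and the service that the KeyValue database provides to users is temporarily interrupted. Preferably, therefore, step 204 is performed before step 205, which ensures that while the data table of the KeyValue database is being updated it always contains data that users can read, with no need to interrupt the service.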
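The effect of the import order on availability can be seen in a small simulation. This is a sketch under the same simplified model as before (a row-level Delete masks Puts that are not newer than it); `readable` and `intermediate_states` are hypothetical helper names, and the three-row table is sample data invented for illustration.

```python
def readable(entries):
    """Row keys that still expose at least one Put newer than any Delete marker."""
    keys = set()
    for row_key in {e[0] for e in entries}:
        deletes = [ts for k, kind, ts in entries if k == row_key and kind == "Delete"]
        cutoff = max(deletes, default=-1)
        if any(kind == "Put" and ts > cutoff
               for k, kind, ts in entries if k == row_key):
            keys.add(row_key)
    return keys


def intermediate_states(table, first, second):
    """Readable row keys after each of two successive file imports."""
    after_first = table + first
    return readable(after_first), readable(after_first + second)


old = [(str(i), "Put", 0) for i in range(1, 4)]        # original timestamp 0
delete = [(str(i), "Delete", 1) for i in range(1, 4)]  # deletion timestamp 1
update = [(str(i), "Put", 2) for i in range(1, 5)]     # update timestamp 2
```

Calling `intermediate_states(old, update, delete)` keeps every row readable at both points, whereas `intermediate_states(old, delete, update)` leaves an empty readable set after the first import, which corresponds to the temporary service interruption described above.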
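Preferably, the KeyValue database in the embodiments of the present invention may use a distributed file system as the underlying data storage medium. A distributed file system is a file management system whose physical storage resources are not necessarily attached to the local node but are connected to multiple nodes through a computer network. In a distributed file system, a large data block is split into multiple small data blocks that are stored on multiple nodes, which gives the distributed file system relatively high fault tolerance and throughput. Many distributed file systems are in common use, including the Hadoop Distributed File System (HDFS); this is not limited in the embodiments of the present invention, and HDFS is used only as an example. After generating the full data update file and the full data deletion file, the table data updating device saves them in HDFS at a location outside the directory of the KeyValue database. In step 204, the save path of the full data update file is changed to the directory of the data table in the KeyValue database in HDFS; in step 205, the save path of the full data deletion file is changed to the directory of the data table in the KeyValue database in HDFS.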
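Importing a file by changing its save path can be sketched with a local filesystem rename as a stand-in for the corresponding HDFS operation. The directory layout, the file name, and the function `import_by_rename` are all assumptions made for illustration; a real deployment would use the storage system's own client rather than `os.replace`.

```python
import os
import tempfile
from pathlib import Path


def import_by_rename(staged_file: Path, table_dir: Path) -> Path:
    """Move a fully written file into the table directory with a single rename."""
    table_dir.mkdir(parents=True, exist_ok=True)
    target = table_dir / staged_file.name
    # A rename on the same filesystem moves the directory entry, not the data.
    os.replace(staged_file, target)
    return target
```

Because the file is written completely outside the table directory and only then renamed, a reader of the table directory sees either no file or the whole file, which is the property the single-operation import relies on.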
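This embodiment provides a method for updating a data table of a KeyValue database, including: receiving a full data update instruction; obtaining M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M rows of old data; obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data; importing the full data update file into the data table; and importing the full data deletion file into the data table. This method updates all the data in a data table of a KeyValue database. Because the full data update file and the full data deletion file are not imported into the KeyValue database entry by entry, and each import takes only on the order of seconds, the method provided by this embodiment has better atomicity and updates faster than the prior-art approach of updating the KeyValue data in the data table entry by entry. In addition, because the method imports the full data update file first and the full data deletion file afterwards, the target table never reaches a state in which the old data has been deleted but the new data has not yet been imported, so the data reading service need not be interrupted and the user experience is better.
To make the foregoing embodiment easier to understand, a specific application scenario of the embodiment is described below as an example.
The KeyValue database uses HDFS as the underlying data storage medium. A user delivers a full data update instruction through a client, specifying that the data in data table Table1 is to be updated.
The form of data table Table1 is shown in Table 1. Table 1 shows only a logical structure of Table1; the underlying file is in KeyValue form, each attribute of each row corresponds to one KeyValue, and each KeyValue includes a RowKey value, a data type, and an original timestamp, and may also include other information. Table1 includes 100 rows of old data; each row of old data includes a RowKey and 3 columns of attributes (attribute 1, attribute 2, attribute 3), and each column of attributes is set with the original timestamp T0. The data type of each column of attributes is Put.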
Figure PCTCN2015073211-appb-000001
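Table 1
At time T1, the table data updating device obtains the 100 rows of old data of Table1 according to the full data update instruction and generates the full data deletion file according to these 100 rows. The full data deletion file includes 100 rows of deletion data that correspond one-to-one to the 100 rows of old data of Table1; the RowKeys of the 100 rows of deletion data correspond one-to-one to the RowKeys of the 100 rows of old data, and the 100 rows of deletion data are used to delete the 100 rows of old data.
The 100 rows of deletion data in the full data deletion file may take the form shown in Table 2(a). Table 2(a) shows only a logical structure of the full data deletion file; the underlying file is in KeyValue form, each row corresponds to one KeyValue, and each KeyValue includes a RowKey value, a data type, and a deletion timestamp, and may also include other information. The data type of each row of deletion data is Delete, each row is set with a deletion timestamp, and the deletion timestamp of each row is the moment T1 at which the full data deletion file is generated, where T1 is greater than T0.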
RowKey    Data type    Deletion timestamp
1         Delete       T1
2         Delete       T1
3         Delete       T1
...       ...          ...
99        Delete       T1
100       Delete       T1
Table 2(a)
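Alternatively, the 100 rows of deletion data in the full data deletion file may take the form shown in Table 2(b). Table 2(b) shows only a logical structure of the full data deletion file; the underlying file is in KeyValue form, each attribute of each row corresponds to one KeyValue, and each KeyValue includes a RowKey value, a data type, and a deletion timestamp, and may also include other information. Each row of deletion data has the three attributes corresponding to Table1, the data type of each attribute is Delete, each attribute is set with a deletion timestamp, and the deletion timestamp of each attribute is the moment T1 at which the full data deletion file is generated, where T1 is greater than T0. This example uses only the full data deletion file shown in Table 2(a).
Figure PCTCN2015073211-appb-000002
Table 2(b)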
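The two logical shapes of the full data deletion file, one KeyValue per row as in Table 2(a) and one KeyValue per attribute as in Table 2(b), can be sketched as follows. The tuple layouts, the function names, and the column labels `attr1` to `attr3` are illustrative assumptions; the text does not define an on-disk format.

```python
def row_level_deletes(row_keys, delete_ts):
    """One Delete entry per RowKey, as in the Table 2(a) layout."""
    return [(rk, "Delete", delete_ts) for rk in row_keys]


def attribute_level_deletes(row_keys, columns, delete_ts):
    """One Delete entry per (RowKey, attribute) pair, as in the Table 2(b) layout."""
    return [(rk, col, "Delete", delete_ts) for rk in row_keys for col in columns]


T1 = 1  # placeholder for the moment the deletion file is generated
keys = [str(i) for i in range(1, 101)]
```

For the 100 rows of Table1, the row-level form produces 100 entries and the attribute-level form produces 300, one for each of the three old attributes of every row.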
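At time T2, the table data updating device obtains the to-be-imported data according to the full data update instruction and generates the full data update file according to the to-be-imported data; see Table 3. Table 3 shows only a logical structure of the full data update file; the underlying file is in KeyValue form, each attribute of each row corresponds to one KeyValue, and each KeyValue includes a RowKey value, a data type, and an update timestamp, and may also include other information. The full data update file includes 200 rows of new data; each row of new data includes a row identifier RowKey and 3 columns of new attributes (attribute 1, attribute 2, attribute 4), the data type of each column of new attributes is the Put type, each column of new attributes is set with an update timestamp, and the update timestamp of each column of new attributes is the moment T2 at which the full data update file is generated, where T2 is greater than T1.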
Figure PCTCN2015073211-appb-000003
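Table 3
At time T3, the table data updating device changes the save path of the full data update file to the directory of data table Table1, thereby importing the full data update file into the data table.
At time T4, the table data updating device changes the save path of the full data deletion file to the directory of data table Table1, thereby importing the full data deletion file into the data table, so that the data of the full data update file is valid while all the old data in the data table is deleted. T4 is greater than T3.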
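The worked example can be replayed in a few lines to show why the deletion file is needed even for RowKeys that also appear in the new data: the old attribute 3 has no replacement in the new data, so only the Delete entry removes it. This is a sketch under the simplified row-level Delete model; the integer values chosen for T0, T1, and T2, the column labels, and `visible_columns` are assumptions for illustration.

```python
T0, T1, T2 = 0, 1, 2  # placeholders that only preserve the order T0 < T1 < T2

old = [(str(r), c, "Put", T0)
       for r in range(1, 101) for c in ("attr1", "attr2", "attr3")]
deletes = [(str(r), None, "Delete", T1) for r in range(1, 101)]
new = [(str(r), c, "Put", T2)
       for r in range(1, 201) for c in ("attr1", "attr2", "attr4")]


def visible_columns(entries, row_key):
    """Sorted attribute names of one row that are still readable."""
    row = [e for e in entries if e[0] == row_key]
    cutoff = max((ts for _, _, kind, ts in row if kind == "Delete"), default=-1)
    return sorted({col for _, col, kind, ts in row if kind == "Put" and ts > cutoff})
```

After only the update file is imported (time T3), row 1 still exposes the stale attribute 3 alongside the new attributes; after the deletion file is imported as well (time T4), only attributes 1, 2, and 4 remain.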
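An embodiment of the present invention further provides a related table data updating device. For its basic structure, refer to FIG. 3. The device mainly includes:
a receiving module 301, configured to receive a full data update instruction.
A user delivers a full data update instruction through a client. The full data update instruction instructs a full update of the table data of a data table in the KeyValue database. The receiving module 301 receives the full data update instruction.
a file generation module 302, configured to obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data.
The file generation module 302 obtains the to-be-imported data according to the full data update instruction and generates the full data update file according to the to-be-imported data. Preferably, the full data update instruction may include the table name of the data table and may specify the save path of the to-be-imported data and the save path of the generated full data update file. The file generation module 302 obtains the to-be-imported data from the save path specified for it in the full data update instruction, generates the full data update file from that data, and saves the file under the save path specified for the full data update file.
The full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp.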
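an obtaining module 303, configured to obtain M rows of old data of the data table corresponding to the full data update instruction.
Each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp.
The file generation module 302 is further configured to generate a full data deletion file according to the M rows of old data.
Preferably, the full data update instruction may include the table name of the data table and may specify the save path of the generated full data deletion file. The file generation module 302 determines the data table according to its table name, generates the full data deletion file, and saves it under the save path specified in the full data update instruction.
The full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data of the data table, the data type of each row of deletion data is the Delete type, and each row of deletion data is used to delete the old data that has the same RowKey. Each row of deletion data is set with a deletion timestamp.
The deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp in the Rth row of old data and less than the minimum update timestamp in the Sth row of new data, where the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P.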
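an import module 304, configured to import the full data update file into the data table and import the full data deletion file into the data table.
The import module 304 imports the full data update file into the data table corresponding to the full data update instruction. The full data update file is imported into the data table in a single operation, which ensures the atomicity of the data import.
Because the minimum update timestamp in the Sth row of new data in the full data update file is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey, after the full data update file is imported into the data table, if the Rth row of old data exists, the Sth row of new data overwrites it: a user can read the Sth row of new data but cannot read the Rth row of old data.
The import module 304 imports the full data deletion file into the data table corresponding to the full data update instruction. The full data deletion file is imported into the data table in a single operation, which ensures the atomicity of the data deletion.
Because the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey, the Rth row of deletion data can delete the Rth row of old data. Because the full data deletion file includes M rows of deletion data that correspond one-to-one to the RowKeys of the M rows of old data in the data table, after the full data deletion file is imported, all M rows of old data are deleted and a user cannot read the old data in the data table. Because the deletion timestamp of the Rth row of deletion data is less than the minimum update timestamp in the Sth row of new data that has the same RowKey, if the Sth row of new data exists, it remains valid; that is, after the full data deletion file is imported into the data table, the data of the full data update file can still be read normally by users.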
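This embodiment provides a table data updating device, including: a receiving module 301, configured to receive a full data update instruction; a file generation module 302, configured to obtain to-be-imported data according to the full data update instruction and generate a full data update file according to the to-be-imported data; an obtaining module 303, configured to obtain M rows of old data of the data table corresponding to the full data update instruction; the file generation module 302 being further configured to generate a full data deletion file according to the M rows of old data; and an import module 304, configured to import the full data update file into the data table and import the full data deletion file into the data table. The table data updating device provided by this embodiment can update all the data in a data table of a KeyValue database. Moreover, because the full data update file and the full data deletion file are not imported into the KeyValue database entry by entry, the device updates faster and with better atomicity than the prior-art approach of updating the KeyValue data in the data table entry by entry.
Preferably, in the embodiment shown in FIG. 3, the file generation module 302 may generate the full data deletion file first and then generate the full data update file. More preferably, the file generation module 302 may set the moment at which the full data update file is generated as the update timestamp of each column of new attributes, and set the moment at which the full data deletion file is generated as the deletion timestamp of each row of deletion data. Preferably, in the embodiment shown in FIG. 3, the import module 304 may import the full data update file into the data table first and then import the full data deletion file into the data table. This ensures that while the data table of the KeyValue database is being updated it always contains data that users can read, with no need to interrupt the service.
Preferably, in the embodiment shown in FIG. 3, the import module 304 is specifically configured to change the save path of the full data update file to the directory of the data table, and change the save path of the full data deletion file to the directory of the data table. Changing the directory takes only on the order of seconds. More preferably, the KeyValue database in the embodiments of the present invention may use a distributed file system as the underlying data storage medium, and the import module 304 is specifically configured to change the save path of the full data update file to the directory of the data table of the KeyValue database in HDFS, and change the save path of the full data deletion file to the directory of the data table of the KeyValue database in HDFS.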
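To make the foregoing embodiment easier to understand, a specific application scenario of the embodiment is described below as an example.
The KeyValue database uses HDFS as the underlying data storage medium. A user delivers a full data update instruction through a client, specifying that the data in data table Table1 is to be updated.
The form of data table Table1 is shown in Table 1. Table1 includes 100 rows of old data; each row of old data includes a RowKey and 3 columns of attributes (attribute 1, attribute 2, attribute 3), and each column of attributes is set with the original timestamp T0. The data type of each column of attributes is Put.
At time T1, according to the full data update instruction received by the receiving module 301, the obtaining module 303 obtains the 100 rows of old data of data table Table1. The file generation module 302 generates the full data deletion file according to these 100 rows of old data. The full data deletion file includes 100 rows of deletion data that correspond one-to-one to the 100 rows of old data of Table1; the RowKeys of the 100 rows of deletion data correspond one-to-one to the RowKeys of the 100 rows of old data, and the 100 rows of deletion data are used to delete the 100 rows of old data.
The 100 rows of deletion data in the full data deletion file may take the form shown in Table 2(a). The data type of each row of deletion data is Delete, each row is set with a deletion timestamp, and the deletion timestamp of each row is the moment T1 at which the full data deletion file is generated, where T1 is greater than T0.
At time T2, the file generation module 302 obtains the to-be-imported data according to the full data update instruction and generates the full data update file according to the to-be-imported data; see Table 3. The full data update file includes 200 rows of new data; each row of new data includes a row identifier RowKey and 3 columns of new attributes (attribute 1, attribute 2, attribute 4), the data type of each column of new attributes is the Put type, each column of new attributes is set with an update timestamp, and the update timestamp of each column of new attributes is the moment T2 at which the full data update file is generated, where T2 is greater than T1.
At time T3, the import module 304 changes the save path of the full data update file to the directory of the data table, thereby importing the full data update file into the data table.
At time T4, the import module 304 changes the save path of the full data deletion file to the directory of the data table, thereby importing the full data deletion file into the data table, so that the data of the full data update file is valid while all the old data in the data table is deleted. T4 is greater than T3.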
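The table data updating device in the embodiments of the present invention is described above from the perspective of unitized functional entities and is described below from the perspective of hardware processing. Referring to FIG. 4, another embodiment of the table data updating device 400 in the embodiments of the present invention includes:
an input device 401, an output device 402, a processor 403, and a memory 404 (the number of processors 403 in the table data updating device 400 may be one or more, and one processor 403 is taken as an example in FIG. 4). In some embodiments of the present invention, the input device 401, the output device 402, the processor 403, and the memory 404 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 4.
By invoking operation instructions stored in the memory 404, the processor 403 is configured to perform the following steps:
receiving a full data update instruction;
obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
importing the full data update file into the data table; and
importing the full data deletion file into the data table.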
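In some embodiments of the present invention, the processor 403 generates the full data deletion file first and then generates the full data update file, sets the moment at which the full data update file is generated as the update timestamp of each column of new attributes, and sets the moment at which the full data deletion file is generated as the deletion timestamp of each row of deletion data.
In some embodiments of the present invention, the processor 403 imports the full data update file into the data table first and then imports the full data deletion file into the data table. In some embodiments of the present invention, the processor 403 is further configured to perform the following step:
changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
In some embodiments of the present invention, the processor 403 is further configured to perform the following step:
changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.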
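An embodiment of the present invention further provides another method for updating a data table of a KeyValue database. Referring to FIG. 5, the method specifically includes:
501. Receive a full data update instruction.
502. Obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data.
503. Obtain M rows of old data of the data table corresponding to the full data update instruction, and generate a full data deletion file according to the M rows of old data.
Steps 501 to 503 are basically the same as steps 101 to 103, and details are not repeated here.
504. Merge the full data update file and the full data deletion file into a table data update file.
After generating the full data update file and the full data deletion file, the table data updating device merges them into a table data update file. The merged table data update file includes all the data of the full data update file and all the data of the full data deletion file.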
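Merging the two files into a single table data update file can be sketched as a simple concatenation followed by a sort. The ordering used here (by RowKey, newest entry first) is an assumption made for illustration, since the text only requires that the merged file contain all the data of both files; `merge_files` and the tuple layout are hypothetical.

```python
def merge_files(update_rows, delete_rows):
    """Combine both files into one list ordered by RowKey, newest entry first."""
    # Each entry is (row_key, kind, timestamp); nothing is dropped during the merge.
    return sorted(update_rows + delete_rows, key=lambda e: (e[0], -e[2]))


update_rows = [("a", "Put", 2), ("b", "Put", 2)]
delete_rows = [("a", "Delete", 1)]
merged = merge_files(update_rows, delete_rows)
```

Because the merged file carries both the Put entries and the Delete entries with their original timestamps, a single import has the same final effect as importing the two files separately.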
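505. Import the table data update file into the data table.
The table data updating device imports the table data update file into the data table corresponding to the full data update instruction. Because the table data update file is imported into the data table in a single operation, the atomicity of the data import is ensured.
The merged table data update file includes all the data of the full data update file, and the minimum update timestamp in the Sth row of new data in the full data update file is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey. Therefore, after the table data update file is imported into the data table, if the Rth row of old data exists, the Sth row of new data overwrites it: a user can read the Sth row of new data but cannot read the Rth row of old data.
The merged table data update file also includes all the data of the full data deletion file, and the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey. Therefore, after the table data update file is imported into the data table, the Rth row of deletion data can delete the Rth row of old data. Because the full data deletion file includes M rows of deletion data that correspond one-to-one to the RowKeys of the M rows of old data in the data table, after the table data update file is imported, all M rows of old data are deleted and a user cannot read the old data in the data table. Because the deletion timestamp of the Rth row of deletion data is less than the minimum update timestamp in the Sth row of new data that has the same RowKey, if the Sth row of new data exists, it remains valid; that is, all the new data in the table data update file can still be read normally by users.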
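This embodiment provides a method for updating a data table of a KeyValue database, including: receiving a full data update instruction; obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data; obtaining M rows of old data of the data table corresponding to the full data update instruction, and generating a full data deletion file according to the M rows of old data; merging the full data update file and the full data deletion file into a table data update file; and importing the table data update file into the data table. This method achieves a full update of all the data in a data table of a KeyValue database. Moreover, because the full data update file and the full data deletion file are not imported into the KeyValue database entry by entry, the method provided by this embodiment updates faster and with better atomicity than the prior-art approach of updating the KeyValue data in the data table entry by entry.
Compared with the embodiment shown in FIG. 1, this embodiment first merges the full data update file and the full data deletion file into a table data update file and then imports the table data update file into the data table. In this way, a full update of the table data requires importing only one file into the data table. More preferably, as yet another embodiment of the present invention, the table data updating device may skip generating the separate full data deletion file and full data update file, generate the table data update file directly according to the full data update instruction, and then import the table data update file into the data table.
Preferably, the table data updating device may change the save path of the table data update file to the directory of the data table, thereby importing the table data update file into the data table. Changing the save path takes only on the order of seconds.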
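An embodiment of the present invention further provides a related table data updating device. For its basic structure, refer to FIG. 6. The device mainly includes:
an instruction receiving module 601, configured to receive a full data update instruction;
a file generating module 602, configured to obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data;
a data obtaining module 603, configured to obtain M rows of old data of the data table corresponding to the full data update instruction;
the file generating module 602 being further configured to generate a full data deletion file according to the M rows of old data.
Modules 601 to 603 are basically the same as modules 301 to 303, and details are not repeated here.
a file merging module 604, configured to merge the full data update file and the full data deletion file into a table data update file.
After the table data updating device generates the full data update file and the full data deletion file, the file merging module 604 merges them into a table data update file. The merged table data update file includes all the data of the full data update file and all the data of the full data deletion file.
a file import module 605, configured to import the table data update file into the data table.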
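The file import module 605 imports the table data update file into the data table corresponding to the full data update instruction. Because the table data update file is imported into the data table in a single operation, the atomicity of the data import is ensured.
The merged table data update file includes all the data of the full data update file, and the minimum update timestamp in the Sth row of new data in the full data update file is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey. Therefore, after the table data update file is imported into the data table, if the Rth row of old data exists, the Sth row of new data overwrites it: a user can read the Sth row of new data but cannot read the Rth row of old data.
The merged table data update file also includes all the data of the full data deletion file, and the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp in the Rth row of old data that has the same RowKey. Therefore, after the table data update file is imported into the data table, the Rth row of deletion data can delete the Rth row of old data. Because the full data deletion file includes M rows of deletion data that correspond one-to-one to the RowKeys of the M rows of old data in the data table, after the table data update file is imported, all M rows of old data are deleted and a user cannot read the old data in the data table. Because the deletion timestamp of the Rth row of deletion data is less than the minimum update timestamp in the Sth row of new data that has the same RowKey, if the Sth row of new data exists, it remains valid; that is, all the new data in the table data update file can still be read normally by users.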
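This embodiment provides a table data updating device, in which the instruction receiving module 601 receives a full data update instruction; the file generating module 602 obtains to-be-imported data according to the full data update instruction and generates a full data update file according to the to-be-imported data; the data obtaining module 603 obtains M rows of old data of the data table corresponding to the full data update instruction; the file generating module 602 generates a full data deletion file according to the M rows of old data; the file merging module 604 merges the full data update file and the full data deletion file into a table data update file; and the file import module 605 imports the table data update file into the data table. In this way, a full update of all the data in a data table of a KeyValue database is achieved. Moreover, because the full data update file and the full data deletion file are not imported into the KeyValue database entry by entry, the device provided by this embodiment updates faster and with better atomicity than the prior-art approach of updating the KeyValue data in the data table entry by entry.
Preferably, the file import module 605 may change the save path of the table data update file to the directory of the data table, thereby importing the table data update file into the data table. Changing the save path takes only on the order of seconds.
Compared with the embodiment shown in FIG. 3, this embodiment first merges the full data update file and the full data deletion file into a table data update file and then imports the table data update file into the data table. In this way, a full update of the table data requires importing only one file into the data table. More preferably, as yet another embodiment of the present invention, the file generating module 602 may skip generating the separate full data deletion file and full data update file and generate the table data update file directly according to the full data update instruction, after which the file import module 605 imports the table data update file into the data table.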
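The table data updating device in the embodiments of the present invention is described above from the perspective of unitized functional entities and is described below from the perspective of hardware processing. Referring again to FIG. 4, another embodiment of the table data updating device 400 in the embodiments of the present invention includes:
an input device 401, an output device 402, a processor 403, and a memory 404 (the number of processors 403 in the table data updating device 400 may be one or more, and one processor 403 is taken as an example in FIG. 4). In some embodiments of the present invention, the input device 401, the output device 402, the processor 403, and the memory 404 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 4.
By invoking operation instructions stored in the memory 404, the processor 403 is configured to perform the following steps:
receiving a full data update instruction;
obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
merging the full data update file and the full data deletion file into a table data update file; and
importing the table data update file into the data table.
In some embodiments of the present invention, the processor 403 is further configured to perform the following step:
changing the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.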
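A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, modules, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative. The division into units is only a division by logical function, and an actual implementation may divide them differently; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, modules, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.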

Claims (21)

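  1. A method for updating a data table of a KeyValue database, characterized by including:
    receiving a full data update instruction;
    obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
    obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
    generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
    importing the full data update file into the data table; and
    importing the full data deletion file into the data table.
  2. The method for updating a data table of a KeyValue database according to claim 1, characterized in that the step of generating the full data deletion file is performed first, and then the step of generating the full data update file is performed;
    the update timestamp of each column of new attributes is the moment at which the full data update file is generated, and the deletion timestamp of each row of deletion data is the moment at which the full data deletion file is generated.
  3. The method for updating a data table of a KeyValue database according to claim 1 or 2, characterized in that the step of importing the full data update file into the data table is performed first, and then the step of importing the full data deletion file into the data table is performed.
  4. The method for updating a data table of a KeyValue database according to any one of claims 1 to 3, characterized in that importing the full data update file into the data table includes:
    changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
  5. The method for updating a data table of a KeyValue database according to any one of claims 1 to 3, characterized in that importing the full data deletion file into the data table includes:
    changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.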
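  6. A table data updating device, characterized by including:
    a receiving module, configured to receive a full data update instruction;
    a file generation module, configured to obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
    an obtaining module, configured to obtain M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
    wherein the file generation module is further configured to generate a full data deletion file according to the M rows of old data, the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P; and
    an import module, configured to import the full data update file into the data table and import the full data deletion file into the data table.
  7. The table data updating device according to claim 6, characterized in that the file generation module generates the full data deletion file first and then generates the full data update file, sets the moment at which the full data update file is generated as the update timestamp of each column of new attributes, and sets the moment at which the full data deletion file is generated as the deletion timestamp of each row of deletion data.
  8. The table data updating device according to claim 6 or 7, characterized in that the import module imports the full data update file into the data table first and then imports the full data deletion file into the data table.
  9. The table data updating device according to any one of claims 6 to 8, characterized in that the import module imports the full data update file into the data table specifically by:
    changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
  10. The table data updating device according to any one of claims 6 to 8, characterized in that the import module imports the full data deletion file into the data table specifically by:
    changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.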
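  11. A table data updating device, including an input device, an output device, a processor, and a memory, characterized in that, by invoking operation instructions stored in the memory, the processor is configured to perform the following steps:
    receiving a full data update instruction;
    obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
    obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
    generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
    importing the full data update file into the data table; and
    importing the full data deletion file into the data table.
  12. The table data updating device according to claim 11, characterized in that the processor generates the full data deletion file first and then generates the full data update file, sets the moment at which the full data update file is generated as the update timestamp of each column of new attributes, and sets the moment at which the full data deletion file is generated as the deletion timestamp of each row of deletion data.
  13. The table data updating device according to claim 11 or 12, characterized in that the processor imports the full data update file into the data table first and then imports the full data deletion file into the data table.
  14. The table data updating device according to any one of claims 11 to 13, characterized in that the processor is further configured to perform:
    changing the save path of the full data update file to the directory of the data table corresponding to the full data update instruction.
  15. The table data updating device according to any one of claims 11 to 13, characterized in that the processor is further configured to perform:
    changing the save path of the full data deletion file to the directory of the data table corresponding to the full data update instruction.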
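  16. A method for updating a data table of a KeyValue database, characterized by including:
    receiving a full data update instruction;
    obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
    obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
    generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
    merging the full data update file and the full data deletion file into a table data update file; and
    importing the table data update file into the data table.
  17. The method for updating a data table of a KeyValue database according to claim 16, characterized in that importing the table data update file into the data table includes:
    changing the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.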
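  18. A table data updating device, characterized by including:
    an instruction receiving module, configured to receive a full data update instruction;
    a file generating module, configured to obtain to-be-imported data according to the full data update instruction, and generate a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
    a data obtaining module, configured to obtain M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
    wherein the file generating module is further configured to generate a full data deletion file according to the M rows of old data, the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
    a file merging module, configured to merge the full data update file and the full data deletion file into a table data update file; and
    a file import module, configured to import the table data update file into the data table.
  19. The table data updating device according to claim 18, characterized in that the file import module is specifically configured to:
    change the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.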
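  20. A table data updating device, including an input device, an output device, a processor, and a memory, characterized in that, by invoking operation instructions stored in the memory, the processor is configured to perform the following steps:
    receiving a full data update instruction;
    obtaining to-be-imported data according to the full data update instruction, and generating a full data update file according to the to-be-imported data, wherein the full data update file includes P rows of new data, each row of new data includes a row identifier RowKey and Q columns of new attributes, the data type of each column of new attributes is the Put type, and each column of new attributes is set with an update timestamp;
    obtaining M rows of old data of the data table corresponding to the full data update instruction, wherein each row of old data includes a RowKey and N columns of old attributes, and each column of old attributes is set with an original timestamp;
    generating a full data deletion file according to the M rows of old data, wherein the full data deletion file includes M rows of deletion data, the rows of deletion data correspond one-to-one to the rows of old data, the data type of each row of deletion data is the Delete type, each row of deletion data is set with a deletion timestamp, the deletion timestamp of the Rth row of deletion data is greater than the maximum original timestamp of the Rth row of old data and less than the minimum update timestamp of the Sth row of new data, the Rth row of deletion data, the Rth row of old data, and the Sth row of new data have the same RowKey, and 1 ≤ R ≤ M, 1 ≤ S ≤ P;
    merging the full data update file and the full data deletion file into a table data update file; and
    importing the table data update file into the data table.
  21. The table data updating device according to claim 20, characterized in that the processor is further configured to perform:
    changing the save path of the table data update file to the directory of the data table corresponding to the full data update instruction.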
PCT/CN2015/073211 2014-11-12 2015-02-17 一种KeyValue数据库的数据表的更新方法与表数据更新装置 Ceased WO2016074370A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CN201580000911.7A CN105900093B (zh) 2014-11-12 2015-02-17 一种KeyValue数据库的数据表的更新方法与表数据更新装置
AU2015316450A AU2015316450B2 (en) 2014-11-12 2015-02-17 Method for updating data table of KeyValue database and apparatus for updating table data
JP2016519954A JP6251388B2 (ja) 2014-11-12 2015-02-17 KeyValueデータベースのデータテーブルを更新するための方法およびテーブルデータを更新するための装置
CA2922388A CA2922388C (en) 2014-11-12 2015-02-17 Method and apparatus for updating data table of keyvalue database
EP15832861.7A EP3051440B1 (en) 2014-11-12 2015-02-17 Keyvalue database data table updating method and data table updating device
US15/054,475 US10467192B2 (en) 2014-11-12 2016-02-26 Method and apparatus for updating data table in keyvalue database

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2014090934 2014-11-12
CNPCT/CN2014/090934 2014-11-12

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/054,475 Continuation US10467192B2 (en) 2014-11-12 2016-02-26 Method and apparatus for updating data table in keyvalue database

Publications (1)

Publication Number Publication Date
WO2016074370A1 true WO2016074370A1 (zh) 2016-05-19

Family

ID=55953654

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/073211 Ceased WO2016074370A1 (zh) 2014-11-12 2015-02-17 一种KeyValue数据库的数据表的更新方法与表数据更新装置

Country Status (7)

Country Link
US (1) US10467192B2 (zh)
EP (1) EP3051440B1 (zh)
JP (2) JP6251388B2 (zh)
CN (2) CN107977396B (zh)
AU (1) AU2015316450B2 (zh)
CA (1) CA2922388C (zh)
WO (1) WO2016074370A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10204135B2 (en) * 2015-07-29 2019-02-12 Oracle International Corporation Materializing expressions within in-memory virtual column units to accelerate analytic queries
US10372706B2 (en) 2015-07-29 2019-08-06 Oracle International Corporation Tracking and maintaining expression statistics across database queries
US10671641B1 (en) * 2016-04-25 2020-06-02 Gravic, Inc. Method and computer program product for efficiently loading and synchronizing column-oriented databases
CN109960212B (zh) * 2017-12-25 2020-07-31 北京京东乾石科技有限公司 Task sending method and device
US11226955B2 (en) 2018-06-28 2022-01-18 Oracle International Corporation Techniques for enabling and integrating in-memory semi-structured data and text document searches with in-memory columnar query processing
CN108920725B (zh) 2018-08-02 2020-08-04 网宿科技股份有限公司 Object storage method and object storage gateway
CN109471866B (zh) * 2018-11-09 2021-10-22 南京医渡云医学技术有限公司 Incremental medical data update method and system
CN109582726B (zh) * 2018-12-18 2021-09-07 网易(杭州)网络有限公司 Data table processing method and device
CN109688266B (zh) * 2018-12-21 2020-12-18 北京金山安全软件有限公司 Ringtone setting method, device, and electronic equipment
US11507590B2 (en) 2019-09-13 2022-11-22 Oracle International Corporation Techniques for in-memory spatial object filtering
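CN116303506B (zh) * 2023-03-20 2026-01-13 兴业银行股份有限公司 Database update method, device, computer equipment, and storage medium
CN116561141A (zh) * 2023-03-30 2023-08-08 青岛海尔科技有限公司 Method and device for updating a data processing file, storage medium, and electronic device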

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7769732B2 (en) * 2007-08-27 2010-08-03 International Business Machines Corporation Apparatus and method for streamlining index updates in a shared-nothing architecture
CN103595776A (zh) * 2013-11-05 2014-02-19 福建网龙计算机网络信息技术有限公司 Distributed caching method and system
CN103617232A (zh) * 2013-11-26 2014-03-05 北京京东尚科信息技术有限公司 Paging query method for HBase tables
CN103714163A (zh) * 2013-12-30 2014-04-09 中国科学院信息工程研究所 Schema management method and system for NoSQL database
US20140149355A1 (en) * 2012-11-26 2014-05-29 Amazon Technologies, Inc. Streaming restore of a database from a backup system

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6959301B2 (en) 2001-01-04 2005-10-25 Reuters Limited Maintaining and reconstructing the history of database content modified by a series of events
US6665654B2 (en) * 2001-07-03 2003-12-16 International Business Machines Corporation Changing table records in a database management system
US6792429B2 (en) * 2001-12-19 2004-09-14 Hewlett-Packard Development Company, L.P. Method for fault tolerant modification of data representation in a large database
US7149736B2 (en) * 2003-09-26 2006-12-12 Microsoft Corporation Maintaining time-sorted aggregation records representing aggregations of values from multiple database records using multiple partitions
US7624119B2 (en) * 2004-02-11 2009-11-24 International Business Machines Corporation Low-overhead built-in timestamp column for relational database systems
US8533169B1 (en) * 2005-09-21 2013-09-10 Infoblox Inc. Transactional replication
CN101127915B (zh) 2007-09-20 2011-04-20 中兴通讯股份有限公司 Incremental electronic program guide data synchronization method and system
US20090083503A1 (en) * 2007-09-20 2009-03-26 Inventec Corporation System of creating logical volume and method thereof
JP5448428B2 (ja) * 2008-11-27 2014-03-19 三菱電機株式会社 Data management system, data management method, and data management program
US8918380B2 (en) * 2009-07-09 2014-12-23 Norsync Technology As Methods, systems and devices for performing incremental updates of partial databases
EP2302534B1 (en) * 2009-09-18 2017-12-13 Software AG Method for mass-deleting data records of a database system
US8825601B2 (en) * 2010-02-01 2014-09-02 Microsoft Corporation Logical data backup and rollback using incremental capture in a distributed database
US20120284317A1 (en) * 2011-04-26 2012-11-08 Dalton Michael W Scalable Distributed Metadata File System using Key-Value Stores
CN102279885A (zh) 2011-08-16 2011-12-14 中兴通讯股份有限公司 Method and device for operating on data in an in-memory database
US8751525B2 (en) * 2011-12-23 2014-06-10 Sap Ag Time slider operator for temporal data aggregation
CN103473239B (zh) 2012-06-08 2016-12-21 腾讯科技(深圳)有限公司 Non-relational database data update method and device
CN103002011B (zh) 2012-10-29 2016-06-29 北京奇虎科技有限公司 Server-based data update method and server
WO2014068820A1 (ja) * 2012-10-29 2014-05-08 日本電気株式会社 Transaction system
JP6103037B2 (ja) * 2013-03-15 2017-03-29 日本電気株式会社 Computer system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3051440A4

Also Published As

Publication number Publication date
JP6251388B2 (ja) 2017-12-20
EP3051440A1 (en) 2016-08-03
EP3051440B1 (en) 2019-01-02
AU2015316450B2 (en) 2016-11-03
CN105900093A (zh) 2016-08-24
JP2018049656A (ja) 2018-03-29
CN107977396B (zh) 2021-07-20
EP3051440A4 (en) 2016-12-28
AU2015316450A1 (en) 2016-05-26
US20160179836A1 (en) 2016-06-23
CA2922388A1 (en) 2016-05-12
CA2922388C (en) 2018-09-18
US10467192B2 (en) 2019-11-05
JP2017500622A (ja) 2017-01-05
CN107977396A (zh) 2018-05-01
JP6521402B2 (ja) 2019-05-29
CN105900093B (zh) 2018-02-02

Similar Documents

Publication Publication Date Title
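WO2016074370A1 (zh) Method for updating data table of KeyValue database and apparatus for updating table data
CN106970936B (zh) Data processing method and device, and data query method and device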
JP2021517672A5 (zh)
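WO2019128318A1 (zh) Data processing method, device, and system
CN104866497A (zh) Metadata update method, device, and host for columnar storage in a distributed file system
CN111339171B (zh) Data query method, device, and equipment
CN103164525B (zh) Web application publishing method and device
CN104102710A (zh) Massive data query method
CN111046106A (zh) Cache data synchronization method, device, equipment, and medium
CN105677904B (zh) Small file storage method and device based on a distributed file system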
US20170060922A1 (en) Method and device for data search
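CN115994148B (zh) Multi-table data update method and device, electronic equipment, and readable storage medium
CN104461929B (zh) Interceptor-based distributed data caching method
CN110928895B (zh) Data query and data table creation method, device, and equipment
CN112559913B (zh) Data processing method, device, computing equipment, and readable storage medium
CN108255959A (zh) Method and device for querying data in Redis
CN105335450B (zh) Data storage processing method and device
CN110598072A (zh) Feature data aggregation method and device
CN106682047B (zh) Data import method and related device
CN111767267A (zh) Metadata processing method, device, and electronic equipment
CN115185573A (zh) Method and device for configuring application configuration information, computer equipment, and storage medium
CN107665241B (zh) Real-time multi-dimensional data deduplication method and device
WO2020124491A1 (zh) Data splitting method, device, computer equipment, and storage medium
CN114647630B (zh) File synchronization and information generation method, device, computer equipment, and storage medium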
US10114864B1 (en) List element query support and processing

Legal Events

Date Code Title Description
REEP Request for entry into the European phase Ref document number: 2015832861; Country of ref document: EP
WWE WIPO information: entry into national phase Ref document number: 2015832861; Country of ref document: EP
WWE WIPO information: entry into national phase Ref document number: 2922388; Country of ref document: CA
WWE WIPO information: entry into national phase Ref document number: 2015316450; Country of ref document: AU
ENP Entry into the national phase Ref document number: 2016519954; Country of ref document: JP; Kind code of ref document: A
121 EP: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 15832861; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE