WO2017133233A1 - Heartbeat-based data synchronization apparatus, method, and distributed storage system - Google Patents

Heartbeat-based data synchronization apparatus, method, and distributed storage system

Info

Publication number
WO2017133233A1
Authority
WO
WIPO (PCT)
Prior art keywords
data block
block group
heartbeat time
heartbeat
state information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/097244
Other languages
English (en)
French (fr)
Inventor
刘存伟
吴国军
黄西华
金雪锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to EP16854615.8A (patent EP3220610B1)
Priority to US15/583,687 (patent US10025529B2)
Publication of WO2017133233A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operations
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/142Reconfiguring to eliminate the error
    • G06F11/1425Reconfiguring to eliminate the error by reconfiguration of node membership
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operations
    • G06F11/1471Error detection or correction of the data by redundancy in operations involving logging of persistent data for recovery
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452Performance evaluation by statistical analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • G06F16/1824Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653Monitoring storage devices or systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/06Generation of reports
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/10Active monitoring, e.g. heartbeat, ping or trace-route
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/10Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/103Active monitoring, e.g. heartbeat, ping or trace-route with adaptive polling, i.e. dynamically adapting the polling rate
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/14Session management
    • H04L67/143Termination or inactivation of sessions, e.g. event-controlled end of session
    • H04L67/145Termination or inactivation of sessions, e.g. event-controlled end of session avoiding end of session, e.g. keep-alive, heartbeats, resumption message or wake-up for inactive or interrupted session
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/28Timers or timing mechanisms used in protocols
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated

Definitions

  • The present invention relates to the field of data synchronization technologies, and in particular to a heartbeat-based data synchronization apparatus, a method, and a distributed storage system.
  • The Raft consistency algorithm is widely used in distributed storage systems.
  • The data in a distributed storage system based on the Raft consistency algorithm is divided into several data block groups. Each data block group consists of multiple identical data blocks, each stored in a different storage device.
  • When Raft consistency is implemented, data is synchronized in units of data block groups. Specifically, for a data block group, the storage devices holding the data blocks of the group first elect a master device for the group; the other devices are slave devices of the group. The master device is responsible for handling the interaction with the client.
  • When the master device receives a read/write command sent by the client, it records the command as a log and, at a fixed heartbeat time, sends a data synchronization command containing the log to each slave device. Each slave device performs data synchronization according to the data synchronization command.
  • If there is no log to be sent when the heartbeat time is reached, the master device sends each slave device a heartbeat signal that does not contain a data synchronization command, to confirm that the connection is normal.
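The fixed-heartbeat behaviour described above can be sketched roughly as follows. This is a minimal illustration rather than a full Raft implementation; the class name `Master`, its methods, and the message dictionaries are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master device for one data block group."""
    slaves: list
    pending_log: list = field(default_factory=list)

    def on_client_command(self, command: str) -> None:
        # Record the client's read/write command as a log entry.
        self.pending_log.append(command)

    def on_heartbeat(self) -> list:
        """Called each time the heartbeat time elapses; returns (slave, message) pairs."""
        if self.pending_log:
            # Data synchronization command carrying the accumulated log.
            message = {"type": "sync", "log": list(self.pending_log)}
            self.pending_log.clear()
        else:
            # Nothing to replicate: a plain heartbeat that only confirms the connection.
            message = {"type": "heartbeat"}
        return [(slave, message) for slave in self.slaves]
```

With a fixed heartbeat time, `on_heartbeat` fires equally often for busy and idle groups, which is the overhead the following paragraphs discuss.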
  • The heartbeat time of each data block group is a fixed value, while the read/write frequency is not balanced across data block groups. To ensure that the data blocks in a group with a high read/write frequency are synchronized in a timely manner, the fixed value is usually set small.
  • As a result, the storage devices holding a data block group with a low read/write frequency must also send and receive signals frequently, which causes large system overhead and degrades the read/write performance of the storage system.
  • The distributed storage system stores at least one data block group and includes a plurality of storage devices; one of the plurality of storage devices is the master device that stores the data block group, and the remaining devices are slave devices that store the data block group.
  • the distributed storage system can also include a coordination device that is coupled to each storage device in the distributed storage system.
  • A first aspect provides a heartbeat-based data synchronization method, the method comprising:
  • the master device acquires access state information of the data block group; the master device determines a heartbeat time of the data block group according to the access state information of the data block group; and the master device sends a data synchronization instruction to the slave device according to the heartbeat time of the data block group, where the data synchronization instruction instructs the slave device to perform data synchronization.
  • The heartbeat time of the data block group is determined adaptively from information such as the read/write frequency of the data block group. This addresses the problem in the existing Raft consistency algorithm that the storage devices holding a data block group must send and receive signals frequently, which causes large system overhead and degrades the read/write performance of the storage system.
  • The effect is to reduce the system overhead of the distributed storage system and improve the read/write performance of the storage system.
  • the access state information of the data block group includes a read frequency and a write frequency of the data block group.
  • The master device determining a heartbeat time of the data block group according to the access state information of the data block group includes:
  • the master device scores the access state information according to a preset first scoring rule to obtain a first reference score; and the master device determines the heartbeat time of the data block group according to the first reference score.
  • a method of determining a heartbeat time based on at least two pieces of information including a read frequency and a write frequency is provided to adaptively determine a heartbeat time of the data block group.
  • The master device determining a heartbeat time of the data block group according to the access state information of the data block group includes: the master device scores the read frequency and the write frequency separately according to the first scoring rule; the master device uses the sum of the scores of the read frequency and the write frequency as the first reference score; and the master device determines the heartbeat time of the data block group according to the first reference score.
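The score-and-sum step above can be sketched as follows. The source does not specify the first scoring rule or how the first reference score maps to a heartbeat time, so the thresholds, the lookup table, and the function names here are all assumptions chosen for illustration. The only property taken from the text is that a group with a higher read/write frequency should receive a shorter heartbeat time.

```python
import bisect

# Hypothetical first scoring rule: frequency thresholds (operations per second).
# A frequency scores one point for each threshold it meets or exceeds (0..3).
THRESHOLDS = [10, 100, 1000]

# Hypothetical mapping from first reference score (0..6) to heartbeat time in
# milliseconds: a higher score (busier group) gets a shorter heartbeat time.
HEARTBEAT_MS_BY_SCORE = [1000, 800, 600, 400, 200, 100, 50]


def score(frequency: float) -> int:
    """Score one frequency as the number of thresholds it meets or exceeds."""
    return bisect.bisect_right(THRESHOLDS, frequency)


def heartbeat_time_ms(read_freq: float, write_freq: float) -> int:
    """Sum the read and write scores, then look up the heartbeat time."""
    first_reference_score = score(read_freq) + score(write_freq)
    return HEARTBEAT_MS_BY_SCORE[first_reference_score]
```

A table lookup keeps the rule easy to tune; any monotonically decreasing mapping from score to heartbeat time would serve the same purpose.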
  • The master device determining a heartbeat time of the data block group according to the access state information of the data block group includes: the master device acquires a preset reference time and a first weight corresponding to the access state information; and the master device calculates the heartbeat time of the data block group according to the access state information, the reference time, and the first weight. This provides a way to determine the heartbeat time from at least two pieces of information, including the read frequency and the write frequency, so that the heartbeat time of the data block group is determined adaptively.
  • The master device determining a heartbeat time of the data block group according to the access state information of the data block group includes calculating the heartbeat time by a formula (not reproduced in this text) whose terms are defined as follows:
  • heartbeatTime is the heartbeat time
  • Time is the reference time
  • R is the value of the read frequency
  • W is the value of the write frequency
  • weightR is the weight corresponding to the read frequency
  • weightW is the weight corresponding to the write frequency.
  • The master device determines the heartbeat time of the data block group according to the access state information of the data block group, comprehensively considering the impact of different access state information on the overall storage system. Starting from the actual data access situation, the heartbeat time is configured more reasonably.
  • This addresses the "heartbeat storm" in current multi-Raft systems, in which frequent sending and receiving of signals between the storage devices of a data block group causes large system overhead and degrades the read/write performance of the storage system.
  • The method further includes: the master device sends the heartbeat time to the slave device, so that the slave device sets an election timeout period according to the heartbeat time.
  • The master device sends the heartbeat time of the data block group to the slave device, so that the slave device sets the election timeout period according to the heartbeat time. This takes into account the influence of the heartbeat time on the election timeout period, so that the master device most beneficial to the overall performance of the system is elected to lead data synchronization, which improves the overall read/write performance of the storage system.
  • The distributed storage system further includes a coordination device connected to the plurality of storage devices, and the method further includes: the master device sends the heartbeat time of the data block group to the coordination device; the master device receives the corrected heartbeat time returned by the coordination device; and the master device sends a data synchronization instruction to the slave device according to the corrected heartbeat time.
  • The heartbeat time of each data block group is corrected according to the read/write frequency of each data block group, so that the heartbeat times of the data block groups of the storage system are optimized as a whole, further improving the read/write performance of the storage system.
  • A second aspect provides a heartbeat-based data synchronization method, the method comprising: the slave device receives a data synchronization instruction sent by the master device according to the heartbeat time of the data block group, where the heartbeat time is determined by the master device, after acquiring the access state information of the data block group, according to that access state information; and the slave device performs data synchronization according to the data synchronization instruction.
  • The heartbeat time of the data block group is determined adaptively from information such as the read/write frequency of the data block group, and data synchronization between the devices is performed according to the determined heartbeat time. This addresses the problem in the existing Raft consistency algorithm that the storage devices holding a data block group must send and receive signals frequently, which causes large system overhead and degrades the read/write performance of the storage system.
  • The effect is to reduce the system overhead of the distributed storage system and improve the read/write performance of the storage system.
  • the access state information of the data block group includes a read frequency and a write frequency of the data block group.
  • The method further includes: the slave device receives the heartbeat time sent by the master device; and the slave device sets the election timeout period according to the heartbeat time.
  • If the slave device does not receive any signal sent by the master device within the election timeout period, the slave device initiates a master device election to each of the plurality of storage devices.
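The timeout check on the slave side can be sketched as below. The class `Slave`, its method names, and the use of plain numeric timestamps are assumptions made for illustration; the sketch only shows when an election would be initiated, not the election itself.

```python
class Slave:
    """Hypothetical slave device tracking signals for one data block group."""

    def __init__(self, election_timeout: float, now: float = 0.0):
        # The election timeout period, already derived from the heartbeat time.
        self.election_timeout = election_timeout
        self.last_signal_at = now

    def on_signal(self, now: float) -> None:
        # Any signal from the master (sync instruction or plain heartbeat)
        # restarts the timeout window.
        self.last_signal_at = now

    def should_start_election(self, now: float) -> bool:
        # True once a full timeout period has passed without any signal.
        return now - self.last_signal_at >= self.election_timeout
```

Passing `now` explicitly keeps the logic deterministic and easy to test; a real device would read a monotonic clock instead.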
  • The slave device sets the election timeout period according to the heartbeat time. This takes into account the influence of the heartbeat time on the election timeout period, so that the master device most beneficial to the overall performance of the system is elected to lead data synchronization, which improves the overall read/write performance of the storage system.
  • The slave device setting an election timeout period according to the heartbeat time includes: the slave device scores the access state information according to a preset second scoring rule to obtain a second reference score; the slave device determines a first timeout coefficient of the data block group according to the second reference score; and the slave device sets the product of the first timeout coefficient and the heartbeat time as the election timeout period.
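A rough sketch of this scoring-based timeout follows. Only the final step (election timeout = first timeout coefficient × heartbeat time) comes from the text. The second scoring rule, the score-to-coefficient table, and the choice to give busier groups a smaller coefficient are assumptions made for illustration.

```python
# Hypothetical second scoring rule: one point for each of the read and write
# frequencies that is at or above 100 operations per second.
BUSY_THRESHOLD = 100

# Hypothetical mapping from second reference score to first timeout coefficient.
TIMEOUT_COEFFICIENT_BY_SCORE = {0: 5.0, 1: 3.0, 2: 2.0}


def second_reference_score(read_freq: float, write_freq: float) -> int:
    """Score the access state information under the assumed second scoring rule."""
    return int(read_freq >= BUSY_THRESHOLD) + int(write_freq >= BUSY_THRESHOLD)


def election_timeout_ms(heartbeat_ms: float, read_freq: float, write_freq: float) -> float:
    """Election timeout period = first timeout coefficient x heartbeat time."""
    coefficient = TIMEOUT_COEFFICIENT_BY_SCORE[second_reference_score(read_freq, write_freq)]
    return coefficient * heartbeat_ms
```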
  • The slave device setting an election timeout period according to the heartbeat time includes: the slave device acquires a preset reference coefficient and a second weight corresponding to the access state information; the slave device calculates a second timeout coefficient of the data block group according to the access state information, the reference coefficient, and the second weight; and the slave device sets the product of the second timeout coefficient and the heartbeat time as the election timeout period.
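  • The slave device setting an election timeout period according to the heartbeat time includes calculating the election timeout period by a formula (not reproduced in this text) whose terms are defined as follows: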
  • OverTime is the election timeout period
  • heartbeatTime is the heartbeat time
  • Reference is the reference coefficient
  • R is the value of the read frequency
  • W is the value of the write frequency
  • weightR is the weight corresponding to the read frequency
  • weightW is the weight corresponding to the write frequency.
  • The master device determines the heartbeat time of the data block group according to the access state information of the data block group, and the election timeout period is set according to the heartbeat time. This comprehensively considers the impact of different access state information on the overall storage system and, starting from the actual data access situation, configures the election timeout period more reasonably, improving the overall read/write performance of the storage system.
  • A third aspect provides a heartbeat-based data synchronization method, the method comprising: a coordination device collects access state information of the data block group and receives the heartbeat time of the data block group sent by the master device, where the heartbeat time is determined by the master device, after acquiring the access state information of the data block group, according to that access state information; the coordination device determines the importance level of the data block group according to the access state information of the data block group;
  • the coordination device corrects the heartbeat time according to the importance level of the data block group; and the coordination device returns the corrected heartbeat time to the master device.
  • the coordination device determines the importance level of the data block group according to the access state information of the data block group, and corrects the heartbeat time according to the importance level of the data block group. From the perspective of the overall system, through the overall adjustment of the heartbeat time, resources are allocated more reasonably, and the overall read and write performance of the storage system is improved.
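The correction step on the coordination device might look like the sketch below. The source does not say how the importance level is derived or how it changes the heartbeat time, so the three-level rule, the thresholds, and the correction factors are assumptions made for illustration.

```python
# Hypothetical correction factors per importance level: important groups get a
# shorter heartbeat time, unimportant groups a longer one.
CORRECTION_FACTOR = {2: 0.5, 1: 1.0, 0: 2.0}


def importance_level(read_freq: float, write_freq: float) -> int:
    """Assumed rule: 2 = high, 1 = medium, 0 = low, based on total access frequency."""
    total = read_freq + write_freq
    if total >= 1000:
        return 2
    if total >= 100:
        return 1
    return 0


def correct_heartbeat_ms(heartbeat_ms: float, read_freq: float, write_freq: float) -> float:
    """Return the corrected heartbeat time that is sent back to the master device."""
    return heartbeat_ms * CORRECTION_FACTOR[importance_level(read_freq, write_freq)]
```

Because the coordination device sees every data block group, a fuller version could rank groups against each other instead of using fixed thresholds.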
  • A fourth aspect provides a network device, including: a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus; the processor is configured to execute instructions stored in the memory; and by executing the instructions, the processor implements the heartbeat-based data synchronization method provided by the first aspect or any possible implementation of the first aspect.
  • A fifth aspect provides a network device, including: a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus; the processor is configured to execute instructions stored in the memory; and by executing the instructions, the processor implements the heartbeat-based data synchronization method provided by the second aspect or any possible implementation of the second aspect.
  • A sixth aspect provides a network device, including: a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus; the processor is configured to execute instructions stored in the memory; and by executing the instructions, the processor implements the heartbeat-based data synchronization method provided by the third aspect.
  • An embodiment of the present invention provides a heartbeat-based data synchronization apparatus, where the data synchronization apparatus includes at least one unit, and the at least one unit is configured to implement the heartbeat-based data synchronization method provided by the first aspect or any possible implementation of the first aspect.
  • An embodiment of the present invention provides a heartbeat-based data synchronization apparatus, where the data synchronization apparatus includes at least one unit, and the at least one unit is configured to implement the heartbeat-based data synchronization method provided by the second aspect or any possible implementation of the second aspect.
  • An embodiment of the present invention provides a heartbeat-based data synchronization apparatus, where the data synchronization apparatus includes at least one unit, and the at least one unit is configured to implement the heartbeat-based data synchronization method provided by the third aspect.
  • The master device determines a heartbeat time of the data block group according to the access state information of the data block group, and sends a data synchronization instruction to the slave device according to the heartbeat time of the data block group, where the data synchronization instruction instructs the slave device to perform data synchronization.
  • This addresses the problem that, in the existing Raft consistency algorithm, the storage devices holding a data block group must send and receive signals frequently, which causes large system overhead and degrades the read/write performance of the storage system. The effect is to reduce the system overhead of the distributed storage system and improve the read/write performance of the storage system.
  • FIG. 1A is a schematic structural diagram of a distributed storage system according to an exemplary embodiment of the present invention.
  • FIG. 1B is a schematic structural diagram of a network device according to the embodiment shown in FIG. 1A;
  • FIG. 1C is a schematic diagram of an application module involved in the embodiment shown in FIG. 1A;
  • FIG. 1D is a schematic diagram of another application module involved in the embodiment shown in FIG. 1A;
  • FIG. 1E is a schematic diagram of still another application module involved in the embodiment shown in FIG. 1A;
  • FIG. 2 is a flowchart of a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention;
  • FIG. 3 is a flowchart of a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention;
  • FIG. 4 is a block diagram showing the structure of a heartbeat-based data synchronization apparatus according to an exemplary embodiment of the present invention
  • FIG. 5 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an exemplary embodiment of the present invention.
  • FIG. 6 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an exemplary embodiment of the present invention.
  • the distributed storage system includes a plurality of storage devices 110.
  • the distributed storage system stores a plurality of data block groups, each data block group includes a plurality of identical data blocks, and a plurality of data blocks in the same data block group are stored in different storage devices.
  • for each data block group, one storage device is the master device storing the data block group and the remaining storage devices are slave devices storing the data block group; the master device is responsible for handling interaction with the client.
  • the master device of the data block group can acquire the access state information of the data block group, determine the heartbeat time of the data block group according to that access state information, and send a data synchronization instruction to the slave device according to the heartbeat time of the data block group; the slave device of the data block group is configured to perform data synchronization according to the data synchronization instruction, so that the plurality of data blocks remain consistent.
  • the distributed storage system further includes: a coordination device 120.
  • the coordination device 120 is connected to and communicates with a number of storage devices 110 over a wired or wireless network.
  • FIG. 1B shows a schematic structural diagram of a network device according to an exemplary embodiment of the present invention.
  • the network device 10 can be the storage device 110 or the coordination device 120, and the network device 10 includes a processor 12 and a network interface 14.
  • Processor 12 includes one or more processing cores.
  • the processor 12 executes various functional applications and data processing by running software programs and modules.
  • There may be multiple network interfaces 14, which are used to communicate with other storage devices or network devices.
  • the network device 10 further includes components such as a memory 16, a bus 18, and the like.
  • the memory 16 and the network interface 14 are connected to the processor 12 via a bus 18, respectively.
  • Memory 16 can be used to store software programs and modules. Specifically, the memory 16 can store an operating system 162 and an application module 164 required for at least one function.
  • the operating system 162 can be an operating system such as Real Time eXecutive (RTX), LINUX, UNIX, WINDOWS, or OS X.
  • FIG. 1C shows a schematic diagram of an application module according to an exemplary embodiment of the present invention.
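  • the application module 164 may be an information acquisition module 164a, a heartbeat time determination module 164b, a first instruction sending module 164c, a first time transmitting module 164d, a second time sending module 164e, a first time receiving module 164f, and a second instruction sending module 164g.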
  • the information acquisition module 164a acquires access state information of the data block group.
  • the heartbeat time determination module 164b determines the heartbeat time of the data block group based on the access state information of the data block group.
  • the first instruction sending module 164c sends a data synchronization instruction to the slave device according to the heartbeat time of the data block group.
  • the first time transmitting module 164d transmits the heartbeat time of the data block group to the slave device.
  • the second time sending module 164e sends the heartbeat time of the data block group to the coordinating device.
  • the first time receiving module 164f receives the corrected heartbeat time returned by the coordination device.
  • the second instruction sending module 164g sends the data synchronization instruction to the slave device according to the corrected heartbeat time.
  • FIG. 1D shows a schematic diagram of another application module according to an exemplary embodiment of the present invention.
  • the application module 164 may be an instruction receiving module 164h, a data synchronization module 164i, a second time receiving module 164j, a setting module 164k, and an election initiating module 164l.
  • the instruction receiving module 164h receives the data synchronization instruction sent by the master device according to the heartbeat time of the data block group; the heartbeat time is determined by the master device according to the access state information of the data block group that it acquires.
  • the data synchronization module 164i performs data synchronization in accordance with the data synchronization instructions to keep the plurality of data blocks consistent.
  • the second time receiving module 164j receives the heartbeat time sent by the master device.
  • the setting module 164k sets the election timeout period according to the heartbeat time.
  • the election initiation module 164l initiates a master device election to each of the plurality of storage devices when the slave device does not receive any signal sent by the master device within the election timeout period.
  • FIG. 1E shows a schematic diagram of still another application module according to an exemplary embodiment of the present invention.
  • the application module 164 may be a statistics module 164m, a third time receiving module 164n, a level determining module 164p, a correction module 164q, and a time return module 164r.
  • the statistics module 164m collects statistics on the access state information of the data block group.
  • the third time receiving module 164n receives the heartbeat time of the data block group sent by the master device; the heartbeat time is determined by the master device according to the access state information of the data block group that it acquires.
  • the level determination module 164p determines the importance level of the data block group based on the access status information of the data block group.
  • the correction module 164q corrects the heartbeat time based on the importance level of the data block group.
  • the time return module 164r returns the corrected heartbeat time to the master device.
  • FIG. 2 is a flowchart of a method for a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention.
  • the method may be used in a storage system in which at least one data block group is stored as shown in FIG. 1.
  • the heartbeat-based data synchronization method may include:
  • Step 201 The master device acquires access state information of the data block group.
  • the access status information of the data block group may include a read frequency and a write frequency of the data block group.
  • the read frequency of the data block group is the number of times the master device in the storage system receives an operation to read a data block in the data block group for a period of time.
  • the write frequency of the data block group is the number of times the master device in the storage system receives an operation to write a data block in the data block group for a period of time.
  • the client usually reads and writes only the data blocks stored in the master device of the data block group, so the number of read or write operations that the master device receives within a period of time reflects, to a certain extent, how frequently the data block group is used.
  • the access state information such as the read frequency and the write frequency of the data block group over a period of time can be counted.
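  • As a minimal sketch of how read and write frequencies over a period of time might be counted, the following uses a sliding time window. The class name `AccessCounter`, the window length, and the timestamp-based interface are illustrative assumptions; the text does not specify how the counting is done.

```python
from collections import deque
from typing import Deque, Tuple


class AccessCounter:
    """Counts read and write operations seen within a sliding time window.

    Hypothetical helper: the source only says the frequencies are counted
    over "a period of time"; the sliding window is an assumption.
    """

    def __init__(self, window_seconds: float = 1.0) -> None:
        self.window = window_seconds
        self.reads: Deque[float] = deque()   # timestamps of read operations
        self.writes: Deque[float] = deque()  # timestamps of write operations

    def record(self, kind: str, now: float) -> None:
        """Record one operation; kind is "read" or "write", now is in seconds."""
        if kind not in ("read", "write"):
            raise ValueError("kind must be 'read' or 'write'")
        (self.reads if kind == "read" else self.writes).append(now)

    def _trim(self, stamps: Deque[float], now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()

    def frequencies(self, now: float) -> Tuple[float, float]:
        """Return (read_frequency, write_frequency) in operations per second."""
        self._trim(self.reads, now)
        self._trim(self.writes, now)
        return len(self.reads) / self.window, len(self.writes) / self.window
```

  • The master device would call `record` for each client operation on the data block group and read `frequencies` whenever it needs to recompute the heartbeat time.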
  • the access state information of the data block group may further include working state information of each storage device that stores the data block group, for example the data throughput, central processing unit (CPU) usage, memory usage, and input/output (I/O) occupancy of each storage device that stores the data block group.
  • information such as data throughput, CPU usage, memory usage, and I/O occupancy of a storage device may reflect the frequency with which the storage device receives data synchronization commands or heartbeat signals.
  • a storage device with large data throughput, high CPU usage, high memory usage, and high I/O occupancy is one that frequently receives data synchronization instructions or heartbeat signals; conversely, a storage device with small data throughput, low CPU usage, low memory usage, and low I/O occupancy is one that receives fewer data synchronization instructions or heartbeat signals.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the information acquisition module 164a.
  • Step 202 The master device determines a heartbeat time of the data block group according to the access state information of the data block group.
  • the embodiment of the present invention can set a separate heartbeat time for each data block group, so that the number of communications per unit time between the master device and the slave devices of a data block group with fewer reads and writes is kept at a lower level. While ensuring the synchronization performance of the data block group, this reduces the number of communications per unit time between the master device and the slave devices of the data block group, reduces the system overhead of the distributed storage system, and improves the read and write performance of the storage system.
  • the step of determining, by the master device, the heartbeat time of the data block group according to the access state information of the data block group may include the following two possible implementation manners.
  • In a first possible implementation, the master device scores the access state information according to a preset first scoring rule to obtain a first reference score, and determines the heartbeat time of the data block group according to the first reference score.
  • the access state information is taken as an example of the read frequency and the write frequency of the data block group.
  • the master device scores the read frequency and the write frequency of the data block group according to the preset first scoring rule, uses the sum of the scores of the read frequency and the write frequency as the first reference score, and determines the heartbeat time of the data block group according to the first reference score.
  • the first scoring rule may be a scoring rule pre-specified for the access status information of the data block group.
  • the scoring rule may be the rule shown in Table 1 below.
  • the scores of the access state information of the data block group, determined according to the first scoring rule, are summed to give the first reference score.
  • For example, if the access state information of the data block group is a read frequency of 70 Hz and a write frequency of 45 Hz, the first reference score is the sum of the 2 points corresponding to the read frequency of 70 Hz and the 1 point corresponding to the write frequency of 45 Hz, that is, 3 points.
  • the master device determines the heartbeat time of the data block group based on the first reference score of 3 points.
  • the heartbeat time may be determined according to the size of the first reference score; for one possible implementation, refer to Table 2.
  • the master device finds that the heartbeat time corresponding to a first reference score of 3 points is 5 ms, and determines 5 ms as the heartbeat time of the data block group.
  • Table 1 does not form a limitation on the first scoring rule, and Table 1 is only one of the possible implementations of the first scoring rule.
  • the access status information "read/write frequency" in the second column of Table 1 indicates that the reading frequency and the writing frequency correspond to the same scoring rule. In practical applications, the scoring rules of the reading frequency and the writing frequency may be different.
  • the foregoing Table 2 is only a possible implementation manner of the first reference score and the heartbeat time.
  • the embodiment does not limit the corresponding manner of the first reference score and the heartbeat time.
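  • The scoring approach can be sketched as two table lookups. Only three values below come from the text (45 Hz scores 1 point, 70 Hz scores 2 points, and a first reference score of 3 points maps to 5 ms); every other threshold and heartbeat value is a hypothetical placeholder, and the function names are illustrative.

```python
import bisect

# Hypothetical stand-in for Table 1: band edges in Hz and points per band.
# Only "45 Hz -> 1 point" and "70 Hz -> 2 points" are taken from the text.
SCORE_BAND_EDGES_HZ = [50, 100]   # assumed: <50, 50 to <100, >=100
SCORE_POINTS = [1, 2, 3]

# Hypothetical stand-in for Table 2: first reference score -> heartbeat (ms).
# Only "3 points -> 5 ms" is taken from the text.
HEARTBEAT_MS_BY_SCORE = {2: 10, 3: 5, 4: 3, 5: 2, 6: 1}


def score(frequency_hz: float) -> int:
    """Score one frequency value under the assumed first scoring rule."""
    return SCORE_POINTS[bisect.bisect_right(SCORE_BAND_EDGES_HZ, frequency_hz)]


def heartbeat_from_scores(read_hz: float, write_hz: float):
    """Return (first_reference_score, heartbeat_ms) for a data block group."""
    reference_score = score(read_hz) + score(write_hz)
    return reference_score, HEARTBEAT_MS_BY_SCORE[reference_score]
```

  • With the example values from the text, `heartbeat_from_scores(70, 45)` yields a first reference score of 3 and a heartbeat time of 5 ms.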
  • In a second possible implementation, the master device calculates the heartbeat time of the data block group according to the access state information of the data block group, a reference time, and a first weight. First, the master device acquires a preset reference time and the first weight; second, the master device calculates the heartbeat time of the data block group according to the access state information, the reference time, and the first weight.
  • the reference time is a value obtained by machine learning or a preset value.
  • the first weight includes weights corresponding to at least two pieces of information respectively.
  • each piece of access state information may be multiplied directly by its corresponding weight, or may first undergo predetermined processing (for example, scoring by the first scoring rule) and then be multiplied by its weight.
  • the products obtained for the respective pieces of access state information are added to obtain a sum, and the preset reference time is then divided by this sum to obtain the corresponding heartbeat time.
  • the heartbeat time can be calculated according to the following formula:
  • heartbeatTime = Time / (R × weightR + W × weightW)
  • where heartbeatTime is the heartbeat time, Time is the reference time, R is the read frequency value, W is the write frequency value, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.
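  • For example, the reference time is 120 ms, the read frequency is 70 Hz with a corresponding weight of 0.2, and the write frequency is 45 Hz with a corresponding weight of 0.8; dividing the reference time by the weighted sum gives a heartbeat time of 120 ms / (70 × 0.2 + 45 × 0.8) = 2.4 ms.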
  • As another example, the access state information of the data block group includes the following five items: a read frequency of 70 Hz, a write frequency of 45 Hz, a data throughput of 900 KB/s, a CPU usage of 40%, and a memory usage of 50%.
  • the weights of these five items are 0.05, 0.05, 0.1, 0.2, and 0.4, respectively. With a reference time of 12 ms, the above calculation method gives a heartbeat time of 12 ms / (70 × 0.05 + 45 × 0.05 + 0.9 × 0.1 + 0.4 × 0.2 + 0.5 × 0.4), which is approximately equal to 2 ms.
  • the two specific scenarios above are only examples of calculating the heartbeat time of the data block group according to the access state information, the reference time, and the first weight; they do not limit the types of access state information or the calculation formula.
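  • The weighted calculation described above (reference time divided by the weighted sum of the access state values) can be sketched as follows. The function name and argument layout are illustrative assumptions; the numbers in the usage note are the ones given in the text.

```python
from typing import Sequence


def heartbeat_time(reference_ms: float,
                   values: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Return reference_ms / sum(value_i * weight_i), in milliseconds.

    values  : access state values (e.g. read Hz, write Hz, throughput, ...)
    weights : the "first weight", one entry per value
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    if weighted_sum <= 0:
        raise ValueError("weighted sum must be positive")
    return reference_ms / weighted_sum
```

  • For the two-item case in the text, `heartbeat_time(120, [70, 45], [0.2, 0.8])` evaluates to about 2.4 ms; for the five-item case, `heartbeat_time(12, [70, 45, 0.9, 0.4, 0.5], [0.05, 0.05, 0.1, 0.2, 0.4])` evaluates to about 1.96 ms, which the text rounds to 2 ms.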
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the heartbeat time determination module 164b.
  • Step 203 The master device sends the heartbeat time of the data block group to the slave device.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the first time transmitting module 164d.
  • Step 204 The slave device receives the heartbeat time sent by the master device.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the second time receiving module 164j.
  • Step 205 The slave device sets an election timeout period according to the heartbeat time.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the setup module 164k.
  • Step 206 When the slave device does not receive any signal sent by the master device within the election timeout period, initiate a master device election to each of the plurality of storage devices.
  • the system divides the storage devices into three roles according to the election mode: Leader, Follower, and Candidate.
  • In this scheme, the master device is equivalent to the Leader, and the slave device is equivalent to the Follower.
  • The Leader is responsible for synchronous management of logs (that is, data consistency management), handles access from the client, including read and write operations, and maintains contact by sending heartbeats to the Followers.
  • The Follower is responsible for responding to the Leader's log synchronization requests (that is, the data synchronization instructions), responding to voting requests initiated by a Candidate, and forwarding received client requests to the Leader.
  • the election timeout period is used to trigger the slave device to initiate a master device election. Specifically, the slave device sets a timer and resets it each time it receives a heartbeat signal or a data synchronization instruction sent by the master device. If the slave device receives neither a heartbeat signal nor a data synchronization instruction before the election timeout period expires, it assumes that the master device has failed and initiates a new election.
  • The slave device that initiates the election changes its identity from Follower to Candidate and uses a remote procedure call (RPC) to send a voting request (RequestVote) to the other slave devices, asking them to support it in becoming the Leader; if it receives supporting votes from more than half of the slave devices, it changes its identity to Leader.
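  • The timer behaviour described above can be sketched as a small state holder: signals from the master reset the timer, and silence for a full election timeout period turns the Follower into a Candidate. The class and function names are hypothetical, time is passed in explicitly to keep the sketch deterministic, and the RequestVote exchange itself is not modelled.

```python
class FollowerTimer:
    """Tracks the election timeout on a slave device (illustrative sketch)."""

    def __init__(self, election_timeout_ms: float, now_ms: float = 0.0) -> None:
        self.election_timeout_ms = election_timeout_ms
        self.last_contact_ms = now_ms
        self.role = "Follower"

    def on_master_signal(self, now_ms: float) -> None:
        """A heartbeat or data synchronization instruction arrived: reset the timer."""
        self.last_contact_ms = now_ms

    def tick(self, now_ms: float) -> str:
        """Become a Candidate if the master stayed silent for the whole timeout."""
        silent_for = now_ms - self.last_contact_ms
        if self.role == "Follower" and silent_for >= self.election_timeout_ms:
            self.role = "Candidate"
        return self.role


def wins_election(supporting_votes: int, voters: int) -> bool:
    """True when more than half of the voters support the Candidate."""
    return supporting_votes * 2 > voters
```

  • A slave device with a 130 ms timeout that last heard from the master at 100 ms remains a Follower at 229 ms and becomes a Candidate at 230 ms.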
  • the slave device can set the election timeout period in the following three ways:
  • In a first manner, the slave device scores the access state information according to a preset second scoring rule to obtain a second reference score, determines a first timeout coefficient of the data block group according to the second reference score, and sets the product of the first timeout coefficient and the heartbeat time as the election timeout period.
  • the way in which the slave device scores the access state information according to the preset second scoring rule to obtain the second reference score is similar to the way in which the master device scores the access state information according to the first scoring rule to obtain the first reference score, and is not described again here.
  • the slave device may pre-store the correspondence between each reference score segment and each timeout coefficient.
  • the slave device may determine the reference score segment in which the second reference score falls, and look up the first timeout coefficient corresponding to that segment in the pre-stored correspondence.
  • the embodiment of the present invention does not limit the specific form of the correspondence between each reference score segment and each timeout coefficient. For example, refer to Table 3, which shows the correspondence between the reference score segment and the timeout coefficient.
  • Table 3
    Reference score | Timeout coefficient
    1-2 points | 70
    3-4 points | 60
    5-6 points | 50
    7 or more points | 40
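  • The lookup in Table 3 and the resulting election timeout period can be sketched as follows. The coefficient values are those of Table 3; the function names and the error raised for scores below 1 point are illustrative assumptions.

```python
# (lowest score, highest score or None for open-ended, timeout coefficient),
# taken from Table 3.
TIMEOUT_COEFFICIENTS = [
    (1, 2, 70),
    (3, 4, 60),
    (5, 6, 50),
    (7, None, 40),
]


def first_timeout_coefficient(second_reference_score: int) -> int:
    """Find the score segment containing the score and return its coefficient."""
    for low, high, coefficient in TIMEOUT_COEFFICIENTS:
        if second_reference_score >= low and (high is None or second_reference_score <= high):
            return coefficient
    raise ValueError("score is outside every segment of Table 3")


def election_timeout_ms(second_reference_score: int, heartbeat_ms: float) -> float:
    """Election timeout = first timeout coefficient x heartbeat time."""
    return first_timeout_coefficient(second_reference_score) * heartbeat_ms
```

  • For instance, a second reference score of 3 points with a 5 ms heartbeat time gives 60 × 5 ms = 300 ms.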
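  • In a second manner, the slave device calculates a second timeout coefficient according to the access state information of the data block group, a reference coefficient, and a second weight, and sets the product of the second timeout coefficient and the heartbeat time as the election timeout period.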
  • the reference coefficient may be a value obtained by machine learning, or may be a preset value.
  • the second weight includes the weight of the access state information of the data block group.
  • the slave device may multiply each value of the access state information by its corresponding weight, add the resulting products to obtain a sum, and add this sum to the reference coefficient to obtain the second timeout coefficient; the product of the second timeout coefficient and the heartbeat time is the election timeout period.
  • the formula for the slave device to set the election timeout period based on the heartbeat time can be as follows:
  • OverTime = heartbeatTime × (Reference + R × weightR + W × weightW)
  • where OverTime is the election timeout period, heartbeatTime is the heartbeat time, Reference is the reference coefficient, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.
  • For example, the access state information of the data block group includes the read frequency and the write frequency of the data block group, which are 70 Hz and 45 Hz with corresponding weights of 0.2 and 0.8, respectively; the reference coefficient is 15 and the heartbeat time is 2 ms.
  • As another example, the heartbeat time is 2 ms, the reference coefficient is 15, and the access state information of the data block group includes the read frequency (70 Hz), the write frequency (45 Hz), and the CPU usage (40%).
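  • The weighted election timeout calculation described above (weighted sum of the access state values added to the reference coefficient, then multiplied by the heartbeat time) can be sketched as follows. The function name and argument layout are illustrative assumptions; the inputs in the usage note are the values given in the text for the read/write frequency example.

```python
from typing import Sequence


def election_timeout(heartbeat_ms: float,
                     reference: float,
                     values: Sequence[float] = (),
                     weights: Sequence[float] = ()) -> float:
    """Return heartbeat_ms * (reference + sum(value_i * weight_i)).

    The bracketed term is the second timeout coefficient. With no values
    and weights, it reduces to the reference coefficient alone.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    second_timeout_coefficient = reference + sum(v * w for v, w in zip(values, weights))
    return heartbeat_ms * second_timeout_coefficient
```

  • With a heartbeat time of 2 ms, a reference coefficient of 15, frequencies of 70 Hz and 45 Hz, and weights of 0.2 and 0.8, `election_timeout(2, 15, [70, 45], [0.2, 0.8])` evaluates to about 130 ms. Calling `election_timeout(2, 15)` with no access state values gives 30 ms, the plain product of the reference coefficient and the heartbeat time.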
  • In a third manner, the slave device sets the product of the preset reference coefficient and the heartbeat time as the election timeout period.
  • That is, the slave device can also directly multiply the reference coefficient by the heartbeat time sent by the master device, and use the product as the election timeout period.
  • the embodiment of the present invention only uses the foregoing specific implementation scenarios as examples to describe how the slave device calculates the election timeout period according to the heartbeat time and the access state information of the data block group; they do not limit the types of access state information or the specific calculation formula.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the election initiation module 164l.
  • Step 207 The master device sends a data synchronization instruction to the slave device according to the heartbeat time of the data block group.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the first instruction sending module 164c.
  • Step 208 The slave device receives a data synchronization instruction sent by the master device according to a heartbeat time of the data block group.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the instruction receiving module 164h.
  • Step 209 The slave device performs data synchronization according to the data synchronization instruction.
  • In a distributed storage system based on the Raft consistency algorithm, data synchronization between the data blocks of the same data block group can be performed by log replication. After receiving a log entry (a transaction request) from the client, the master device appends the entry to its local log and then synchronizes it to each slave device through the heartbeat. After receiving and recording the entry, each slave device sends an acknowledgment response to the master device. Once the master device has received acknowledgment responses from more than half of the slave devices, it marks the entry as committed, appends it to its local disk, and notifies the client; on the next heartbeat it notifies the slave devices to store the entry on their local disks.
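  • The replication flow described above can be sketched in memory as follows. This is a simplified illustration, not the Raft algorithm itself: terms, log indices, retries, and catch-up of lagging slaves are omitted, and the `Master` and `Slave` classes and their methods are hypothetical.

```python
class Slave:
    """A slave device that records entries and stores them once told to commit."""

    def __init__(self, reachable: bool = True) -> None:
        self.reachable = reachable
        self.log = []        # entries received and recorded
        self.committed = 0   # number of entries stored as committed

    def append(self, entry) -> bool:
        """Record the entry and acknowledge it, unless the slave is unreachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True

    def commit_up_to(self, count: int) -> None:
        self.committed = min(count, len(self.log))


class Master:
    """A master device that commits an entry after a majority of acknowledgments."""

    def __init__(self, slaves) -> None:
        self.slaves = slaves
        self.log = []
        self.committed = 0

    def handle_client_entry(self, entry) -> bool:
        """Append locally, replicate on this heartbeat, commit on majority acks."""
        self.log.append(entry)
        acks = sum(1 for s in self.slaves if s.append(entry))
        if acks * 2 > len(self.slaves):  # more than half of the slaves acknowledged
            self.committed = len(self.log)
            return True                  # the client can now be notified
        return False

    def next_heartbeat(self) -> None:
        """Tell reachable slaves to store everything committed so far."""
        for s in self.slaves:
            if s.reachable:
                s.commit_up_to(self.committed)
```

  • With three slaves of which one is unreachable, an entry is committed on the master after two acknowledgments, and the two reachable slaves store it only after the following heartbeat.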
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the data synchronization module 164i.
  • the method provided in this embodiment is not limited to the distributed storage system based on the Raft consistency algorithm, and can also be applied to other distributed storage systems based on heartbeat time for data synchronization.
  • In summary, in the heartbeat-based data synchronization method provided by this embodiment, the master device acquires the access state information of the data block group, determines the heartbeat time of the data block group according to that access state information, and sends a data synchronization instruction to the slave device according to the heartbeat time to instruct the slave device to perform data synchronization. This solves the problem that the existing Raft consistency algorithm requires signals to be sent and received frequently between the storage devices where the data block group is located, which causes large system overhead and degrades the read and write performance of the storage system; it achieves the effect of reducing the system overhead of the distributed storage system and improving the read and write performance of the storage system.
  • FIG. 3 is a flowchart of a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention. The method is used in a storage system that includes the coordination device shown in FIG. 1 and stores at least one data block group. As shown in FIG. 3, the heartbeat-based data synchronization method may include:
  • Step 301 The coordination device collects statistics on the access state information of at least one data block group.
  • the access state information of the at least one data block group may be sent to the coordination device by the master device of the at least one data block group.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the statistics module 164m.
  • Step 302 The coordination device determines the importance level of the data block group according to the access state information of the data block group.
  • the coordination device may determine the importance level of the data block group according to the sum of the read frequency and the write frequency in the access state information of the data block group. For example, refer to Table 4, which lists the read frequency, the write frequency, and the importance ranking of each data block group in a storage system.
  • the coordination device calculates the sum of the read and write frequencies of each of the four data block groups and determines their importance from these sums: the greater the sum of a group's read and write frequencies, the higher its importance and the higher its importance ranking.
  • the coordination device may also obtain a third weight, where the third weight includes a weight corresponding to each of the read frequency and the write frequency, and the coordination device determines each data block group according to the read frequency, the write frequency, and the third weight corresponding to each data block group. The importance level.
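  • Ranking data block groups by importance can be sketched as a sort on the (optionally weighted) sum of read and write frequencies. The function name is illustrative, and because the values of Table 4 are not reproduced here, the group names and frequencies in the usage note are made-up placeholders.

```python
from typing import Dict, List, Tuple


def rank_by_importance(groups: Dict[str, Tuple[float, float]],
                       weight_read: float = 1.0,
                       weight_write: float = 1.0) -> List[str]:
    """Return group names ordered from most to least important.

    groups maps a group name to (read_frequency_hz, write_frequency_hz).
    With both weights at 1.0 this is the plain sum of the two frequencies;
    other weights correspond to the "third weight" variant.
    """
    def importance(name: str) -> float:
        read_hz, write_hz = groups[name]
        return read_hz * weight_read + write_hz * weight_write

    return sorted(groups, key=importance, reverse=True)
```

  • For hypothetical groups `{"A": (70, 45), "B": (10, 5), "C": (200, 20), "D": (30, 60)}`, the unweighted ranking is C, A, D, B, while weights of 0.2 for reads and 0.8 for writes change it to C, D, A, B.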
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the level determination module 164p.
  • Step 303 The master device of a data block group sends the heartbeat time of the data block group to the coordination device.
  • the heartbeat time is determined by the master device according to the access state information of the data block group that it acquires; the access state information may include at least two pieces of information, including the read frequency and the write frequency of the data block group.
  • for the step in which the master device determines the heartbeat time according to the access state information of the data block group, reference may be made to the description in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the second time transmitting module 164e.
  • Step 304 The coordinating device receives a heartbeat time of the data block group sent by the master device.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the third time receiving module 164n.
  • Step 305 The coordination device corrects the heartbeat time of the data block group according to the importance level of the data block group.
  • the method for correcting the heartbeat time may be as follows: first, the correction coefficient of the data block group is determined according to the importance level of the data block group; then the product of the correction coefficient and the heartbeat time is determined as the corrected heartbeat time.
  • For example, the correction coefficients of five data block groups, from high importance to low importance, are 0.7, 0.8, 1.0, 1.2, and 1.5, respectively, and the heartbeat time is 2 ms.
  • the corrected heartbeat times of the five data block groups are then 1.4 ms, 1.6 ms, 2.0 ms, 2.4 ms, and 3.0 ms, respectively.
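  • The correction step can be sketched as a multiplication by a coefficient chosen from the importance ranking. The coefficients are the five values given in the text; the zero-based rank convention and the function name are illustrative assumptions.

```python
# Correction coefficients from the text, ordered from the most important
# data block group (rank 0) to the least important (rank 4).
CORRECTION_COEFFICIENTS = [0.7, 0.8, 1.0, 1.2, 1.5]


def corrected_heartbeat(heartbeat_ms: float, importance_rank: int) -> float:
    """Return correction coefficient x heartbeat time for the given rank."""
    if not 0 <= importance_rank < len(CORRECTION_COEFFICIENTS):
        raise ValueError("importance_rank is out of range")
    return heartbeat_ms * CORRECTION_COEFFICIENTS[importance_rank]
```

  • Applying this to a 2 ms heartbeat time for ranks 0 through 4 reproduces the corrected values of 1.4 ms, 1.6 ms, 2.0 ms, 2.4 ms, and 3.0 ms, so more important groups are synchronized more often.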
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the correction module 164q.
  • Step 306 The coordination device returns the corrected heartbeat time to the master device, and the master device sends a data synchronization instruction to the slave device according to the corrected heartbeat time.
  • This step can be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the time return module 164r.
  • after generating the heartbeat time, the master device of the data block group sends the heartbeat time to the coordination device for correction; after receiving the corrected heartbeat time returned by the coordination device, the master device sends the data synchronization instruction or the heartbeat signal according to the corrected heartbeat time. Further, the master device sends the corrected heartbeat time to the slave device of the data block group, so that the slave device of the data block group resets the election timeout period according to the corrected heartbeat time.
  • In summary, in the heartbeat-based data synchronization method provided by this embodiment, the coordination device collects statistics on the access state information of the data block groups, determines the importance level of a data block group according to its access state information, receives the heartbeat time of the data block group sent by the master device of the data block group, corrects the heartbeat time according to the importance level of the data block group, and returns the corrected heartbeat time to the master device.
  • the heartbeat time of each data block group is thus corrected according to the respective operating parameters of each data block group, and the heartbeat times of the data block groups of the storage system are optimized as a whole, further improving the read and write performance of the storage system.
  • FIG. 4 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an embodiment of the present invention.
  • The heartbeat-based data synchronization apparatus may be implemented, by software, hardware, or a combination of both, as part or all of a storage device of a distributed storage system.
  • The distributed storage system may be the distributed storage system shown in FIG. 1A. The storage device is the master device of a data block group; the data block group includes a plurality of data blocks, which are respectively stored in a plurality of storage devices of the distributed storage system. The plurality of storage devices include the master device, and the remaining storage devices are slave devices of the data block group.
  • The heartbeat-based data synchronization apparatus may include an information acquisition unit 401, a heartbeat time determination unit 402, a first instruction transmitting unit 403, a first time transmitting unit 404, a second time transmitting unit 405, a first time receiving unit 406, and a second instruction transmitting unit 407.
  • the information acquisition unit 401 has the same or similar function as the information acquisition module 164a.
  • the heartbeat time determination unit 402 has the same or similar function as the heartbeat time determination module 164b.
  • the first instruction transmitting unit 403 has the same or similar function as the first instruction transmitting module 164c.
  • the first time transmitting unit 404 has the same or similar function as the first time transmitting module 164d.
  • the second time transmitting unit 405 has the same or similar function as the second time transmitting module 164e.
  • the first time receiving unit 406 has the same or similar function as the first time receiving module 164f.
  • the second instruction transmitting unit 407 has the same or similar function as the second instruction transmitting module 164g.
  • FIG. 5 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an embodiment of the present invention.
  • The heartbeat-based data synchronization apparatus may be implemented, by software, hardware, or a combination of both, as part or all of a storage device of a distributed storage system.
  • The distributed storage system may be the distributed storage system shown in FIG. 1A. The storage device is a slave device of a data block group; the data block group includes a plurality of data blocks, which are respectively stored in a plurality of storage devices of the distributed storage system. The plurality of storage devices include the master device, and the remaining storage devices are slave devices of the data block group.
  • The heartbeat-based data synchronization apparatus may include an instruction receiving unit 501, a data synchronization unit 502, a second time receiving unit 503, a setting unit 504, and an election initiating unit 505.
  • the instruction receiving unit 501 has the same or similar function as the instruction receiving module 164h.
  • the data synchronization unit 502 has the same or similar functionality as the data synchronization module 164i.
  • the second time receiving unit 503 has the same or similar function as the second time receiving module 164j.
  • the setting unit 504 has the same or similar function as the setting module 164k.
  • the election initiation unit 505 has the same or similar function as the election initiation module 164l.
  • FIG. 6 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an embodiment of the present invention.
  • The heartbeat-based data synchronization apparatus may be implemented, by software, hardware, or a combination of both, as part or all of a coordination device of a distributed storage system.
  • The distributed storage system may be the distributed storage system shown in FIG. 1A. The distributed storage system stores at least one data block group; a data block group includes a plurality of data blocks, which are respectively stored in a plurality of storage devices of the distributed storage system. The plurality of storage devices include the master device, and the remaining storage devices are slave devices of the data block group.
  • The heartbeat-based data synchronization apparatus may include a statistics unit 601, a third time receiving unit 602, a level determining unit 603, a correcting unit 604, and a time returning unit 605.
  • the statistical unit 601 has the same or similar function as the statistical module 164m.
  • the third time receiving unit 602 has the same or similar function as the third time receiving module 164n.
  • the rank determining unit 603 has the same or similar function as the rank determining module 164p.
  • Correction unit 604 has the same or similar functionality as correction module 164q.
  • the time return unit 605 has the same or similar function as the time return module 164r.
  • A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be completed by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium.
  • The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.


Abstract

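The present invention discloses a heartbeat-based data synchronization method and belongs to the field of data synchronization technologies. The method is used in a distributed storage system that stores at least one data block group and includes a plurality of storage devices. One of the storage devices is the master device storing the data block group, and the remaining devices are slave devices storing the data block group. The master device performs the method: it obtains access state information of the data block group, determines a heartbeat time of the data block group according to the access state information, and sends a data synchronization instruction to the slave devices according to the heartbeat time, where the data synchronization instruction instructs the slave devices to perform data synchronization. This reduces the system overhead of the distributed storage system and improves the read/write performance of the storage system.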

Description

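Heartbeat-based data synchronization apparatus and method, and distributed storage system
Technical Field
The present invention relates to the field of data synchronization technologies, and in particular to a heartbeat-based data synchronization apparatus and method, and a distributed storage system.
Background
In the field of data synchronization technologies, the Raft consensus algorithm is widely used in distributed storage systems.
Currently, data in a Raft-based distributed storage system is divided into several data block groups, each consisting of multiple identical data blocks stored on different storage devices, and Raft synchronizes data per data block group. Specifically, the storage devices holding the data blocks of a data block group first elect one master device for the group; the other devices are slave devices of the group. The master device handles interaction with clients. When the master device receives a read/write instruction from a client, it records the instruction as a log and, at a fixed heartbeat time, sends a data synchronization instruction containing the log to each slave device, and each slave device performs data synchronization according to that instruction. If there is no log to send when the heartbeat time elapses, the master device sends each slave device a heartbeat signal that carries no data synchronization instruction, to confirm that the connection is normal.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem:
In the existing Raft consensus algorithm, the heartbeat time of every data block group is a fixed value, yet read/write frequencies are not balanced across data block groups. To ensure that the data blocks of groups with a high read/write frequency are synchronized in time, the fixed value is usually set small. As a result, the storage devices holding groups with a low read/write frequency must also exchange signals frequently, which causes high system overhead and degrades the read/write performance of the storage system.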
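Summary
To solve the prior-art problem that the storage devices holding data block groups with a low read/write frequency must also exchange signals frequently, which causes high system overhead and degrades the read/write performance of the storage system, embodiments of the present invention provide an apparatus, a method, and a distributed storage system. The technical solutions are as follows:
A distributed storage system stores at least one data block group and includes a plurality of storage devices. One of the storage devices is the master device storing the data block group, and the remaining devices are slave devices storing the data block group. The distributed storage system may further include a coordination device connected to each storage device in the system.
According to a first aspect, a heartbeat-based data synchronization method is provided, including:
the master device obtains access state information of the data block group; the master device determines a heartbeat time of the data block group according to the access state information of the data block group; and the master device sends a data synchronization instruction to the slave device according to the heartbeat time of the data block group, where the data synchronization instruction instructs the slave device to perform data synchronization.
In the solution shown in the embodiments of the present invention, the heartbeat time of each data block group is determined adaptively according to information such as the read/write frequency of that group. This solves the problem in the existing Raft consensus algorithm that the storage devices holding a data block group must exchange signals frequently, which causes high system overhead and degrades read/write performance, and it reduces the system overhead of the distributed storage system and improves the read/write performance of the storage system.
In a first possible implementation of the first aspect, the access state information of the data block group includes a read frequency and a write frequency of the data block group.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the determining, by the master device, of the heartbeat time of the data block group according to the access state information includes: the master device scores the access state information according to a preset first scoring rule to obtain a first reference score; and the master device determines the heartbeat time of the data block group according to the first reference score. This provides a way to determine the heartbeat time adaptively from at least two pieces of information, including the read frequency and the write frequency.
With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, the determining of the heartbeat time includes: the master device scores the read frequency and the write frequency separately according to the first scoring rule; the master device uses the sum of the scores of the read frequency and the write frequency as a first reference score; and the master device determines the heartbeat time of the data block group according to the first reference score.
With reference to the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the determining of the heartbeat time includes: the master device obtains a preset reference time and a first weight corresponding to the access state information; and the master device calculates the heartbeat time of the data block group according to the access state information, the reference time, and the first weight. This provides a way to determine the heartbeat time adaptively from at least two pieces of information, including the read frequency and the write frequency.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the determining of the heartbeat time includes:
calculating the heartbeat time by the following formula:
heartbeatTime=Time/(weightR*R+weightW*W);weightR+weightW=1;
where heartbeatTime is the heartbeat time, Time is the reference time, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.
In the solution shown in the embodiments of the present invention, the master device determines the heartbeat time of the data block group according to the access state information of the group. This takes into account the influence of different access state information on the storage system as a whole and configures the heartbeat time more reasonably on the basis of actual data access. It addresses the "heartbeat storm" in existing multi-Raft systems, in which the storage devices holding a data block group exchange signals frequently within a short time, causing high system overhead and degrading read/write performance.
With reference to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the method further includes: the master device sends the heartbeat time of the data block group to the slave device, so that the slave device sets an election timeout according to the heartbeat time.
In the solution shown in the embodiments of the present invention, the master device sends the heartbeat time of the data block group to the slave device so that the slave device sets the election timeout according to the heartbeat time. This takes into account the influence of the heartbeat time on the election timeout, so that the master device most beneficial to overall system performance is elected to lead data synchronization, which improves the overall read/write performance of the storage system.
With reference to the first aspect or any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, the distributed storage system further includes a coordination device connected to the plurality of storage devices, and the method further includes: the master device sends the heartbeat time of the data block group to the coordination device; the master device receives the corrected heartbeat time returned by the coordination device; and the master device sends a data synchronization instruction to the slave device according to the corrected heartbeat time.
In the solution shown in the embodiments of the present invention, the heartbeat time of each data block group is corrected according to that group's own read/write frequency, and the heartbeat times of all data block groups in the storage system are optimized as a whole, which further improves the read/write performance of the storage system.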
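According to a second aspect, a heartbeat-based data synchronization method is provided, including: the slave device receives a data synchronization instruction that the master device sends according to the heartbeat time of the data block group, where the heartbeat time is determined by the master device by obtaining access state information of the data block group and according to that access state information; and the slave device performs data synchronization according to the data synchronization instruction.
In the solution shown in the embodiments of the present invention, the heartbeat time of the data block group is determined adaptively according to information such as the read/write frequency of the group, and data is synchronized between devices according to the determined heartbeat time. This solves the problem in the existing Raft consensus algorithm that the storage devices holding a data block group must exchange signals frequently, which causes high system overhead and degrades read/write performance, and it reduces the system overhead of the distributed storage system and improves the read/write performance of the storage system.
In a first possible implementation of the second aspect, the access state information of the data block group includes a read frequency and a write frequency of the data block group.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the method further includes: the slave device receives the heartbeat time sent by the master device; the slave device sets an election timeout according to the heartbeat time; and when the slave device receives no signal from the master device within the election timeout, the slave device initiates a master device election toward each of the other storage devices among the plurality of storage devices.
In the solution shown in the embodiments of the present invention, the slave device sets the election timeout according to the heartbeat time. This takes into account the influence of the heartbeat time on the election timeout, so that the master device most beneficial to overall system performance is elected to lead data synchronization, which improves the overall read/write performance of the storage system.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the setting, by the slave device, of the election timeout according to the heartbeat time includes: the slave device scores the access state information according to a preset second scoring rule to obtain a second reference score; the slave device determines a first timeout coefficient of the data block group according to the second reference score; and the slave device sets the product of the first timeout coefficient and the heartbeat time as the election timeout.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the setting of the election timeout includes: the slave device obtains a preset reference coefficient and a second weight corresponding to the access state information; the slave device calculates a second timeout coefficient of the data block group according to the access state information, the reference coefficient, and the second weight; and the slave device sets the product of the second timeout coefficient and the heartbeat time as the election timeout.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the setting of the election timeout includes:
setting the election timeout by the following formula:
OverTime=(weightR*R+weightW*W+Reference)*heartbeatTime;
weightR+weightW=1;
where OverTime is the election timeout, heartbeatTime is the heartbeat time, Reference is the reference coefficient, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.
In the solution shown in the embodiments of the present invention, the master device determines the heartbeat time of the data block group according to the access state information of the group, and the election timeout is set according to that heartbeat time. This takes into account the influence of different access state information on the storage system as a whole and configures the election timeout more reasonably on the basis of actual data access, which improves the overall read/write performance of the storage system.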
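According to a third aspect, a heartbeat-based data synchronization method is provided, including: the coordination device collects statistics on the access state information of the data block group and receives the heartbeat time of the data block group sent by the master device, where the heartbeat time is determined by the master device by obtaining the access state information of the data block group and according to that access state information; the coordination device determines an importance level of the data block group according to the access state information of the data block group; the coordination device corrects the heartbeat time according to the importance level of the data block group; and the coordination device returns the corrected heartbeat time to the master device.
In the solution shown in the embodiments of the present invention, the coordination device determines the importance level of the data block group according to the access state information of the group and corrects the heartbeat time according to that importance level. By adjusting heartbeat times from the perspective of the whole system, resources are allocated more reasonably, which improves the overall read/write performance of the storage system.
According to a fourth aspect, a network device is provided. The network device includes a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus. The processor is configured to execute instructions stored in the memory, and by executing the instructions the processor implements the heartbeat-based data synchronization method provided by the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, a network device is provided. The network device includes a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus. The processor is configured to execute instructions stored in the memory, and by executing the instructions the processor implements the heartbeat-based data synchronization method provided by the second aspect or any possible implementation of the second aspect.
According to a sixth aspect, a network device is provided. The network device includes a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus. The processor is configured to execute instructions stored in the memory, and by executing the instructions the processor implements the heartbeat-based data synchronization method provided by the third aspect.
According to a seventh aspect, an embodiment of the present invention provides a heartbeat-based data synchronization apparatus. The apparatus includes at least one unit, and the at least one unit is configured to implement the heartbeat-based data synchronization method provided by the first aspect or any possible implementation of the first aspect.
According to an eighth aspect, an embodiment of the present invention provides a heartbeat-based data synchronization apparatus. The apparatus includes at least one unit, and the at least one unit is configured to implement the heartbeat-based data synchronization method provided by the second aspect or any possible implementation of the second aspect.
According to a ninth aspect, an embodiment of the present invention provides a heartbeat-based data synchronization apparatus. The apparatus includes at least one unit, and the at least one unit is configured to implement the heartbeat-based data synchronization method provided by the third aspect.
The technical effects obtained by the fourth to ninth aspects of the embodiments of the present invention are similar to those obtained by the corresponding technical means in the first to third aspects, and are not described again here.
In summary, the beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:
The master device obtains access state information of the data block group, determines the heartbeat time of the data block group according to that information, and sends a data synchronization instruction to the slave device according to the heartbeat time, where the data synchronization instruction instructs the slave device to perform data synchronization. This solves the problem in the existing Raft consensus algorithm that the storage devices holding a data block group must exchange signals frequently, which causes high system overhead and degrades read/write performance, and it reduces the system overhead of the distributed storage system and improves the read/write performance of the storage system.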
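Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic structural diagram of a distributed storage system according to an exemplary embodiment of the present invention;
FIG. 1B is a schematic structural diagram of a network device involved in the embodiment shown in FIG. 1A;
FIG. 1C is a schematic diagram of an application program module involved in the embodiment shown in FIG. 1A;
FIG. 1D is a schematic diagram of another application program module involved in the embodiment shown in FIG. 1A;
FIG. 1E is a schematic diagram of yet another application program module involved in the embodiment shown in FIG. 1A;
FIG. 2 is a flowchart of a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention;
FIG. 3 is a flowchart of a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention;
FIG. 4 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an exemplary embodiment of the present invention;
FIG. 5 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an exemplary embodiment of the present invention;
FIG. 6 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an exemplary embodiment of the present invention.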
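Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the implementations of the present invention are described in further detail below with reference to the accompanying drawings.
Refer to FIG. 1A, which shows a schematic structural diagram of a distributed storage system according to an exemplary embodiment of the present invention. The distributed storage system includes several storage devices 110.
The distributed storage system stores several data block groups. Each data block group contains multiple identical data blocks, and the data blocks of one group are stored on different storage devices. Among the storage devices that store a data block group, one is the master device storing the group and the others are slave devices storing the group. The master device handles interaction with clients.
In this embodiment of the present invention, the master device of a data block group may obtain access state information of the group, determine the heartbeat time of the group according to that information, and send a data synchronization instruction to the slave devices according to the heartbeat time. The slave devices of the group perform data synchronization according to the data synchronization instruction, so that the multiple data blocks remain consistent.
Optionally, the distributed storage system further includes a coordination device 120. The coordination device 120 is connected to, and communicates with, the storage devices 110 over a wired or wireless network.
Refer to FIG. 1B, which shows a schematic structural diagram of a network device involved in an exemplary embodiment of the present invention. The network device 10 may be the foregoing storage device 110 or coordination device 120, and includes a processor 12 and a network interface 14.
The processor 12 includes one or more processing cores. The processor 12 performs various functional applications and data processing by running software programs and modules.
There may be multiple network interfaces 14. The network interface 14 is used to communicate with other storage devices or network devices.
Optionally, the network device 10 further includes components such as a memory 16 and a bus 18. The memory 16 and the network interface 14 are each connected to the processor 12 through the bus 18.
The memory 16 may be used to store software programs and modules. Specifically, the memory 16 may store an operating system 162 and an application program module 164 required by at least one function. The operating system 162 may be an operating system such as Real Time eXecutive (RTX, a real-time operating system), LINUX, UNIX, WINDOWS, or OS X.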
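Refer to FIG. 1C, which shows a schematic diagram of an application program module involved in an exemplary embodiment of the present invention. As shown in FIG. 1C, when the network device 10 is a storage device and that storage device is the master device of a data block group, the application program module 164 may be an information acquisition module 164a, a heartbeat time determination module 164b, a first instruction sending module 164c, a first time sending module 164d, a second time sending module 164e, a first time receiving module 164f, and a second instruction sending module 164g.
The information acquisition module 164a obtains the access state information of the data block group.
The heartbeat time determination module 164b determines the heartbeat time of the data block group according to the access state information of the data block group.
The first instruction sending module 164c sends a data synchronization instruction to the slave device according to the heartbeat time of the data block group.
The first time sending module 164d sends the heartbeat time of the data block group to the slave device.
The second time sending module 164e sends the heartbeat time of the data block group to the coordination device.
The first time receiving module 164f receives the corrected heartbeat time returned by the coordination device.
The second instruction sending module 164g sends a data synchronization instruction to the slave device according to the corrected heartbeat time.
Refer to FIG. 1D, which shows a schematic diagram of another application program module involved in an exemplary embodiment of the present invention. When the network device 10 is a slave device of the foregoing data block group, the application program module 164 may be an instruction receiving module 164h, a data synchronization module 164i, a second time receiving module 164j, a setting module 164k, and an election initiation module 164l.
The instruction receiving module 164h receives the data synchronization instruction that the master device sends according to the heartbeat time of the data block group, where the heartbeat time is determined by the master device by obtaining the access state information of the data block group and according to that information.
The data synchronization module 164i performs data synchronization according to the data synchronization instruction, so that the multiple data blocks remain consistent.
The second time receiving module 164j receives the heartbeat time sent by the master device.
The setting module 164k sets the election timeout according to the heartbeat time.
The election initiation module 164l initiates a master device election toward each of the other storage devices among the plurality of storage devices when the slave device receives no signal from the master device within the election timeout.
Refer to FIG. 1E, which shows a schematic diagram of yet another application program module involved in an exemplary embodiment of the present invention. When the network device 10 is the foregoing coordination device 120, the application program module 164 may be a statistics module 164m, a third time receiving module 164n, a level determination module 164p, a correction module 164q, and a time return module 164r.
The statistics module 164m collects statistics on the access state information of the data block group.
The third time receiving module 164n receives the heartbeat time of the data block group sent by the master device, where the heartbeat time is determined by the master device by obtaining the access state information of the data block group and according to that information.
The level determination module 164p determines the importance level of the data block group according to the access state information of the data block group.
The correction module 164q corrects the heartbeat time according to the importance level of the data block group.
The time return module 164r returns the corrected heartbeat time to the master device.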
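FIG. 2 is a flowchart of a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention. The method may be used in a storage system, such as the one shown in FIG. 1A, that stores at least one data block group. As shown in FIG. 2, the heartbeat-based data synchronization method may include the following steps.
Step 201: the master device obtains the access state information of the data block group.
The access state information of the data block group may include the read frequency and the write frequency of the data block group.
The read frequency of the data block group is the number of read operations on the data blocks of the group that the master device in the storage system receives within a period of time. The write frequency of the data block group is the number of write operations on the data blocks of the group that the master device receives within a period of time.
Taking a Raft-based distributed storage system as an example, for a given data block group, clients usually read and write only the data block stored on the master device of that group. The number of read or write operations on the master device within a period of time therefore reflects, to some extent, how frequently the data block group is used. Once a storage device becomes the master device of a data block group, it can collect access state information such as the read frequency and write frequency of the group over a period of time.
Optionally, the access state information of the data block group may further include working state information of each storage device that stores the group, for example the data throughput, CPU (Central Processing Unit) usage, memory usage, and I/O (Input/Output) occupancy of each such storage device.
In a distributed storage system, information such as the data throughput, CPU usage, memory usage, and I/O occupancy of a storage device can reflect how frequently the device receives data synchronization instructions or heartbeat signals. For example, high data throughput, high CPU usage, high memory usage, and high I/O occupancy indicate that the storage device receives data synchronization instructions or heartbeat signals frequently; conversely, low data throughput, low CPU usage, low memory usage, and low I/O occupancy indicate that it receives them less often.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the information acquisition module 164a.
Step 202: the master device determines the heartbeat time of the data block group according to the access state information of the data block group.
The longer the heartbeat time of a data block group, the fewer times per unit time the master device and slave devices of that group communicate; conversely, the shorter the heartbeat time, the more times per unit time they communicate. For a data block group with a low read/write frequency, keeping the number of synchronizations per unit time between its master and slave devices at a low level is enough to meet the synchronization requirement of the data blocks in the group. Embodiments of the present invention can therefore set an independent heartbeat time for each data block group, so that the master and slave devices of groups with few reads and writes communicate only a small number of times per unit time. This preserves the synchronization performance of the data block group while minimizing communication between its master and slave devices, which reduces the system overhead of the distributed storage system and improves the read/write performance of the storage system.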
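Optionally, the step in which the master device determines the heartbeat time of the data block group according to the access state information of the group may be implemented in either of the following two ways.
1) The master device scores the access state information according to a preset first scoring rule to obtain a first reference score, and determines the heartbeat time of the data block group according to the first reference score.
This implementation is illustrated using the read frequency and write frequency of the data block group as the access state information. First, the master device scores the read frequency and the write frequency according to the preset first scoring rule, uses the sum of the two scores as the first reference score, and determines the heartbeat time of the data block group according to the first reference score.
The first scoring rule may be a scoring rule specified in advance for the access state information of the data block group. For example, for the read frequency and write frequency above, the scoring rule may be the one shown in Table 1.
Table 1
First scoring rule   Read/write frequency   Data throughput   CPU usage   Memory usage   I/O occupancy
1 point              below 50 Hz            below 800 KB/s    below 20%   below 35%      below 30%
2 points             50-100 Hz              800-2000 KB/s     20%-80%     35%-70%        30%-70%
3 points             above 100 Hz           above 2000 KB/s   above 80%   above 70%      above 70%
The scores assigned to the items of access state information under the first scoring rule are summed to give the first reference score. For example, if the access state information of the data block group is a read frequency of 70 Hz and a write frequency of 45 Hz, then under the rule in Table 1 the first reference score is the 2 points for the 70 Hz read frequency plus the 1 point for the 45 Hz write frequency, that is, 3 points.
Next, the master device determines the heartbeat time of the data block group according to the first reference score of 3 points. The heartbeat time may be chosen according to the size of the first reference score; one possible mapping is shown in Table 2.
Table 2
First reference score   Heartbeat time
1-3 points              5 ms
4-7 points              2 ms
above 7 points          1 ms
Using the correspondence between first reference score and heartbeat time in Table 2, the master device finds that a first reference score of 3 points corresponds to a heartbeat time of 5 ms, and sets 5 ms as the heartbeat time of the data block group.
It should be noted that Table 1 does not limit the first scoring rule; it is only one possible form of the rule. In addition, the "read/write frequency" item in the second column of Table 1 means that the read frequency and the write frequency share the same scoring rule here; in practice, their scoring rules may differ.
Similarly, Table 2 is only one possible correspondence between the first reference score and the heartbeat time, and this embodiment does not limit that correspondence.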
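The score-based lookup in Tables 1 and 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, only the read/write frequency column of Table 1 is modelled, and the band boundaries (50 Hz and 100 Hz scoring 2 points, a score of exactly 7 mapping to 2 ms) are one reading of the tables rather than something the text fixes.

```python
def score_frequency(hz: float) -> int:
    """Score a read or write frequency using the 'read/write frequency' column of Table 1."""
    if hz < 50:
        return 1
    if hz <= 100:  # assumption: both 50 Hz and 100 Hz fall in the 2-point band
        return 2
    return 3


def heartbeat_from_score(score: int) -> float:
    """Map a first reference score to a heartbeat time in ms, following Table 2."""
    if score <= 3:
        return 5.0
    if score <= 7:  # assumption: exactly 7 points belongs to the 4-7 band
        return 2.0
    return 1.0


def heartbeat_time_from_scores(read_hz: float, write_hz: float) -> float:
    """First reference score = read score + write score, then look up Table 2."""
    first_reference_score = score_frequency(read_hz) + score_frequency(write_hz)
    return heartbeat_from_score(first_reference_score)
```

With the example values from the text (read 70 Hz, write 45 Hz), the scores are 2 and 1, the first reference score is 3, and the function returns 5.0 ms.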
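2) The master device calculates the heartbeat time of the data block group according to the access state information of the group, a reference time, and a first weight. First, the master device obtains the preset reference time and the first weight; then it calculates the heartbeat time of the data block group according to the access state information, the reference time, and the first weight.
In this implementation, the reference time is a value obtained through machine learning or a preset value. The first weight contains the weights corresponding to at least two pieces of information. To calculate the heartbeat time, each item of access state information may be multiplied by its weight, or each item may first undergo predetermined processing (for example scoring, such as with the first scoring rule above) and then be multiplied by its weight. The resulting products are added together, and the preset reference time is divided by that sum to obtain the heartbeat time.
For example, when the access state information consists of the read frequency and the write frequency, the heartbeat time may be calculated with the following formula:
heartbeatTime=Time/(weightR*R+weightW*W);weightR+weightW=1.
In this formula, heartbeatTime is the heartbeat time, Time is the reference time, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight of the read frequency, and weightW is the weight of the write frequency. For example, with a reference time of 120 ms, a read frequency of 70 Hz with weight 0.2, and a write frequency of 45 Hz with weight 0.8, the heartbeat time given by the formula is 120ms/(70*0.2+45*0.8)=2.4ms.
Alternatively, suppose the reference time is 12 ms and the access state information of the data block group consists of the following five values: read frequency 70 Hz, write frequency 45 Hz, data throughput 900 KB/s, CPU usage 40%, and memory usage 50%, with respective weights 0.05, 0.05, 0.1, 0.2, and 0.4. The heartbeat time obtained with the calculation above is 12ms/(70*0.05+45*0.05+0.9*0.1+0.4*0.2+0.5*0.4), which is approximately 2 ms.
It should be noted that this embodiment uses only the two specific scenarios above to illustrate how the master device calculates the heartbeat time of the data block group from the access state information, the reference time, and the first weight; it does not limit the types of access state information or the calculation formula.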
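The weighted calculation can be checked with a short Python sketch. The function name `weighted_heartbeat_time_ms` and the list-based signature are assumptions made for illustration; the formula is the one given in the text, generalised to any number of weighted items as in the five-value example.

```python
from typing import Sequence


def weighted_heartbeat_time_ms(reference_ms: float,
                               values: Sequence[float],
                               weights: Sequence[float]) -> float:
    """heartbeatTime = Time / sum(weight_i * value_i), as in the weighted method."""
    if len(values) != len(weights):
        raise ValueError("each access state value needs exactly one weight")
    weighted_sum = sum(w * v for v, w in zip(values, weights))
    if weighted_sum <= 0:
        raise ValueError("weighted sum must be positive")
    return reference_ms / weighted_sum


# Two-value example from the text: 120 / (70*0.2 + 45*0.8) = 120 / 50
two_value = weighted_heartbeat_time_ms(120.0, [70, 45], [0.2, 0.8])

# Five-value example from the text; throughput 900 KB/s enters as 0.9 and
# percentages as fractions, matching the arithmetic shown in the text.
five_value = weighted_heartbeat_time_ms(
    12.0, [70, 45, 0.9, 0.4, 0.5], [0.05, 0.05, 0.1, 0.2, 0.4])
```

The first call evaluates to about 2.4 ms and the second to about 1.96 ms, which the text rounds to 2 ms. Note that in the five-value example the weights do not sum to 1; the constraint weightR+weightW=1 is stated only for the two-value formula.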
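This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the heartbeat time determination module 164b.
Step 203: the master device sends the heartbeat time of the data block group to the slave device.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the first time sending module 164d.
Step 204: the slave device receives the heartbeat time sent by the master device.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the second time receiving module 164j.
Step 205: the slave device sets the election timeout according to the heartbeat time.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the setting module 164k.
Step 206: when the slave device receives no signal from the master device within the election timeout, it initiates a master device election toward each of the other storage devices among the plurality of storage devices.
Taking a Raft-based distributed storage system as an example, the system uses elections to assign storage devices one of three roles: Leader, Follower, and Candidate. In this solution the master device corresponds to the Leader and a slave device corresponds to a Follower. The Leader is responsible for managing log synchronization (that is, data consistency), handling client access including read and write operations, and keeping in contact with Followers by sending heartbeats. A Follower is responsible for responding to the Leader's log synchronization requests (that is, data synchronization instructions), responding to vote requests initiated by a Candidate, and forwarding client requests it receives to the Leader.
The election timeout is used to trigger a slave device to initiate a Leader election. Specifically, a timer is set in the slave device, and the slave device resets the timer each time it receives a heartbeat signal or data synchronization instruction from the master device. If the timer reaches the election timeout without a heartbeat signal or data synchronization instruction having been received, the master device is assumed to have failed and the slave device initiates a new election. The initiating slave device changes its role from Follower to Candidate and, through the Remote Procedure Call Protocol (RPC), sends a vote request (RequestVote) to the other slave devices, asking them to support it as Leader. If it receives supporting votes from more than half of the slave devices, its role changes to Leader.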
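The timer behaviour described above can be sketched as a small state holder. This is a hedged Python illustration, not Raft library code: the class `ElectionTimer`, the helper `has_majority`, and the use of caller-supplied timestamps in milliseconds are all assumptions made so the example is deterministic.

```python
class ElectionTimer:
    """Hypothetical follower-side timer that is reset by every signal from the master."""

    def __init__(self, timeout_ms: float, now_ms: float = 0.0) -> None:
        self.timeout_ms = timeout_ms
        self.last_signal_ms = now_ms
        self.role = "follower"

    def on_signal(self, now_ms: float) -> None:
        """A heartbeat signal or data synchronization instruction arrived: reset the timer."""
        self.last_signal_ms = now_ms

    def tick(self, now_ms: float) -> bool:
        """Return True if the timeout elapsed and this device should start an election."""
        expired = now_ms - self.last_signal_ms >= self.timeout_ms
        if self.role == "follower" and expired:
            self.role = "candidate"  # Follower becomes Candidate and requests votes
            return True
        return False


def has_majority(votes_granted: int, voters: int) -> bool:
    """True when strictly more than half of the voters granted their vote."""
    return votes_granted * 2 > voters
```

A candidate for which `has_majority` returns True would then take the Leader role; sending the RequestVote messages themselves is outside this sketch.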
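The slave device may set the election timeout in any of the following three ways.
1) The slave device scores the access state information according to a preset second scoring rule to obtain a second reference score, determines a first timeout coefficient of the data block group according to the second reference score, and sets the product of the first timeout coefficient and the heartbeat time as the election timeout.
The way the slave device scores the access state information under the second scoring rule to obtain the second reference score is similar to the way the master device scores it under the first scoring rule to obtain the first reference score, and is not described again here.
In this embodiment of the present invention, the slave device may store in advance a correspondence between reference score ranges and timeout coefficients. To determine the first timeout coefficient from the second reference score, the slave device finds the range that contains the second reference score and looks up the timeout coefficient that corresponds to that range. The embodiment does not limit the specific form of this correspondence; Table 3 shows one example.
Table 3
Reference score range   Timeout coefficient
1-2 points              70
3-4 points              60
5-6 points              50
above 7 points          40
As shown in Table 3, suppose the slave device scores the access state information of the data block group under the second scoring rule and obtains a second reference score of 1.8. Looking this up in Table 3 gives a first timeout coefficient of 70. If the heartbeat time sent by the master device is 2 ms, the election timeout can be set to 70*2ms=140ms.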
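The Table 3 lookup can be sketched in Python as below. The function names are hypothetical, and because Table 3 lists only integer ranges, the handling of fractional scores (for example treating any score below 3 as the first band) is an assumption made for this illustration.

```python
def timeout_coefficient(second_reference_score: float) -> int:
    """Look up the first timeout coefficient following Table 3.

    Assumption: band edges are read as 'below 3', 'below 5', 'below 7', 'otherwise',
    so that fractional scores such as 1.8 fall into a defined band.
    """
    if second_reference_score < 3:
        return 70
    if second_reference_score < 5:
        return 60
    if second_reference_score < 7:
        return 50
    return 40


def election_timeout_from_score(second_reference_score: float,
                                heartbeat_ms: float) -> float:
    """Election timeout = first timeout coefficient * heartbeat time."""
    return timeout_coefficient(second_reference_score) * heartbeat_ms
```

For the example in the text, a score of 1.8 with a 2 ms heartbeat time gives 70 * 2 = 140 ms.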
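2) The slave device obtains a preset reference coefficient and a second weight, where the second weight contains the weights corresponding to the individual items of access state information. It calculates a second timeout coefficient of the data block group according to the access state information, the reference coefficient, and the second weight, and sets the product of the second timeout coefficient and the heartbeat time as the election timeout.
In this implementation, the reference coefficient may be a value obtained through machine learning or a preset value. The second weight contains the weights of the items of access state information of the data block group. The slave device multiplies the value of each item of access state information by its weight and adds the products together; adding the reference coefficient to that sum gives the second timeout coefficient, and the product of the second timeout coefficient and the heartbeat time is the election timeout. The formula by which the slave device sets the election timeout according to the heartbeat time may be as follows:
OverTime=(weightR*R+weightW*W+Reference)*heartbeatTime;
weightR+weightW=1;
where OverTime is the election timeout, heartbeatTime is the heartbeat time, Reference is the reference coefficient, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight of the read frequency, and weightW is the weight of the write frequency.
For example, in one possible implementation, the access state information of the data block group consists of a read frequency of 70 Hz and a write frequency of 45 Hz, with weights 0.2 and 0.8 respectively; the reference coefficient is 15 and the heartbeat time is 2 ms. The slave device then calculates the second timeout coefficient as 70*0.2+45*0.8+15=65, so the election timeout is 65*2ms=130ms.
Alternatively, in another possible implementation, the heartbeat time is 2 ms, the reference coefficient is 15, and the access state information of the data block group consists of the read frequency (70 Hz), the write frequency (45 Hz), and the CPU usage (40%), with second weights 0.2, 0.6, and 0.2 respectively. The slave device then calculates the second timeout coefficient as 70*0.2+45*0.6+40*0.2+15=64, so the election timeout is 64*2ms=128ms.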
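The weighted election-timeout formula can be verified with the following Python sketch. The function name and the list-based signature are assumptions for illustration; the arithmetic follows the OverTime formula in the text, extended to more than two weighted items as in the CPU-usage example.

```python
from typing import Sequence


def weighted_election_timeout_ms(values: Sequence[float],
                                 weights: Sequence[float],
                                 reference: float,
                                 heartbeat_ms: float) -> float:
    """OverTime = (sum(weight_i * value_i) + Reference) * heartbeatTime."""
    if len(values) != len(weights):
        raise ValueError("each access state value needs exactly one weight")
    second_timeout_coefficient = sum(w * v for v, w in zip(values, weights)) + reference
    return second_timeout_coefficient * heartbeat_ms


# Read 70 Hz and write 45 Hz with weights 0.2 and 0.8: coefficient 65
timeout_a = weighted_election_timeout_ms([70, 45], [0.2, 0.8], 15, 2.0)

# Read 70 Hz, write 45 Hz, CPU usage entered as 40, weights 0.2/0.6/0.2: coefficient 64
timeout_b = weighted_election_timeout_ms([70, 45, 40], [0.2, 0.6, 0.2], 15, 2.0)
```

Up to floating-point rounding, the two calls give 130 ms and 128 ms, matching the worked examples. In the second example the CPU usage of 40% enters the sum as 40, as in the text's arithmetic.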
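3) The slave device sets the product of a preset reference coefficient and the heartbeat time as the election timeout.
That is, the slave device may simply multiply the reference coefficient by the heartbeat time sent by the master device and use the product as the election timeout.
As with step 202, this embodiment uses only the specific scenarios above to illustrate how the slave device calculates the election timeout from the heartbeat time and the access state information of the data block group; it does not limit the types of access state information or the specific calculation formula.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the election initiation module 164l.
Step 207: the master device sends a data synchronization instruction to the slave device according to the heartbeat time of the data block group.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the first instruction sending module 164c.
Step 208: the slave device receives the data synchronization instruction that the master device sends according to the heartbeat time of the data block group.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the instruction receiving module 164h.
Step 209: the slave device performs data synchronization according to the data synchronization instruction.
In this embodiment of the present invention, the data blocks of one data block group may be synchronized through log replication. In a Raft-based distributed storage system, when the master device receives a log (a transaction request) from a client, it first appends the log to its local log and then synchronizes the log to each slave device through the heartbeat. Each slave device receives and records the log and then sends an acknowledgement to the master device. Once the master device has received acknowledgements from more than half of the slave devices, it marks the log as committed, appends it to its local disk, and notifies the client; at the next heartbeat it notifies the slave devices to store the log on their own local disks.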
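The acknowledgement counting in the log replication flow above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not a Raft implementation: the class `ReplicatingMaster` and its methods are hypothetical, networking and disk writes are omitted, and the commit rule used is the one stated in the text (acknowledgements from more than half of the slave devices).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ReplicatingMaster:
    """Hypothetical master-side bookkeeping for log replication."""
    slave_count: int
    log: List[str] = field(default_factory=list)
    acks: Dict[int, Set[str]] = field(default_factory=dict)  # log index -> slave ids
    committed: Set[int] = field(default_factory=set)

    def append(self, entry: str) -> int:
        """Append a client log entry locally and return its index."""
        self.log.append(entry)
        index = len(self.log) - 1
        self.acks[index] = set()
        return index

    def on_ack(self, index: int, slave_id: str) -> bool:
        """Record an acknowledgement; return True once the entry is committed."""
        self.acks[index].add(slave_id)
        if len(self.acks[index]) * 2 > self.slave_count:
            self.committed.add(index)  # more than half of the slaves acknowledged
        return index in self.committed
```

With four slave devices, the entry is committed on the third distinct acknowledgement; duplicate acknowledgements from the same slave do not count twice because a set is used.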
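This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the data synchronization module 164i.
In addition, the method provided by this embodiment is not limited to Raft-based distributed storage systems; it may also be applied to other distributed storage systems that synchronize data on the basis of a heartbeat time.
In summary, in the heartbeat-based data synchronization method provided by the foregoing embodiment, the master device obtains the access state information of the data block group, determines the heartbeat time of the group according to that information, and sends a data synchronization instruction to the slave device according to the heartbeat time, instructing the slave device to perform data synchronization. This solves the problem in the existing Raft consensus algorithm that the storage devices holding a data block group must exchange signals frequently, which causes high system overhead and degrades read/write performance, and it reduces the system overhead of the distributed storage system and improves the read/write performance of the storage system.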
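FIG. 3 is a flowchart of a heartbeat-based data synchronization method according to an exemplary embodiment of the present invention. The method may be used in a storage system, such as the one shown in FIG. 1A, that includes a coordination device and stores at least one data block group. As shown in FIG. 3, the heartbeat-based data synchronization method may include the following steps.
Step 301: the coordination device collects statistics on the access state information of at least one data block group.
The access state information of the at least one data block group may be sent to the coordination device by the respective master device of each data block group.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the statistics module 164m.
Step 302: the coordination device determines the importance level of the data block group according to the access state information of the data block group.
The coordination device may determine the importance level of a data block group from the sum of the read frequency and write frequency in its access state information. For example, Table 4 lists the read frequency and write frequency of each data block group in a storage system, together with the importance ranking of each group.
Table 4
[Table 4 is provided as an image in the original publication: Figure PCTCN2016097244-appb-000001]
As shown in Table 4, the coordination device calculates the sum of the read and write frequencies of each of the four data block groups and determines the importance of the four groups from the size of those sums: the larger the sum of a group's read and write frequencies, the more important the group and the higher its importance ranking.
Alternatively, the coordination device may obtain a third weight, which contains the weights corresponding to the read frequency and the write frequency, and determine the importance level of each data block group according to the group's read frequency, write frequency, and the third weight.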
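Ranking data block groups by importance can be sketched in Python as follows. Table 4 is only available as an image, so the group names and frequencies below are invented placeholder values, and the function name `rank_groups` and its parameters are hypothetical. The default weights of 1.0 reproduce the plain read-plus-write sum; other weights illustrate the third-weight variant.

```python
from typing import Dict, List, Tuple


def rank_groups(access: Dict[str, Tuple[float, float]],
                weight_read: float = 1.0,
                weight_write: float = 1.0) -> List[str]:
    """Return group names ordered from most to least important.

    Importance is weight_read * read_hz + weight_write * write_hz; a larger
    value means a higher importance ranking.
    """
    def importance(name: str) -> float:
        read_hz, write_hz = access[name]
        return weight_read * read_hz + weight_write * write_hz

    return sorted(access, key=importance, reverse=True)


# Placeholder data (not taken from Table 4): group -> (read Hz, write Hz)
example = {"A": (70, 45), "B": (10, 5), "C": (200, 30), "D": (20, 90)}
by_sum = rank_groups(example)
by_weight = rank_groups(example, weight_read=0.2, weight_write=0.8)
```

With these placeholder values the plain sum ranks C first, while weighting writes more heavily moves D to the top, which shows how the third weight can change the ordering.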
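This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the level determination module 164p.
Step 303: the master device of a data block group sends the heartbeat time of the data block group to the coordination device.
The heartbeat time is the one that the master device determines by obtaining the access state information of the data block group and according to that information. The access state information may include at least two pieces of information, including the read frequency and write frequency of the data block group. For the step in which the master device determines the heartbeat time according to the access state information of the data block group, refer to the description of the embodiment corresponding to FIG. 2; it is not repeated here.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the second time sending module 164e.
Step 304: the coordination device receives the heartbeat time of the data block group sent by the master device.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the third time receiving module 164n.
Step 305: the coordination device corrects the heartbeat time of the data block group according to the importance level of the data block group.
In one possible implementation, the heartbeat time may be corrected as follows: first determine a correction coefficient for the data block group according to its importance level, then set the product of the correction coefficient and the heartbeat time as the corrected heartbeat time.
For example, if five data block groups, ordered from most to least important, have correction coefficients of 0.7, 0.8, 1.0, 1.2, and 1.5 and each has a heartbeat time of 2 ms, the corrected heartbeat times of the five data block groups are 1.4 ms, 1.6 ms, 2 ms, 2.4 ms, and 3.0 ms respectively.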
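The correction step can be reproduced with a few lines of Python. The coefficient list is the one from the example in the text; the function name `corrected_heartbeat_ms` and the convention that rank 1 is the most important group are assumptions for this sketch.

```python
# Correction coefficients from the example, ordered from most to least important.
CORRECTION_COEFFICIENTS = [0.7, 0.8, 1.0, 1.2, 1.5]


def corrected_heartbeat_ms(heartbeat_ms: float, importance_rank: int) -> float:
    """Corrected heartbeat time = correction coefficient * heartbeat time.

    importance_rank is 1-based; rank 1 is assumed to be the most important group.
    """
    if not 1 <= importance_rank <= len(CORRECTION_COEFFICIENTS):
        raise ValueError("importance rank outside the configured coefficient table")
    return CORRECTION_COEFFICIENTS[importance_rank - 1] * heartbeat_ms


corrected = [round(corrected_heartbeat_ms(2.0, rank), 1) for rank in range(1, 6)]
```

For a 2 ms heartbeat time this yields 1.4, 1.6, 2.0, 2.4, and 3.0 ms, so more important groups synchronize more often and less important groups less often.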
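This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the correction module 164q.
Step 306: the coordination device returns the corrected heartbeat time to the master device, and the master device sends a data synchronization instruction to the slave device according to the corrected heartbeat time.
This step may be implemented by the processor 12 in the network device 10 shown in FIG. 1B executing the time return module 164r.
In this embodiment of the present invention, after generating the heartbeat time, the master device of a data block group sends the heartbeat time to the coordination device for correction. Until it receives the corrected heartbeat time returned by the coordination device, the master device sends data synchronization instructions or heartbeat signals according to the heartbeat time it generated; after receiving the corrected heartbeat time, it instead sends them according to the corrected heartbeat time. Further, the master device also sends the corrected heartbeat time to the slave devices of the data block group, so that the slave devices reset the election timeout according to the corrected heartbeat time.
In summary, in the heartbeat-based data synchronization method provided by this embodiment of the present invention, the coordination device collects statistics on the access state information of the data block groups, determines the importance level of each data block group according to its access state information, receives the heartbeat time of a data block group sent by the master device of that group, corrects the heartbeat time according to the importance level of the group, and returns the corrected heartbeat time to the master device. The heartbeat time of each data block group is thus corrected according to that group's own operating parameters, and the heartbeat times of all data block groups in the storage system are optimized as a whole, which further improves the read/write performance of the storage system.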
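The following are apparatus embodiments of the present invention, which may be used to carry out the method embodiments of the present invention. For details not disclosed in the apparatus embodiments, refer to the method embodiments.
FIG. 4 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an embodiment of the present invention. The apparatus may be implemented, by software, hardware, or a combination of both, as part or all of a storage device of a distributed storage system. The distributed storage system may be the one shown in FIG. 1A. The storage device is the master device of a data block group; the data block group includes a plurality of data blocks, which are respectively stored in a plurality of storage devices of the distributed storage system. The plurality of storage devices include the master device, and the remaining storage devices are slave devices of the data block group. The apparatus may include an information acquisition unit 401, a heartbeat time determination unit 402, a first instruction sending unit 403, a first time sending unit 404, a second time sending unit 405, a first time receiving unit 406, and a second instruction sending unit 407.
The information acquisition unit 401 has the same or similar function as the information acquisition module 164a.
The heartbeat time determination unit 402 has the same or similar function as the heartbeat time determination module 164b.
The first instruction sending unit 403 has the same or similar function as the first instruction sending module 164c.
The first time sending unit 404 has the same or similar function as the first time sending module 164d.
The second time sending unit 405 has the same or similar function as the second time sending module 164e.
The first time receiving unit 406 has the same or similar function as the first time receiving module 164f.
The second instruction sending unit 407 has the same or similar function as the second instruction sending module 164g.
FIG. 5 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an embodiment of the present invention. The apparatus may be implemented, by software, hardware, or a combination of both, as part or all of a storage device of a distributed storage system. The distributed storage system may be the one shown in FIG. 1A. The storage device is a slave device of a data block group; the data block group includes a plurality of data blocks, which are respectively stored in a plurality of storage devices of the distributed storage system. The plurality of storage devices include the master device, and the remaining storage devices are slave devices of the data block group. The apparatus may include an instruction receiving unit 501, a data synchronization unit 502, a second time receiving unit 503, a setting unit 504, and an election initiation unit 505.
The instruction receiving unit 501 has the same or similar function as the instruction receiving module 164h.
The data synchronization unit 502 has the same or similar function as the data synchronization module 164i.
The second time receiving unit 503 has the same or similar function as the second time receiving module 164j.
The setting unit 504 has the same or similar function as the setting module 164k.
The election initiation unit 505 has the same or similar function as the election initiation module 164l.
FIG. 6 is a structural block diagram of a heartbeat-based data synchronization apparatus according to an embodiment of the present invention. The apparatus may be implemented, by software, hardware, or a combination of both, as part or all of a coordination device of a distributed storage system. The distributed storage system may be the one shown in FIG. 1A. The distributed storage system stores at least one data block group; a data block group includes a plurality of data blocks, which are respectively stored in a plurality of storage devices of the distributed storage system. The plurality of storage devices include the master device, and the remaining storage devices are slave devices of the data block group. The apparatus may include a statistics unit 601, a third time receiving unit 602, a level determination unit 603, a correction unit 604, and a time return unit 605.
The statistics unit 601 has the same or similar function as the statistics module 164m.
The third time receiving unit 602 has the same or similar function as the third time receiving module 164n.
The level determination unit 603 has the same or similar function as the level determination module 164p.
The correction unit 604 has the same or similar function as the correction module 164q.
The time return unit 605 has the same or similar function as the time return module 164r.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.
The sequence numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be completed by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.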

Claims (33)

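  1. A distributed storage system, wherein the distributed storage system stores at least one data block group and includes a plurality of storage devices; one of the plurality of storage devices is a master device storing the data block group, and the remaining devices are slave devices storing the data block group;
    the master device is configured to obtain access state information of the data block group, determine a heartbeat time of the data block group according to the access state information of the data block group, and send a data synchronization instruction to the slave device according to the heartbeat time of the data block group; and
    the slave device is configured to perform data synchronization according to the data synchronization instruction.
  2. The system according to claim 1, wherein the access state information of the data block group includes a read frequency and a write frequency of the data block group.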
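  3. A heartbeat-based data synchronization apparatus, applied to a distributed storage system, wherein the distributed storage system stores at least one data block group and includes a plurality of storage devices; one of the plurality of storage devices is a master device storing the data block group, and the remaining devices are slave devices storing the data block group; the master device includes the apparatus, and the apparatus includes:
    an information acquisition unit, configured to obtain access state information of the data block group;
    a heartbeat time determination unit, configured to determine a heartbeat time of the data block group according to the access state information of the data block group; and
    a first instruction sending unit, configured to send a data synchronization instruction to the slave device according to the heartbeat time of the data block group, wherein the data synchronization instruction instructs the slave device to perform data synchronization.
  4. The apparatus according to claim 3, wherein the access state information of the data block group includes a read frequency and a write frequency of the data block group.
  5. The apparatus according to claim 3 or 4, wherein the heartbeat time determination unit is specifically configured to score the access state information according to a preset first scoring rule to obtain a first reference score, and determine the heartbeat time of the data block group according to the first reference score.
  6. The apparatus according to claim 4, wherein the heartbeat time determination unit is specifically configured to score the read frequency and the write frequency separately according to the first scoring rule, use the sum of the scores of the read frequency and the write frequency as a first reference score, and determine the heartbeat time of the data block group according to the first reference score.
  7. The apparatus according to claim 4, wherein the heartbeat time determination unit is specifically configured to obtain a preset reference time and a first weight corresponding to the access state information, and calculate the heartbeat time of the data block group according to the access state information, the reference time, and the first weight.
  8. The apparatus according to claim 7, wherein the heartbeat time determination unit is specifically configured to calculate the heartbeat time by the following formula:
    heartbeatTime=Time/(weightR*R+weightW*W);weightR+weightW=1;
    wherein heartbeatTime is the heartbeat time, Time is the reference time, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.
  9. The apparatus according to any one of claims 3 to 8, wherein the apparatus further includes:
    a first time sending unit, configured to send the heartbeat time of the data block group to the slave device, so that the slave device sets an election timeout according to the heartbeat time.
  10. The apparatus according to any one of claims 3 to 9, wherein the distributed storage system further includes a coordination device connected to the plurality of storage devices, and the apparatus further includes:
    a second time sending unit, configured to send the heartbeat time of the data block group to the coordination device;
    a first time receiving unit, configured to receive the corrected heartbeat time returned by the coordination device; and
    a second instruction sending unit, configured to send a data synchronization instruction to the slave device according to the corrected heartbeat time.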
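  11. A heartbeat-based data synchronization apparatus, applied to a distributed storage system, wherein the distributed storage system stores at least one data block group and includes a plurality of storage devices; one of the plurality of storage devices is a master device storing the data block group, and the remaining devices are slave devices storing the data block group; the slave device includes the apparatus, and the apparatus includes:
    an instruction receiving unit, configured to receive a data synchronization instruction that the master device sends according to a heartbeat time of the data block group, wherein the master device obtains access state information of the data block group and determines the heartbeat time according to the access state information of the data block group; and
    a data synchronization unit, configured to perform data synchronization according to the data synchronization instruction.
  12. The apparatus according to claim 11, wherein the access state information of the data block group includes a read frequency and a write frequency of the data block group.
  13. The apparatus according to claim 12, wherein the apparatus further includes:
    a second time receiving unit, configured to receive the heartbeat time sent by the master device;
    a setting unit, configured to set an election timeout according to the heartbeat time; and
    an election initiation unit, configured to initiate a master device election toward each of the other storage devices among the plurality of storage devices when the slave device receives no signal from the master device within the election timeout.
  14. The apparatus according to claim 13, wherein the setting unit is specifically configured to score the access state information according to a preset second scoring rule to obtain a second reference score, determine a first timeout coefficient of the data block group according to the second reference score, and set the product of the first timeout coefficient and the heartbeat time as the election timeout.
  15. The apparatus according to claim 13, wherein the setting unit is specifically configured to obtain a preset reference coefficient and a second weight corresponding to the access state information, calculate a second timeout coefficient of the data block group according to the access state information, the reference coefficient, and the second weight, and set the product of the second timeout coefficient and the heartbeat time as the election timeout.
  16. The apparatus according to claim 15, wherein the setting unit is specifically configured to set the election timeout by the following formula:
    OverTime=(weightR*R+weightW*W+Reference)*heartbeatTime;
    weightR+weightW=1;
    wherein OverTime is the election timeout, heartbeatTime is the heartbeat time, Reference is the reference coefficient, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.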
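  17. A heartbeat-based data synchronization apparatus, applied to a distributed storage system, wherein the distributed storage system stores at least one data block group and includes a plurality of storage devices and a coordination device connected to the plurality of storage devices; one of the plurality of storage devices is a master device storing the data block group, and the remaining devices are slave devices storing the data block group; the coordination device includes the apparatus, and the apparatus includes:
    a statistics unit, configured to collect statistics on access state information of the data block group;
    a third time receiving unit, configured to receive a heartbeat time of the data block group sent by the master device, wherein the master device obtains the access state information of the data block group and determines the heartbeat time according to the access state information of the data block group;
    a level determination unit, configured to determine an importance level of the data block group according to the access state information of the data block group;
    a correction unit, configured to correct the heartbeat time according to the importance level of the data block group; and
    a time return unit, configured to return the corrected heartbeat time to the master device.
  18. The apparatus according to claim 17, wherein the access state information of the data block group includes a read frequency and a write frequency of the data block group.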
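  19. A heartbeat-based data synchronization method, used in a distributed storage system, wherein the distributed storage system stores at least one data block group and includes a plurality of storage devices; one of the plurality of storage devices is a master device storing the data block group, and the remaining devices are slave devices storing the data block group; the method includes:
    obtaining, by the master device, access state information of the data block group;
    determining, by the master device, a heartbeat time of the data block group according to the access state information of the data block group; and
    sending, by the master device, a data synchronization instruction to the slave device according to the heartbeat time of the data block group, wherein the data synchronization instruction instructs the slave device to perform data synchronization.
  20. The method according to claim 19, wherein the access state information of the data block group includes a read frequency and a write frequency of the data block group.
  21. The method according to claim 19 or 20, wherein the determining, by the master device, of the heartbeat time of the data block group according to the access state information of the data block group includes:
    scoring, by the master device, the access state information according to a preset first scoring rule to obtain a first reference score; and
    determining, by the master device, the heartbeat time of the data block group according to the first reference score.
  22. The method according to claim 20, wherein the determining, by the master device, of the heartbeat time of the data block group according to the access state information of the data block group includes:
    scoring, by the master device, the read frequency and the write frequency separately according to the first scoring rule;
    using, by the master device, the sum of the scores of the read frequency and the write frequency as a first reference score; and
    determining, by the master device, the heartbeat time of the data block group according to the first reference score.
  23. The method according to claim 20, wherein the determining, by the master device, of the heartbeat time of the data block group according to the access state information of the data block group includes:
    obtaining, by the master device, a preset reference time and a first weight corresponding to the access state information; and
    calculating, by the master device, the heartbeat time of the data block group according to the access state information, the reference time, and the first weight.
  24. The method according to claim 23, wherein the determining, by the master device, of the heartbeat time of the data block group according to the access state information of the data block group includes:
    calculating the heartbeat time by the following formula:
    heartbeatTime=Time/(weightR*R+weightW*W);weightR+weightW=1;
    wherein heartbeatTime is the heartbeat time, Time is the reference time, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.
  25. The method according to any one of claims 19 to 24, wherein the method further includes:
    sending, by the master device, the heartbeat time of the data block group to the slave device, so that the slave device sets an election timeout according to the heartbeat time.
  26. The method according to any one of claims 19 to 25, wherein the distributed storage system further includes a coordination device connected to the plurality of storage devices, and the method further includes:
    sending, by the master device, the heartbeat time of the data block group to the coordination device;
    receiving, by the master device, the corrected heartbeat time returned by the coordination device; and
    sending, by the master device, a data synchronization instruction to the slave device according to the corrected heartbeat time.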
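  27. A heartbeat-based data synchronization method, used in a distributed storage system, wherein the distributed storage system stores at least one data block group and includes a plurality of storage devices; one of the plurality of storage devices is a master device storing the data block group, and the remaining devices are slave devices storing the data block group; the method includes:
    receiving, by the slave device, a data synchronization instruction that the master device sends according to a heartbeat time of the data block group, wherein the master device obtains access state information of the data block group and determines the heartbeat time according to the access state information of the data block group; and
    performing, by the slave device, data synchronization according to the data synchronization instruction.
  28. The method according to claim 27, wherein the access state information of the data block group includes a read frequency and a write frequency of the data block group.
  29. The method according to claim 28, wherein the method further includes:
    receiving, by the slave device, the heartbeat time sent by the master device;
    setting, by the slave device, an election timeout according to the heartbeat time; and
    when the slave device receives no signal from the master device within the election timeout, initiating, by the slave device, a master device election toward each of the other storage devices among the plurality of storage devices.
  30. The method according to claim 29, wherein the setting, by the slave device, of the election timeout according to the heartbeat time includes:
    scoring, by the slave device, the access state information according to a preset second scoring rule to obtain a second reference score;
    determining, by the slave device, a first timeout coefficient of the data block group according to the second reference score; and
    setting, by the slave device, the product of the first timeout coefficient and the heartbeat time as the election timeout.
  31. The method according to claim 29, wherein the setting, by the slave device, of the election timeout according to the heartbeat time includes:
    obtaining, by the slave device, a preset reference coefficient and a second weight corresponding to the access state information;
    calculating, by the slave device, a second timeout coefficient of the data block group according to the access state information, the reference coefficient, and the second weight; and
    setting, by the slave device, the product of the second timeout coefficient and the heartbeat time as the election timeout.
  32. The method according to claim 31, wherein the setting, by the slave device, of the election timeout according to the heartbeat time includes setting the election timeout by the following formula:
    OverTime=(weightR*R+weightW*W+Reference)*heartbeatTime;
    weightR+weightW=1;
    wherein OverTime is the election timeout, heartbeatTime is the heartbeat time, Reference is the reference coefficient, R is the value of the read frequency, W is the value of the write frequency, weightR is the weight corresponding to the read frequency, and weightW is the weight corresponding to the write frequency.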
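  33. A heartbeat-based data synchronization method, used in a distributed storage system, wherein the distributed storage system stores at least one data block group and includes a plurality of storage devices and a coordination device connected to the plurality of storage devices; one of the plurality of storage devices is a master device storing the data block group, and the remaining devices are slave devices storing the data block group; the coordination device performs the method, and the method includes:
    collecting, by the coordination device, statistics on access state information of the data block group, and receiving a heartbeat time of the data block group sent by the master device, wherein the master device obtains the access state information of the data block group and determines the heartbeat time according to the access state information of the data block group;
    determining, by the coordination device, an importance level of the data block group according to the access state information of the data block group;
    correcting, by the coordination device, the heartbeat time according to the importance level of the data block group; and
    returning, by the coordination device, the corrected heartbeat time to the master device.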
PCT/CN2016/097244 2016-02-05 2016-08-29 基于心跳的数据同步装置、方法及分布式存储系统 Ceased WO2017133233A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP16854615.8A EP3220610B1 (en) 2016-02-05 2016-08-29 Heartbeat-based data synchronization device, method, and distributed storage system
US15/583,687 US10025529B2 (en) 2016-02-05 2017-05-01 Heartbeat-based data synchronization apparatus and method, and distributed storage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610082068.8A CN107046552B (zh) 2016-02-05 2016-02-05 基于心跳的数据同步装置、方法及分布式存储系统
CN201610082068.8 2016-02-05

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/583,687 Continuation US10025529B2 (en) 2016-02-05 2017-05-01 Heartbeat-based data synchronization apparatus and method, and distributed storage system

Publications (1)

Publication Number Publication Date
WO2017133233A1 true WO2017133233A1 (zh) 2017-08-10

Family

ID=59337398

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/097244 Ceased WO2017133233A1 (zh) 2016-02-05 2016-08-29 基于心跳的数据同步装置、方法及分布式存储系统

Country Status (4)

Country Link
US (1) US10025529B2 (zh)
EP (1) EP3220610B1 (zh)
CN (1) CN107046552B (zh)
WO (1) WO2017133233A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10785350B2 (en) * 2018-10-07 2020-09-22 Hewlett Packard Enterprise Development Lp Heartbeat in failover cluster
CN112653734A (zh) * 2020-12-11 2021-04-13 邦彦技术股份有限公司 服务器集群实时主从控制和数据同步系统及方法
CN113297236A (zh) * 2020-11-10 2021-08-24 阿里巴巴集团控股有限公司 分布式一致性系统中主节点的选举方法、装置及系统
CN113411237A (zh) * 2021-08-18 2021-09-17 成都丰硕智能数字科技有限公司 一种低延迟检测终端状态的方法、存储介质及系统

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
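CN107181637B (zh) * 2016-03-11 2021-01-29 华为技术有限公司 Heartbeat information sending method and apparatus, and heartbeat sending node
CN105916100B (zh) * 2016-04-01 2020-04-28 华为技术有限公司 Method, apparatus, and communication system for proxying heartbeat packets
CN106897338A (zh) * 2016-07-04 2017-06-27 阿里巴巴集团控股有限公司 Method and apparatus for processing data modification requests for a database
CN109525347B (zh) * 2017-09-19 2019-11-22 比亚迪股份有限公司 Time synchronization method and apparatus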
US10637865B2 (en) 2017-10-16 2020-04-28 Juniper Networks, Inc. Fast heartbeat liveness between packet processing engines using media access control security (MACSEC) communication
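CN110597909B (zh) * 2019-09-12 2023-03-14 广州南翼信息科技有限公司 Method for keeping the states of a client and multiple terminal devices consistent
JP7515693B2 (ja) * 2020-08-03 2024-07-12 ヒタチ ヴァンタラ エルエルシー Randomization of heartbeat communication among multiple partition groups
CN113609043B (zh) * 2021-06-20 2024-07-05 山东云海国创云计算装备产业创新中心有限公司 Data transmission method, apparatus, device, and readable medium for an I2C host
CN114344799B (zh) * 2021-12-13 2022-11-11 深圳市培林体育科技有限公司 Smart cloud skipping rope, rope-skipping exercise system, and control method therefor
CN115037745B (zh) * 2022-05-18 2023-09-26 阿里巴巴(中国)有限公司 Method and apparatus for election in a distributed system
CN115167777B (zh) * 2022-07-27 2025-07-04 苏州浪潮智能科技有限公司 Optimization method, apparatus, device, and readable storage medium for a distributed storage system
CN118228068A (zh) * 2022-12-21 2024-06-21 戴尔产品有限公司 Correlation-based update mechanism
CN116881984B (zh) * 2023-09-08 2024-02-23 云筑信息科技(成都)有限公司 Data monitoring method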

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074178A1 (en) * 2013-09-11 2015-03-12 Samsung Electronics Co., Ltd. Distributed processing method
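CN104765661A (zh) * 2014-12-30 2015-07-08 深圳市安云信息科技有限公司 Multi-node hot standby method for metadata service nodes in a cloud storage service
CN104994168A (zh) * 2015-07-14 2015-10-21 苏州科达科技股份有限公司 Distributed storage method and distributed storage system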

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6651242B1 (en) * 1999-12-14 2003-11-18 Novell, Inc. High performance computing system for distributed applications over a computer
US7185236B1 (en) * 2002-08-30 2007-02-27 Eternal Systems, Inc. Consistent group membership for semi-active and passive replication
JP4415610B2 (ja) * 2003-08-26 2010-02-17 株式会社日立製作所 System switching method, replica creation method, and disk device
US7647329B1 (en) * 2005-12-29 2010-01-12 Amazon Technologies, Inc. Keymap service architecture for a distributed storage system
US7827428B2 (en) * 2007-08-31 2010-11-02 International Business Machines Corporation System for providing a cluster-wide system clock in a multi-tiered full-graph interconnect architecture
US9804928B2 (en) * 2011-11-14 2017-10-31 Panzura, Inc. Restoring an archived file in a distributed filesystem
EP2587712A4 (en) * 2012-06-08 2014-07-23 Huawei Device Co Ltd Heartbeat synchronization method and device
US10264071B2 (en) * 2014-03-31 2019-04-16 Amazon Technologies, Inc. Session management in distributed storage systems
CN104572917A (zh) * 2014-12-29 2015-04-29 成都致云科技有限公司 Data locking method and apparatus, and distributed storage system
CN105260136B (zh) * 2015-09-24 2019-04-05 北京百度网讯科技有限公司 Data read/write method and distributed storage system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074178A1 (en) * 2013-09-11 2015-03-12 Samsung Electronics Co., Ltd. Distributed processing method
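CN104765661A (zh) * 2014-12-30 2015-07-08 深圳市安云信息科技有限公司 Multi-node hot standby method for metadata service nodes in a cloud storage service
CN104994168A (zh) * 2015-07-14 2015-10-21 苏州科达科技股份有限公司 Distributed storage method and distributed storage system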

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3220610A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10785350B2 (en) * 2018-10-07 2020-09-22 Hewlett Packard Enterprise Development Lp Heartbeat in failover cluster
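CN113297236A (zh) * 2020-11-10 2021-08-24 阿里巴巴集团控股有限公司 Method, apparatus, and system for electing a master node in a distributed consistency system
CN112653734A (zh) * 2020-12-11 2021-04-13 邦彦技术股份有限公司 System and method for real-time master-slave control and data synchronization of a server cluster
CN112653734B (zh) * 2020-12-11 2023-09-19 邦彦技术股份有限公司 System and method for real-time master-slave control and data synchronization of a server cluster
CN113411237A (zh) * 2021-08-18 2021-09-17 成都丰硕智能数字科技有限公司 Method, storage medium, and system for low-latency detection of terminal state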

Also Published As

Publication number Publication date
CN107046552B (zh) 2020-10-23
EP3220610A4 (en) 2017-10-18
CN107046552A (zh) 2017-08-15
EP3220610A1 (en) 2017-09-20
US10025529B2 (en) 2018-07-17
US20170235492A1 (en) 2017-08-17
EP3220610B1 (en) 2019-05-22

Similar Documents

Publication Publication Date Title
WO2017133233A1 (zh) Heartbeat-based data synchronization apparatus and method, and distributed storage system
US10207183B2 (en) Wireless gaming protocol
US11878251B2 (en) Method of synchronizing online game, and server device
CN109087082B (zh) Blockchain-based financial transaction execution method and apparatus, and electronic device
US10021182B2 (en) Method and apparatus for data synchronization
US11231497B2 (en) Positioning method and positioning apparatus
EP2991280A1 (en) Content sharing method and social synchronizing apparatus
WO2016044329A1 (en) Real-time, low memory estimation of unique client computers communicating with a server computer
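CN117176666A (zh) Network traffic control method and apparatus, switch, electronic device, and storage medium
CN105430028A (zh) Service invocation method, service provision method, and node
CN107577550B (zh) Method and apparatus for determining whether a response to an access request is abnormal
CN111565060A (zh) Beamforming method and antenna device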
US10445649B2 (en) Network application participation control system
CN114650545B (zh) Beam parameter determination method and apparatus, and network device
US20170117878A1 (en) Method and Apparatus for Determining Stability Factor of Adaptive Filter
US12566101B2 (en) Vibration evaluation method and apparatus, computer device, storage medium, and computer program product
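CN103309986A (zh) Web page access control method and system
WO2024216907A1 (zh) SINR estimation method, apparatus, system, and storage medium for a PUCCH channel
CN113419669B (zh) I/O request processing method and apparatus, electronic device, and computer-readable medium
CN105323293B (zh) Data transmission service switching system and method
CN115730655B (zh) Multi-institution joint image recognition model training method based on group weighting
CN106790634B (zh) Method and device for determining whether to initiate a backup request
CN112527560A (zh) Adaptive data backup method and apparatus based on Internet of Things terminals
CN202455381U (zh) Distributed instant messaging system
CN118170759A (zh) Method and system for optimized column-store aggregation in a database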

Legal Events

Date Code Title Description
REEP Request for entry into the european phase

Ref document number: 2016854615

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2017526901

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2017119652

Country of ref document: RU

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE