WO2017050165A1 - Data synchronization method and system - Google Patents

Data synchronization method and system

Info

Publication number
WO2017050165A1
WO2017050165A1 PCT/CN2016/098960 CN2016098960W WO2017050165A1 WO 2017050165 A1 WO2017050165 A1 WO 2017050165A1 CN 2016098960 W CN2016098960 W CN 2016098960W WO 2017050165 A1 WO2017050165 A1 WO 2017050165A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
data
synchronization
thread
failed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/098960
Other languages
English (en)
French (fr)
Inventor
刘益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to EP16848047.3A (EP3355189A4)
Priority to JP2018512556A (JP6832917B2)
Publication of WO2017050165A1
Priority to US15/936,313 (US20180218058A1)

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2082Data synchronisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/1658Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G06F11/1662Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit the resynchronized component or unit being a persistent storage device
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
    • G06F21/108Transfer of content, software, digital rights or licenses
    • G06F21/1085Content sharing, e.g. peer-to-peer [P2P]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
    • G06F21/108Transfer of content, software, digital rights or licenses
    • G06F21/1087Synchronisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80Database-specific techniques

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a data synchronization method and a data synchronization system.
  • When data is read and written between different types of databases/file systems, that is, when data is imported and exported, offline synchronization is sometimes necessary.
  • The offline synchronization period is long, and the synchronization process depends heavily on the stability of the source, the gateway, and the destination.
  • a task can be divided into multiple task fragments for processing. However, if the synchronization of one fragment fails, the entire task will fail, and the synchronization results of the remaining fragments cannot be retained.
  • the technical problem to be solved by the embodiments of the present application is to provide a data synchronization method to solve the synchronization failure problem that occurs in the above data synchronization.
  • the embodiment of the present application further provides a data synchronization system to ensure implementation and application of the foregoing method.
  • The present application discloses a data synchronization method, including: assigning one task to each data fragment of a data set to be processed; starting a task thread of the task to perform offline data synchronization of the corresponding data fragment between the source end and the destination end; after determining that the task corresponding to any data fragment has failed to synchronize, if it is determined that the failed task supports a failover operation, cleaning up the processing resources of the data fragment corresponding to the failed task; and reassigning a task to the data fragment corresponding to the failed task, and starting a task thread of the reassigned task to perform offline data synchronization of the data fragment between the source end and the destination end.
  • Determining that the failed task supports the failover operation includes: when it is determined that the read/write feature of the destination end meets the failover condition, determining that the failed task supports the failover operation.
  • The method further includes: when the read/write feature of the destination end is a temporary synchronization feature or an idempotent feature, determining that the read/write feature of the destination end meets the failover condition. The temporary synchronization feature means that synchronization data is written into a temporary area during synchronization and, after synchronization completes, an operation instruction transfers the synchronization data from the temporary area to the fixed storage area, at which point the data takes effect. The idempotent feature means that the data write operation supports idempotent operation.
  • Cleaning up the processing resources of the data fragment corresponding to the failed task includes: releasing the resources of the task thread corresponding to the failed task, and deleting the statistics of the data fragment corresponding to the failed task.
  • The task thread includes a read thread and a write thread. Releasing the resources of the task thread corresponding to the failed task includes: clearing the synchronization data stored in the data buffer corresponding to the read thread and the write thread; and canceling the occupation of the read thread and the write thread by the data fragment corresponding to the failed task.
  • The method further includes: stopping the task thread from performing offline data synchronization between the source end and the destination end.
  • The method further includes: feeding back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; and determining, according to the processing failure information, that the task corresponding to the abnormal situation has failed to synchronize.
  • The embodiment of the present application further discloses a data synchronization system, including: a task allocation module, configured to assign one task to each data fragment of the data set to be processed, and to reassign a task to the data fragment corresponding to a failed task; a data synchronization module, configured to start a task thread of the task to perform offline data synchronization of the corresponding data fragment between the source end and the destination end, and to start a task thread of the reassigned task to perform offline data synchronization of the data fragment between the source end and the destination end; and a failover module, configured to, after determining that the task corresponding to any data fragment has failed to synchronize and that the failed task supports the failover operation, clean up the processing resources of the data fragment corresponding to the failed task and trigger the task allocation module.
  • The failover module includes: a support transfer determination submodule, configured to determine that the failed task supports the failover operation when it is determined that the read/write feature of the destination end meets the failover condition.
  • The support transfer determination submodule is further configured to: when the read/write feature of the destination end is a temporary synchronization feature or an idempotent feature, determine that the read/write feature of the destination end meets the failover condition. The temporary synchronization feature means that synchronization data is written into a temporary area during synchronization and, after synchronization completes, an operation instruction transfers the synchronization data from the temporary area to the fixed storage area, at which point the data takes effect. The idempotent feature means that the data write operation supports idempotent operation.
  • The failover module includes: a resource cleaning submodule, configured to release the resources of the task thread corresponding to the failed task, and to delete the statistics of the data fragment corresponding to the failed task.
  • The resource cleaning submodule is configured to clear the synchronization data stored in the data buffer corresponding to the read thread and the write thread, and to cancel the occupation of the read thread and the write thread by the data fragment corresponding to the failed task.
  • The resource cleaning submodule is further configured to stop the task thread from performing offline data synchronization between the source end and the destination end.
  • The system further includes: a failure determining module, configured to feed back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; and to determine, according to the processing failure information, that the task corresponding to the abnormal situation has failed to synchronize.
  • The embodiments of the present application include the following advantages:
  • One task is assigned to each data fragment of the data set to be processed, and the task thread of each task is started to perform offline data synchronization of the corresponding data fragment between the source end and the destination end. If a task is found to have failed during synchronization and it is determined that the failed task supports the failover operation, a task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reassigned to that data fragment, and the task thread of the reassigned task performs offline data synchronization of the data fragment between the source end and the destination end. The data fragment of the failed task is thus resynchronized directly, and the entire data set to be processed does not need to be reprocessed, which saves resources and shortens synchronization time.
  • FIG. 1 is a flow chart showing the steps of an embodiment of a data synchronization method of the present application
  • FIG. 2 is a flow chart showing the steps of another embodiment of the data synchronization method of the present application.
  • FIG. 3 is a structural block diagram of an embodiment of a data synchronization system of the present application.
  • FIG. 4 is a structural block diagram of another embodiment of a data synchronization system of the present application.
  • One of the core concepts of the embodiments of the present application is to provide a data synchronization method and system to solve the synchronization failure problem that occurs in the above data synchronization.
  • One task is assigned to each data fragment of the data set to be processed, and the task thread of each task is started to perform offline data synchronization of the corresponding data fragment between the source end and the destination end. If a task is found to have failed during synchronization and it is determined that the failed task supports the failover operation, a task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reassigned to that data fragment, and the task thread of the reassigned task performs offline data synchronization of the data fragment between the source end and the destination end. When synchronization of a data fragment fails, only the failed data fragment is resynchronized and the entire data set to be processed is not reprocessed, which saves resources and shortens synchronization time.
  • Referring to FIG. 1, a flow chart of the steps of an embodiment of the data synchronization method of the present application is shown, which may specifically include the following steps:
  • Step 102: Assign one task to each data fragment of the data set to be processed.
  • Step 104: Start a task thread of the task to perform offline data synchronization of the corresponding data fragment between the source end and the destination end.
  • The data set to be synchronized is used as the data set to be processed, and the source end and destination end of the offline data synchronization can be set: the database/file system in which the data set to be processed is located is used as the source end, and the database/file system to which the data set is to be synchronized is used as the destination end.
  • the data set to be processed can be regarded as a set of service data, which includes a large amount of service data. Therefore, before performing offline synchronization on the data set to be processed, the data set to be processed may be first divided into several data fragments.
  • The main thread that performs data synchronization establishes multiple tasks and assigns one data fragment to each task; that is, each data fragment corresponds to one task, and each task has a corresponding task thread. The corresponding task thread is started for each data fragment of the data set to be processed, and multiple task threads simultaneously perform offline data synchronization between the source end and the destination end, that is, data read and write operations are performed between the source end and the destination end.
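The fragment-per-task arrangement described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `split_into_fragments`, `sync_fragment`, and `run_sync` are hypothetical, the source is an in-memory list, and the destination is a plain dictionary.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_fragments(records, fragment_size):
    """Divide the data set to be processed into data fragments."""
    return [records[i:i + fragment_size]
            for i in range(0, len(records), fragment_size)]


def sync_fragment(fragment, destination):
    """Task body: read each record of the fragment and write it to the destination."""
    for key, value in fragment:
        destination[key] = value
    return len(fragment)


def run_sync(records, destination, fragment_size=2):
    fragments = split_into_fragments(records, fragment_size)
    # One task, and therefore one task thread, per data fragment.
    with ThreadPoolExecutor(max_workers=len(fragments) or 1) as pool:
        return list(pool.map(lambda f: sync_fragment(f, destination), fragments))


source = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
destination = {}
counts = run_sync(source, destination)
```

Each entry of `counts` is the number of records synchronized by one task, in fragment order.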
  • Step 106: After determining that the task corresponding to any data fragment has failed to synchronize, if it is determined that the failed task supports the failover operation, clean up the processing resources of the data fragment corresponding to the failed task.
  • Step 108: Reassign a task to the data fragment corresponding to the failed task, and start a task thread of the reassigned task to perform offline data synchronization of the data fragment between the source end and the destination end.
  • During synchronization, a task may fail to synchronize its data fragment for reasons such as network instability or a timeout when the destination end writes data. Therefore, after determining that the task corresponding to any data fragment has failed to synchronize, it is determined whether the failed task supports a failover operation.
  • Failover refers to the failure transfer of databases, application services, hardware devices, and the like in the computer field. It is a backup operation mode: when a task fails for some reason, it can be transferred to another component (such as a node, process, or thread) and reprocessed there. In this embodiment, whether the failed data fragment supports the failover operation may be determined according to the attributes of the destination end.
  • If the failed task supports the failover operation, the data fragment of the failed task may be transferred to another task thread for reprocessing. Before the data fragment is transferred, to prevent it from being processed by two task threads at the same time, the processing resources of the data fragment corresponding to the failed task may be cleaned up first, for example by releasing the data fragment's occupation of the task thread.
  • A task thread may then be reallocated for the failed data fragment, and the reallocated task thread performs offline data synchronization of the data fragment between the source end and the destination end, that is, offline data read and write operations are performed between the source end and the destination end.
  • In this embodiment, one task is assigned to each data fragment of the data set to be processed, and the task thread of each task is started to perform offline data synchronization of the corresponding data fragment between the source end and the destination end. If a task is found to have failed during synchronization and it is determined that the failed task supports the failover operation, a task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reassigned to that data fragment, and the task thread of the reassigned task performs offline data synchronization of the data fragment between the source end and the destination end. The data fragment of the failed task is thus resynchronized directly, and the entire data set to be processed does not need to be reprocessed, which saves resources and shortens synchronization time.
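Steps 102 to 108 can be sketched as a retry loop that resynchronizes only the failed data fragment. This is a simplified sketch under stated assumptions: fragments are processed sequentially rather than by task threads, the callables `sync_fragment`, `supports_failover`, and `cleanup` are hypothetical, and both the attempt limit and the choice to re-raise the error when failover is unsupported are illustrative, not taken from the source.

```python
def sync_with_failover(fragments, sync_fragment, supports_failover, cleanup,
                       max_attempts=3):
    """Synchronize every fragment; on failure, retry only the failed fragment."""
    results = {}
    for fragment_id, fragment in enumerate(fragments):
        for attempt in range(1, max_attempts + 1):
            try:
                results[fragment_id] = sync_fragment(fragment)
                break
            except Exception:
                if not supports_failover() or attempt == max_attempts:
                    raise  # no task-level failover possible: the job fails
                # Clean up the failed task's resources before reassigning.
                cleanup(fragment_id)
    return results


attempts = {}
cleaned = []


def flaky_sync(fragment):
    """Simulated task that fails once for the fragment [3, 4]."""
    key = tuple(fragment)
    attempts[key] = attempts.get(key, 0) + 1
    if fragment == [3, 4] and attempts[key] == 1:
        raise ConnectionError("simulated network interruption")
    return sum(fragment)


results = sync_with_failover([[1, 2], [3, 4], [5]], flaky_sync,
                             lambda: True, cleaned.append)
```

Only the fragment `[3, 4]` is attempted twice; the other fragments keep their first result.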
  • this embodiment discusses the offline data synchronization operation based on failover in detail.
  • The offline data synchronization in the embodiments of the present application can be applied to the offline synchronization of DataX, a tool for exchanging data at high speed between heterogeneous databases/file systems, which implements data exchange between arbitrary data processing systems (such as RDBMS/HDFS/local filesystem).
  • DataX has the following features: high-speed data exchange between heterogeneous databases/file systems; a Framework + plugin architecture, in which the Framework handles most of the technical issues of high-speed data exchange, such as buffering, flow control, concurrency, and context loading, and provides a simple interface for interacting with plugins, so that a plugin only needs to implement access to the data processing system; a stand-alone running mode; a data transfer process completed within a single process, entirely in memory, with no disk access and no IPC; and an open Framework with which developers can develop a new plugin in a very short time to quickly support a new database/file system. Offline data synchronization is therefore discussed in detail below, taking DataX offline synchronization as an example.
  • the method may include the following steps:
  • Step 202: Acquire the data set to be processed, and split it to obtain data fragments.
  • Step 204: Allocate a task for each data fragment.
  • Step 206: The main thread starts each task group (taskGroup), and each taskGroup starts its own tasks.
  • Step 208: The task thread of each task performs offline data synchronization between the source end and the destination end.
  • To perform offline data synchronization, the data set to be processed is first determined and is then divided into several data fragments.
  • The main process that performs offline data synchronization establishes multiple task groups, and multiple tasks are established under each task group. Offline data synchronization can therefore be performed by using task groups (taskGroup): a task is assigned to each data fragment, and the task thread of that task performs the synchronization. After the data fragments are allocated, the main thread starts each taskGroup, each taskGroup starts its own tasks, and the task thread of each task performs offline data synchronization between the source end and the destination end.
  • The task thread includes a read thread and a write thread: the read thread is used to read data, and the write thread is used to write data. The main thread also allocates a data buffer for each task to temporarily store the data being read and written. During offline data synchronization, the read thread and the write thread respectively read and write data between the source end and the destination end, and the data is temporarily stored in the data buffer, thereby implementing offline data synchronization.
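The read thread, write thread, and per-task data buffer can be illustrated with a bounded queue. This is a minimal sketch, assuming an in-memory source and destination; `run_task`, the `_DONE` sentinel, and the buffer size are hypothetical choices and do not reflect the DataX internals.

```python
import queue
import threading

_DONE = object()  # sentinel telling the write thread that reading has finished


def read_thread(source, buffer):
    """Read records from the source end into the task's data buffer."""
    for record in source:
        buffer.put(record)
    buffer.put(_DONE)


def write_thread(buffer, destination):
    """Take records from the data buffer and write them to the destination end."""
    while True:
        record = buffer.get()
        if record is _DONE:
            break
        destination.append(record)


def run_task(source, buffer_size=4):
    buffer = queue.Queue(maxsize=buffer_size)  # data buffer allocated per task
    destination = []
    reader = threading.Thread(target=read_thread, args=(source, buffer))
    writer = threading.Thread(target=write_thread, args=(buffer, destination))
    reader.start()
    writer.start()
    reader.join()
    writer.join()
    return destination


synced = run_task(list(range(10)))
```

The bounded queue makes the read thread wait when the write thread falls behind, which is one simple way to keep the buffer small.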
  • Step 210: The task feeds back status information to the taskGroup.
  • Step 212: Determine, according to the status information, whether the task has failed to synchronize.
  • During offline data synchronization, the task collects its own status information and feeds it back to the taskGroup.
  • The status information includes the processing result of the offline data synchronization of the data fragment; that is, the task informs the taskGroup whether offline synchronization succeeded. If processing succeeds, processing success information is fed back; if processing fails, processing failure information is fed back. A synchronization failure can therefore be determined from the processing failure information in the status information.
  • When any abnormality information exists, processing failure information is fed back, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; according to the processing failure information, it is determined that the task corresponding to the abnormal situation has failed to synchronize.
  • Source end abnormality information is abnormality information generated by a source end exception, for example, the data source jitters and becomes unavailable.
  • Destination end abnormality information is abnormality information generated by a destination end exception, for example, the destination end writes slowly, causing the source connection to time out and be closed.
  • Network abnormality information is abnormality information generated by a network anomaly, such as a network interruption.
  • Task thread abnormality information is abnormality information generated by a task thread exception, such as a thread error.
  • the task generates the corresponding processing failure information when the above exception occurs.
  • the task adds the processing failure information to the status information and feeds back to the taskGroup.
  • the taskGroup determines that the abnormal situation corresponds to the task synchronization failure based on the processing failure information.
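The status feedback from a task to its taskGroup can be sketched as below. The four abnormality categories come from the text; everything else is an assumption for illustration, including the `StatusInfo` structure, the `SourceUnavailable` exception, and the mapping of Python exception types to categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class AbnormalKind(Enum):
    SOURCE = "source end"
    DESTINATION = "destination end"
    NETWORK = "network"
    TASK_THREAD = "task thread"


class SourceUnavailable(Exception):
    """Hypothetical error raised when the data source cannot be read."""


@dataclass
class StatusInfo:
    task_id: int
    succeeded: bool
    failure: Optional[AbnormalKind] = None


def run_and_report(task_id: int, work: Callable[[], None]) -> StatusInfo:
    """Run the task's work and build the status information it feeds back."""
    try:
        work()
        return StatusInfo(task_id, True)
    except SourceUnavailable:
        return StatusInfo(task_id, False, AbnormalKind.SOURCE)
    except TimeoutError:  # assumed here to mean a slow destination write
        return StatusInfo(task_id, False, AbnormalKind.DESTINATION)
    except ConnectionError:
        return StatusInfo(task_id, False, AbnormalKind.NETWORK)
    except Exception:
        return StatusInfo(task_id, False, AbnormalKind.TASK_THREAD)


def failed_task_ids(statuses: List[StatusInfo]) -> List[int]:
    """Task group side: pick out the tasks whose synchronization failed."""
    return [s.task_id for s in statuses if not s.succeeded]


def ok():
    pass


def network_down():
    raise ConnectionError("simulated network interruption")


def slow_destination():
    raise TimeoutError("simulated write timeout")


statuses = [run_and_report(0, ok),
            run_and_report(1, network_down),
            run_and_report(2, slow_destination)]
failed = failed_task_ids(statuses)
```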
  • If so, that is, a synchronization failure is determined according to the status information, step 214 is performed; if not, that is, a synchronization success is determined according to the status information, step 220 is performed.
  • Step 214: Determine, according to the read/write feature of the destination end, whether the failed task supports failover.
  • If the failed task supports failover, the data fragment corresponding to the failed task can be resynchronized on its own; the entire data set to be processed does not need to be resynchronized, which saves resources and synchronization time.
  • Whether failover can be performed is determined according to the read/write feature of the destination end.
  • If the read/write feature of the destination end is a temporary synchronization feature or an idempotent feature, it can be determined that the read/write feature of the destination end meets the failover condition, that is, the failed task supports failover.
  • The temporary synchronization feature means that synchronization data is written to a temporary area during synchronization and, after synchronization completes, an operation instruction transfers the synchronization data from the temporary area to the fixed storage area, at which point the data takes effect.
  • That is, when data is synchronized between the source end and the destination end, the destination end first writes the synchronization data into a temporary area (that is, a temporary buffer) for caching. When synchronization of a data fragment completes, the destination end issues an operation instruction, namely a commit command, and the synchronization data in the temporary area is moved to the actual production area, that is, the fixed storage area, according to the commit command. After the transfer completes, the synchronization data takes effect.
  • If the task fails before the destination end has issued a commit command, failover can be performed: a new task is initialized and writes the synchronization data to a new temporary area. The synchronization data left in the temporary area of the failed task can be ignored, because the destination end automatically cleans up the temporary area corresponding to that task, and that data is never applied to production or takes effect. Therefore, if data is synchronized to a destination end with the temporary synchronization feature, the corresponding failed task can support failover.
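The temporary synchronization feature can be illustrated with a toy destination that stages writes per task and publishes them only on commit. `StagedDestination` and its methods are hypothetical names for this sketch; the explicit `discard` call stands in for the automatic cleanup that the text attributes to the destination end.

```python
class StagedDestination:
    """Toy destination with a temporary area per task and a fixed storage area."""

    def __init__(self):
        self.production = {}  # fixed storage area: committed, effective data
        self.staging = {}     # temporary areas, keyed by task id

    def write(self, task_id, key, value):
        """Write synchronization data into the task's temporary area."""
        self.staging.setdefault(task_id, {})[key] = value

    def commit(self, task_id):
        """Move the task's staged data to the fixed storage area."""
        self.production.update(self.staging.pop(task_id, {}))

    def discard(self, task_id):
        """Drop a failed task's temporary area; nothing reaches production."""
        self.staging.pop(task_id, None)


dest = StagedDestination()

# First attempt fails after one write and before any commit.
dest.write("task-1", "a", 1)
visible_after_failure = dict(dest.production)
dest.discard("task-1")

# The reassigned task writes to a new temporary area and commits.
dest.write("task-1-retry", "a", 1)
dest.write("task-1-retry", "b", 2)
dest.commit("task-1-retry")
```

Because the failed attempt never committed, production data is unaffected by it and the retry starts from a clean temporary area.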
  • The idempotent feature means that the data write operation supports idempotent operation. That is, writing synchronization data at the destination end is idempotent: the effect of executing the write any number of times is the same as the effect of executing it once. If the same data is written multiple times during synchronization, the later write overwrites the earlier data, so no duplicate data is produced. If the destination end has the idempotent feature, the corresponding task supports failover.
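The difference between an idempotent write and a non-idempotent one shows up as soon as a fragment is written twice, as happens when a failed task is retried. The sketch below is illustrative only: `replace_write` models a keyed overwrite and `append_write` models a plain append; neither name comes from the source.

```python
def replace_write(table, rows):
    """Idempotent write: rows are keyed, so a repeated write overwrites."""
    for key, value in rows:
        table[key] = value


def append_write(log, rows):
    """Non-idempotent write: a repeated write duplicates the rows."""
    log.extend(rows)


rows = [("id-1", "x"), ("id-2", "y")]
table, log = {}, []

# Simulate a failed attempt followed by a retry of the same data fragment.
for _ in range(2):
    replace_write(table, rows)
    append_write(log, rows)
```

After two passes, `table` holds each row once, while `log` holds every row twice, which is why only the first kind of destination can safely accept a retried fragment.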
  • When the offline synchronization described above is applied to DataX and a task fails, whether the task can fail over must be determined accurately, and the criterion differs from plugin to plugin.
  • When the destination end is an odpswriter or mysqlwriter system whose write mode is replace mode, the write operation is idempotent, so failover is supported. Likewise, task failover can be performed when the destination end is tairwriter in put mode.
  • For an odpsWriter destination end, if no commit instruction has been fed back during synchronization, the synchronized data it has written is still in the temporary area and has not taken effect, so failover can be performed.
  • Therefore, when determining from the write feature of the destination end whether a failed task supports failover, a supportFailover method can be implemented in the task's writer. It returns true or false, based on the write feature of the current destination end and the synchronization progress, to tell the taskGroup whether the task supports failover. If yes, the failed task is determined to support failover and step 216 is performed; if not, the process returns to step 204 and the data set to be processed is resynchronized.
  • Step 216: Release the resources of the task thread corresponding to the failed task, and delete the statistics of the data fragment corresponding to the failed task.
  • When the taskGroup finds that a task has failed and determines that the failed task supports failover, it interrupts the task thread of the failed task and clears the statistics.
  • Releasing the resources of the task thread means stopping that thread's external read and write operations. The statistics of the data fragment corresponding to the failed task are also deleted, for example the number of records synchronized and the amount of data synchronized.
  • In an optional embodiment of the present application, releasing the resources of the task thread corresponding to the failed task includes: clearing the synchronized data stored in the data buffer corresponding to the read thread and the write thread; and revoking the occupation of the read thread and the write thread by the data fragment corresponding to the failed task.
  • The task thread uses the read thread to read the synchronized data and the write thread to write it. When the resources of the task thread are released, the current read and write operations of the two threads can be stopped and the synchronized data stored in their data buffer cleared. The occupation of the read thread and the write thread by the data fragment of the failed task is then revoked, so that the data fragment is no longer processed by this task thread.
  • Step 218: Determine whether the processing resources of the data fragment corresponding to the failed task have been cleaned up.
  • In practice, when a task fails to synchronize, all of its processing resources must be released. This ensures that when failover is executed and the reassigned task performs synchronization, the previously failed task has already terminated, so two tasks never process the same data fragment at the same time.
  • Once the statistics are cleared, the reassigned task can collect statistics afresh when it performs synchronization. The resources of the failed task must therefore be released completely, so that the data finally written to the destination end is neither lost nor duplicated.
  • Resources can be cleaned up by interrupting the read and write threads of the failed task and by invalidating the in-memory channel that both threads operate on.
  • The taskGroup reassigns a task to the data fragment, and starts the reassigned task to perform data synchronization, only after confirming that the failed task has completely stopped.
  • After cleaning up its processing resources, the failed task reports to the taskGroup whether its read and write threads have ended and its memory resources have been released. The taskGroup determines from this feedback whether the processing resources have been cleaned up.
  • If yes, that is, the processing resources have been cleaned up, step 204 is performed; if not, step 216 is performed to continue cleaning up the resources.
  • After the processing resources have been cleaned up, failover can be performed for the failed task. The process therefore returns to step 204: a task is reallocated to the data fragment corresponding to the failed task, and the reallocated task synchronizes the data fragment whose synchronization failed, until the data is synchronized successfully and the task ends.
  • Step 220: The data of the task has been synchronized successfully, and the task ends.
  • When the status information indicates that synchronization succeeded, the data synchronization of the task is complete and the task ends.
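  • Thus, when the task for a data fragment fails, failover can be executed once the failed task is determined, based on the read/write feature of the destination end, to support it; that is, a task is reallocated to the data fragment and synchronization is run again. Failover is performed at the task level, so the entire data set to be processed does not need to be resynchronized, which improves synchronization efficiency.
  • Regarding the problem that plugins in offline synchronization cannot resume from a breakpoint: with a typical relational database, for example, the source-side data store does not support setting a position. If reading a data fragment fails partway through, the data cannot simply be pulled again starting from the point of failure. The task-level failover of this embodiment pulls the data again from the beginning of the source, which resolves the position problem.
  • Regarding failures not covered by a plugin's own retries: existing plugins retry at a fine granularity, generally on an exception caught for a single record or a single batch commit. Over a whole task life cycle, which involves many steps, some points are likely to be missed and never retried. Task-level failover resynchronizes the data fragment and so resolves this problem.
  • With task-level failover, a data fragment can be rescheduled onto a different machine and a new task allocated, so that data synchronization recovers automatically.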
  • On the basis of the above embodiments, this embodiment also discloses a data synchronization system.
  • Referring to FIG. 3, a structural block diagram of an embodiment of a data synchronization system of the present application is shown, which may specifically include the following modules:
  • A task allocation module 302, configured to allocate one task to each data fragment of the data set to be processed, and to reallocate a task to the data fragment corresponding to a failed task.
  • A data synchronization module 304, configured to start the task thread of the task and perform offline data synchronization of the corresponding data fragment between the source end and the destination end, and to start the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.
  • A failover module 306, configured to, after determining that the task for any data fragment has failed to synchronize, clean up the processing resources of the data fragment corresponding to the failed task if the failed task is determined to support the failover operation, and to trigger the task allocation module.
  • That is, the task allocation module 302 allocates one task to each data fragment of the data set to be processed, and the data synchronization module 304 then starts the task thread of the task and performs offline data synchronization of the corresponding data fragment between the source end and the destination end. If the task for any data fragment fails to synchronize and the failover module 306 determines that the failed task supports the failover operation, the failover module cleans up the processing resources of the data fragment corresponding to the failed task and triggers the task allocation module 302 to reallocate a task to that data fragment.
  • The data synchronization module 304 starts the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end. This continues until the data fragment is synchronized successfully and offline data synchronization of the data set to be processed is complete.
  • In summary, one task is allocated to each data fragment of the data set to be processed, the task thread of each task is started, and offline data synchronization of the corresponding data fragment is performed between the source end and the destination end. If, during synchronization, the task for any data fragment is determined to have failed and the failed task is determined to support the failover operation, task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reallocated to that data fragment, and the task thread of the reallocated task performs offline data synchronization of the data fragment between the source end and the destination end.
  • The data fragment of the failed task is thus resynchronized directly, without reprocessing the entire data set to be processed, which saves resources and shortens synchronization time.
  • Referring to FIG. 4, a structural block diagram of another embodiment of a data synchronization system of the present application is shown, which may specifically include the following modules:
  • The task allocation module 402 allocates one task to each data fragment of the data set to be processed; the data synchronization module 404 starts the task thread of the task and performs offline data synchronization of the corresponding data fragment between the source end and the destination end.
  • The failover module 406 is configured to, after determining that the task for any data fragment has failed to synchronize, clean up the processing resources of the data fragment corresponding to the failed task if the failed task is determined to support the failover operation, and to trigger the task allocation module 402 to reallocate a task to the data fragment corresponding to the failed task.
  • The data synchronization module 404 starts the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.
  • In an optional embodiment of the present application, the failover module 406 includes a failover support determination sub-module 40602 and a resource cleaning sub-module 40604.
  • The failover support determination sub-module 40602 is configured to determine that the failed task supports the failover operation when the read/write feature of the destination end is determined to meet a failover condition.
  • The failover support determination sub-module 40602 is further configured to determine that the read/write feature of the destination end meets the failover condition when that feature is a temporary synchronization feature or an idempotent feature. The temporary synchronization feature means that synchronized data is written to a temporary area during synchronization and, after synchronization completes, takes effect once an operation instruction has moved it from the temporary area to a fixed storage area; the idempotent feature means that the data write operation supports idempotent operation.
  • The resource cleaning sub-module 40604 is configured to release the resources of the task thread corresponding to the failed task, and to delete the statistics of the data fragment corresponding to the failed task.
  • The resource cleaning sub-module is configured to clear the synchronized data stored in the data buffer corresponding to the read thread and the write thread, and to revoke the occupation of the read thread and the write thread by the data fragment corresponding to the failed task.
  • The resource cleaning sub-module is further configured to make the task thread stop performing offline data synchronization between the source end and the destination end.
  • In another optional embodiment of the present application, the data synchronization system further includes a failure determination module 408, configured to feed back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; and to determine, according to the processing failure information, that the task corresponding to the abnormality has failed to synchronize.
  • That is, the task allocation module 402 allocates one task to each data fragment of the data set to be processed; the data synchronization module 404 starts the task thread of the task and performs offline data synchronization of the corresponding data fragment between the source end and the destination end.
  • The failure determination module 408 feeds back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; according to the processing failure information, it determines that the task corresponding to the abnormality has failed to synchronize.
  • The failover module 406, after determining that the task for any data fragment has failed to synchronize, cleans up the processing resources of the data fragment corresponding to the failed task if the failed task is determined to support the failover operation, and triggers the task allocation module 402 to reallocate a task to the data fragment corresponding to the failed task.
  • The data synchronization module 404 starts the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.
  • Thus, when the task for a data fragment fails, failover can be executed once the failed task is determined, based on the read/write feature of the destination end, to support it; that is, a task is reallocated to the data fragment and synchronization is run again.
  • Failover is performed at the task level, so the entire data set to be processed does not need to be resynchronized, which improves synchronization efficiency.
  • Regarding the problem that plugins in offline synchronization cannot resume from a breakpoint: with a typical relational database, for example, the source-side data store does not support setting a position. If reading a data fragment fails partway through, the data cannot simply be pulled again starting from the point of failure. The task-level failover of this embodiment pulls the data again from the beginning of the source, which resolves the position problem.
  • Regarding failures not covered by a plugin's own retries: existing plugins retry at a fine granularity, generally on an exception caught for a single record or a single batch commit. Over a whole task life cycle, which involves many steps, some points are likely to be missed and never retried. Task-level failover resynchronizes the data fragment and so resolves this problem.
  • With task-level failover, a data fragment can be rescheduled onto a different machine and a new task allocated, so that data synchronization recovers automatically.
  • For the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively brief; for related details, refer to the corresponding description of the method embodiments.
  • Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM).
  • Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device. The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing. The instructions executed on the computer or other programmable terminal device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Hardware Redundancy (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

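A data synchronization method and system are provided to solve the problem of synchronization failure during data synchronization. The method includes: allocating one task to each data fragment of a data set to be processed (102); starting the task thread of the task and performing offline data synchronization of the corresponding data fragment between a source end and a destination end (104); after determining that the task for any data fragment has failed to synchronize, if the failed task is determined to support a failover operation, cleaning up the processing resources of the data fragment corresponding to the failed task (106); and reallocating a task to the data fragment corresponding to the failed task, and starting the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end (108). The data fragment of the failed task is resynchronized directly, without reprocessing the entire data set to be processed, which saves resources and shortens synchronization time.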

Description

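Data synchronization method and system
Technical Field
The present application relates to the field of data processing technology, and in particular to a data synchronization method and a data synchronization system.
Background
With the development of network technology, interactions between different databases and file systems are increasingly common. Because there are many kinds of databases and file systems, data often has to be read and written between databases/file systems of different types.
When data is read and written, that is, imported and exported, between databases/file systems of different types, offline synchronization is sometimes required. Offline synchronization has a long cycle, and the process depends heavily on the stability of the source end, the execution path (gateway), and the destination end. During synchronization a job can be divided into multiple fragments for processing; however, if the synchronization of one fragment fails, the whole job fails and the results already synchronized for the other fragments cannot be kept.
When such a fragment synchronization failure occurs, the whole job usually has to be reprocessed, which wastes resources and lengthens operation time.
Therefore, a technical problem that those skilled in the art urgently need to solve is to provide a data synchronization method and system that resolve the synchronization failure problem described above.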
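Summary of the Invention
The technical problem to be solved by the embodiments of the present application is to provide a data synchronization method that resolves the synchronization failure problem described above.
Correspondingly, the embodiments of the present application also provide a data synchronization system to ensure the implementation and application of the above method.
To solve the above problem, the present application discloses a data synchronization method, including: allocating one task to each data fragment of a data set to be processed; starting the task thread of the task and performing offline data synchronization of the corresponding data fragment between a source end and a destination end; after determining that the task for any data fragment has failed to synchronize, if the failed task is determined to support a failover operation, cleaning up the processing resources of the data fragment corresponding to the failed task; and reallocating a task to the data fragment corresponding to the failed task, and starting the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.
Optionally, determining that the failed task supports the failover operation includes: determining that the failed task supports the failover operation when the read/write feature of the destination end is determined to meet a failover condition.
Optionally, the method further includes: determining that the read/write feature of the destination end meets the failover condition when that feature is a temporary synchronization feature or an idempotent feature; wherein the temporary synchronization feature is the feature that synchronized data is written to a temporary area during synchronization and, after synchronization completes, takes effect once an operation instruction has moved it from the temporary area to a fixed storage area; and the idempotent feature is that the data write operation supports idempotent operation.
Optionally, cleaning up the processing resources of the data fragment corresponding to the failed task includes: releasing the resources of the task thread corresponding to the failed task, and deleting the statistics of the data fragment corresponding to the failed task.
Optionally, the task thread includes a read thread and a write thread; releasing the resources of the task thread corresponding to the failed task includes: clearing the synchronized data stored in the data buffer corresponding to the read thread and the write thread; and revoking the occupation of the read thread and the write thread by the failed data fragment.
Optionally, before cleaning up the processing resources of the data fragment corresponding to the failed task, the method further includes: the task thread stopping offline data synchronization between the source end and the destination end.
Optionally, the method further includes: feeding back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; and determining, according to the processing failure information, that the task corresponding to the abnormality has failed to synchronize.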
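The embodiments of the present application also disclose a data synchronization system, including: a task allocation module, configured to allocate one task to each data fragment of a data set to be processed, and to reallocate a task to the data fragment corresponding to a failed task; a data synchronization module, configured to start the task thread of the task and perform offline data synchronization of the corresponding data fragment between a source end and a destination end, and to start the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end; and a failover module, configured to, after determining that the task for any data fragment has failed to synchronize, clean up the processing resources of the data fragment corresponding to the failed task if the failed task is determined to support a failover operation, and to trigger the task allocation module.
Optionally, the failover module includes: a failover support determination sub-module, configured to determine that the failed task supports the failover operation when the read/write feature of the destination end is determined to meet a failover condition.
Optionally, the failover support determination sub-module is further configured to determine that the read/write feature of the destination end meets the failover condition when that feature is a temporary synchronization feature or an idempotent feature; wherein the temporary synchronization feature is the feature that synchronized data is written to a temporary area during synchronization and, after synchronization completes, takes effect once an operation instruction has moved it from the temporary area to a fixed storage area; and the idempotent feature is that the data write operation supports idempotent operation.
Optionally, the failover module includes: a resource cleaning sub-module, configured to release the resources of the task thread corresponding to the failed task, and to delete the statistics of the data fragment corresponding to the failed task.
Optionally, the resource cleaning sub-module is configured to clear the synchronized data stored in the data buffer corresponding to the read thread and the write thread, and to revoke the occupation of the read thread and the write thread by the data fragment corresponding to the failed task.
Optionally, the resource cleaning sub-module is further configured to make the task thread stop performing offline data synchronization between the source end and the destination end.
Optionally, the system further includes: a failure determination module, configured to feed back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; and to determine, according to the processing failure information, that the task corresponding to the abnormality has failed to synchronize.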
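Compared with the prior art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, one task is allocated to each data fragment of the data set to be processed, the task thread of each task is started, and offline data synchronization of the corresponding data fragment is performed between the source end and the destination end. If, during synchronization, the task for any data fragment is determined to have failed and the failed task is determined to support the failover operation, task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reallocated to that data fragment, and the task thread of the reallocated task is started to perform offline data synchronization of the data fragment between the source end and the destination end. The data fragment of the failed task is thus resynchronized directly, without reprocessing the entire data set to be processed, which saves resources and shortens synchronization time.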
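Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of an embodiment of a data synchronization method of the present application;
FIG. 2 is a flowchart of the steps of another embodiment of a data synchronization method of the present application;
FIG. 3 is a structural block diagram of an embodiment of a data synchronization system of the present application;
FIG. 4 is a structural block diagram of another embodiment of a data synchronization system of the present application.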
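Detailed Description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the drawings and specific embodiments.
One of the core ideas of the embodiments of the present application is to provide a data synchronization method and system that resolve the synchronization failure problem described above. One task is allocated to each data fragment of the data set to be processed, the task thread of each task is started, and offline data synchronization of the corresponding data fragment is performed between the source end and the destination end. If, during synchronization, the task for any data fragment is determined to have failed and the failed task is determined to support the failover operation, task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reallocated to that data fragment, and the task thread of the reallocated task is started to perform offline data synchronization of the data fragment between the source end and the destination end. When synchronization of a data fragment fails, failover is therefore executed at the task level, that is, only the failed data fragment is synchronized again, without reprocessing the entire data set to be processed, which saves resources and shortens synchronization time.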
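Embodiment 1
Referring to FIG. 1, a flowchart of the steps of an embodiment of a data synchronization method of the present application is shown, which may specifically include the following steps:
Step 102: Allocate one task to each data fragment of the data set to be processed.
Step 104: Start the task thread of the task, and perform offline data synchronization of the corresponding data fragment between the source end and the destination end.
When offline data synchronization is performed between different databases/file systems, the data set to be synchronized is taken as the data set to be processed, and a source end and a destination end can be set for the synchronization: the database/file system where the data set resides is the source end, and the database/file system to which it is to be synchronized is the destination end. The data set to be processed can be regarded as a collection of business data containing a large amount of such data, so in this embodiment it can first be divided into several data fragments before offline synchronization. The main thread that performs data synchronization creates multiple tasks and allocates one data fragment to each task, so that each data fragment corresponds to one task and each task has its own task thread. A task thread is thus started for each data fragment of the data set to be processed, and multiple task threads concurrently perform offline data synchronization between the source end and the destination end, that is, read and write data between the two.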
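The splitting and allocation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `split_into_fragments` and `assign_tasks`, the fixed `fragment_size`, and the `task-N` identifiers are hypothetical and do not come from the application, which does not specify how the data set is split.

```python
def split_into_fragments(records, fragment_size):
    """Cut the data set to be processed into consecutive data fragments."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [records[i:i + fragment_size]
            for i in range(0, len(records), fragment_size)]


def assign_tasks(fragments):
    """Allocate exactly one task (here just a hypothetical id) per fragment."""
    return {f"task-{index}": fragment
            for index, fragment in enumerate(fragments)}


# Five records split into fragments of at most two records each.
tasks = assign_tasks(split_into_fragments(list(range(5)), 2))
```

Because each task owns exactly one fragment, a later failure can be attributed to, and repaired for, that fragment alone.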
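Step 106: After determining that the task for any data fragment has failed to synchronize, if the failed task is determined to support the failover operation, clean up the processing resources of the data fragment corresponding to the failed task.
Step 108: Reallocate a task to the data fragment corresponding to the failed task, and start the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.
While a task thread is performing offline data synchronization of a data fragment, the synchronization may fail for various reasons, such as an unstable network or a write timeout at the destination end. Therefore, after the task for any data fragment is determined to have failed, it is determined whether that failed task supports the failover operation. Failover here refers to the failover of databases, application services, hardware devices, and the like in the computer field. It is a backup mode of operation: when a task fails for some reason, it can be transferred to another component (such as a node, process, or thread) for reprocessing. This embodiment can determine, according to the attributes of the destination end, whether the failed data fragment supports the failover operation.
Once a failed task has been identified and determined to support the failover operation, its data fragment can be transferred to another task thread for reprocessing. Before the data fragment is transferred, to avoid one data fragment being processed by two task threads at the same time, the processing resources of the data fragment corresponding to the failed task can first be cleaned up, for example by releasing the data fragment's occupation of the task thread.
After the processing resources of the failed data fragment have been cleaned up, a task thread can be reallocated to the failed data fragment. The reallocated task thread then performs offline data synchronization of the data fragment between the source end and the destination end, that is, reads and writes offline data between the two.
In summary, one task is allocated to each data fragment of the data set to be processed, the task thread of each task is started, and offline data synchronization of the corresponding data fragment is performed between the source end and the destination end. If, during synchronization, the task for any data fragment is determined to have failed and the failed task is determined to support the failover operation, task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reallocated to that data fragment, and the task thread of the reallocated task is started to perform offline data synchronization of the data fragment between the source end and the destination end. The data fragment of the failed task is thus resynchronized directly, without reprocessing the entire data set to be processed, which saves resources and shortens synchronization time.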
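Embodiment 2
On the basis of the above embodiment, this embodiment discusses failover-based offline data synchronization in detail.
The offline data synchronization of the embodiments of the present application can be applied to offline synchronization in DataX. DataX is a tool for exchanging data at high speed between heterogeneous databases/file systems; it implements data exchange between arbitrary data processing systems (such as RDBMS/HDFS/local file system).
DataX has the following characteristics: it exchanges data at high speed between heterogeneous databases/file systems; it is built on a Framework + plugin architecture, in which the Framework handles most of the technical problems of high-speed data exchange, such as buffering, flow control, concurrency, and context loading, and provides a simple interface to the plugins, which only need to implement access to the data processing system; it runs in stand-alone mode; data transfer is completed within a single process, entirely in memory, with no disk reads or writes and no IPC; and it is an open framework in which a developer can write a new plugin in a very short time to support a new database/file system. Offline data synchronization is therefore discussed in detail below using offline synchronization with DataX as an example.
Referring to FIG. 2, a flowchart of the steps of another embodiment of a data synchronization method of the present application is shown, which may specifically include the following steps: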
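Step 202: Obtain the data set to be processed, and split it into data fragments.
Step 204: Allocate a task to each data fragment.
Step 206: The main thread starts each task group (taskGroup), and each taskGroup starts its own tasks.
Step 208: The task thread of each task performs offline data synchronization between the source end and the destination end.
Before offline data synchronization between the source end and the destination end, the data set to be processed is first determined and, to improve the efficiency of offline synchronization, split into several data fragments. In this embodiment, the main process that performs offline data synchronization creates multiple task groups, and multiple tasks are created under each task group. Offline data synchronization can therefore be performed by task group (taskGroup): one task is allocated to each data fragment, and the task thread of that task performs the synchronization. After the data fragments have been allocated, the main thread starts each taskGroup, each taskGroup starts its own tasks, and the task thread of each task performs offline data synchronization between the source end and the destination end.
The task thread includes a read thread and a write thread: the read thread reads data and the write thread writes data. The main thread also allocates a data buffer to each task for temporarily holding the data being read and written. During offline data synchronization, data is therefore read and written between the source end and the destination end by the read thread and the write thread respectively, and can be held temporarily in the data buffer, thereby achieving offline data synchronization.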
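The read thread, write thread, and data buffer of one task can be sketched as below. This is a minimal sketch, assuming an in-memory list stands in for both the source end and the destination end and a bounded `queue.Queue` stands in for the data buffer; the function name `sync_fragment` and the end-of-data sentinel are hypothetical.

```python
import queue
import threading


def sync_fragment(source_rows, destination):
    """Copy one data fragment using a read thread and a write thread."""
    buffer = queue.Queue(maxsize=2)   # data buffer allocated to this task
    done = object()                   # sentinel marking the end of the fragment

    def read():
        for row in source_rows:       # read from the source end
            buffer.put(row)
        buffer.put(done)

    def write():
        while True:
            row = buffer.get()
            if row is done:
                break
            destination.append(row)   # write to the destination end

    reader = threading.Thread(target=read)
    writer = threading.Thread(target=write)
    reader.start()
    writer.start()
    reader.join()
    writer.join()
    return len(destination)


destination_rows = []
written = sync_fragment(["r1", "r2", "r3"], destination_rows)
```

The bounded buffer lets the reader block when the writer falls behind, which is one simple form of the flow control mentioned for the framework.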
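Step 210: The task feeds back status information to the taskGroup.
Step 212: Determine from the status information whether the task has failed to synchronize.
While synchronizing a data fragment, a task collects its own status information and feeds it back to the taskGroup. The status information includes the result of the offline data synchronization of the data fragment; that is, the task tells the taskGroup whether offline synchronization succeeded, feeding back a processing success message on success and a processing failure message on failure. A failure can thus be determined from the processing failure information in the status information.
In the embodiments of the present application, processing failure information is fed back when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; according to the processing failure information, the task corresponding to the abnormality is determined to have failed to synchronize.
During offline data synchronization between the source end and the destination end, an error at any link in the synchronization chain can cause a task to fail. Processing failure information is therefore generated whenever abnormality information arises from any of the following:
Source end abnormality information, generated by an abnormality at the source end, for example when the data source is unstable and unavailable.
Destination end abnormality information, generated by an abnormality at the destination end, for example when slow writes at the destination end cause the source connection to time out and be closed.
Network abnormality information, generated by a network abnormality, for example a network interruption.
Task thread abnormality information, generated by an abnormality in the task thread, for example a thread error.
When any of the above abnormalities occurs, the task generates the corresponding processing failure information and adds it to the status information fed back to the taskGroup. Based on the processing failure information, the taskGroup determines that the task corresponding to the abnormality has failed to synchronize.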
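The status feedback can be illustrated with a small sketch. The four abnormality categories are taken from the text; everything else (the `Abnormality` enum, the dictionary layout of the status information, and the function names) is a hypothetical shape chosen for illustration.

```python
from enum import Enum


class Abnormality(Enum):
    """The four abnormality categories named in the description."""
    SOURCE = "source"
    DESTINATION = "destination"
    NETWORK = "network"
    TASK_THREAD = "task_thread"


def build_status(fragment_id, abnormalities):
    """Status information a task feeds back to its taskGroup."""
    reasons = sorted(a.value for a in abnormalities)
    # Any abnormality at all produces processing failure information.
    return {"fragment": fragment_id, "failed": bool(reasons), "reasons": reasons}


def sync_failed(status):
    """taskGroup side: decide from the status whether the task failed."""
    return status["failed"]


ok_status = build_status(0, [])
bad_status = build_status(1, [Abnormality.NETWORK, Abnormality.DESTINATION])
```

A single abnormality of any category is enough to mark the task as failed, matching the "any abnormality information" wording above.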
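If yes, that is, the status information indicates that synchronization failed, step 214 is performed; if not, that is, the status information indicates that synchronization succeeded, step 220 is performed.
Step 214: Determine, according to the read/write feature of the destination end, whether the failed task supports failover.
In this embodiment, for a failed task that supports failover, the data fragment corresponding to the failed task can be synchronized again; that is, reprocessing of the failed data fragment is supported, so the entire data set to be processed does not need to be resynchronized, which saves resources and synchronization time.
Whether failover can be performed after a task fails depends on the read/write feature of the destination end. When that feature is a temporary synchronization feature or an idempotent feature, the read/write feature of the destination end can be determined to meet the failover condition, that is, the failed task supports failover.
1) Temporary synchronization feature.
The temporary synchronization feature means that synchronized data is written to a temporary area during synchronization and, once synchronization completes, takes effect after an operation instruction has moved it from the temporary area to a fixed storage area.
That is, during data synchronization between the source end and the destination end, the destination end first writes the synchronized data into the temporary area (i.e., a temporary buffer) for caching. When a data fragment has been synchronized, the destination end issues an operation-execution (commit) instruction, and the data in the temporary area is moved according to that instruction to the actual production area, i.e., a fixed storage area. The synchronized data takes effect once the move completes.
For a destination end with this feature, if the task fails and the destination end has not sent a commit instruction, failover can be performed: a new task is initialized and synchronizes data into a new temporary area. The data in the failed task's temporary area can be ignored, because the destination end automatically cleans it up and it is never applied to production. Therefore, if data is synchronized to a destination end with the temporary synchronization feature, the corresponding failed task supports failover.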
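The temporary synchronization feature can be illustrated with a toy destination. This is a sketch under assumptions, not the behavior of any real writer plugin: the class `StagedDestination`, its methods, and the task ids are all hypothetical, and the cleanup that the text attributes to the destination end is modeled by an explicit `discard` call.

```python
class StagedDestination:
    """Toy destination: writes stay in a temporary area until commit."""

    def __init__(self):
        self.production = []   # fixed storage area: data here is in effect
        self.staging = {}      # task id -> rows written but not committed

    def write(self, task_id, row):
        self.staging.setdefault(task_id, []).append(row)

    def commit(self, task_id):
        # Move the staged rows to production; only now do they take effect.
        self.production.extend(self.staging.pop(task_id, []))

    def discard(self, task_id):
        # Clean up a failed task's temporary area; production is untouched.
        self.staging.pop(task_id, None)


dest = StagedDestination()
dest.write("task-1", "a")        # task-1 fails before any commit
dest.discard("task-1")
for row in ("a", "b"):           # the reallocated task starts over
    dest.write("task-2", row)
dest.commit("task-2")
```

Because nothing written by the failed task ever reached the production area, rerunning the fragment from the beginning cannot produce duplicates.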
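2) Idempotent feature.
The idempotent feature means that the data write operation supports idempotent operation. The write at the destination end is idempotent when executing it any number of times has the same effect as executing it once: if the same data is written several times during synchronization, the later write overwrites the earlier data and no duplicate data results. If the destination end has the idempotent feature, the corresponding task supports failover.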
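The difference between an idempotent and a non-idempotent write can be shown in a few lines. This is a minimal sketch, assuming a dictionary keyed by primary key stands in for a replace-mode table and a list stands in for an append-only destination; both function names are hypothetical.

```python
def replace_write(table, key, row):
    """Idempotent write: the row for `key` is overwritten, never duplicated."""
    table[key] = row


def append_write(log, row):
    """Non-idempotent write: every call adds another copy."""
    log.append(row)


table, log = {}, []
# Simulate a fragment that is synchronized three times because of failover.
for _ in range(3):
    replace_write(table, 1, {"id": 1, "name": "x"})
    append_write(log, {"id": 1, "name": "x"})
```

After three runs the keyed table still holds one row while the append-only list holds three, which is why only the former is safe to rerun.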
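When the offline synchronization described above is applied to DataX and a task fails, whether the task can fail over must be determined accurately, and the criterion differs from plugin to plugin. When the destination end is an odpswriter or mysqlwriter system whose write mode is replace mode, the write operation is idempotent, so failover is supported; likewise, task failover can be performed when the destination end is tairwriter in put mode. For an odpsWriter destination end, if no commit instruction has been fed back during synchronization, the synchronized data it has written is still in the temporary area and has not taken effect, so failover can be performed.
Therefore, when determining from the write feature of the destination end whether a failed task supports failover, a supportFailover method can be implemented in the task's writer. It returns true or false, based on the write feature of the current destination end and the synchronization progress, to tell the taskGroup whether the task supports failover. If yes, the failed task is determined to support failover and step 216 is performed; if not, the process returns to step 204 and the data set to be processed is resynchronized.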
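The decision made by such a method can be sketched as follows. The text names a supportFailover method but does not give its signature or logic, so this is an assumed shape only: the `WriterState` fields, the `"replace"` and `"insert"` mode strings, and the Python spelling `support_failover` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WriterState:
    write_mode: str        # hypothetical values, e.g. "replace" or "insert"
    uses_temp_area: bool   # destination has the temporary synchronization feature
    committed: bool        # synchronization progress: commit already issued?


def support_failover(state: WriterState) -> bool:
    """Return True if the failed task may be rerun for its data fragment."""
    if state.write_mode == "replace":
        return True        # idempotent write: rerunning cannot duplicate data
    # Temporary area: safe only while nothing has been moved to production.
    return state.uses_temp_area and not state.committed


idempotent = WriterState("replace", uses_temp_area=False, committed=False)
staged = WriterState("insert", uses_temp_area=True, committed=False)
after_commit = WriterState("insert", uses_temp_area=True, committed=True)
plain = WriterState("insert", uses_temp_area=False, committed=False)
```

Treating a committed, non-idempotent write as unsupported is a conservative choice made for this sketch: once data has taken effect, a rerun could duplicate it.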
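Step 216: Release the resources of the task thread corresponding to the failed task, and delete the statistics of the data fragment corresponding to the failed task.
When the taskGroup finds that a task has failed and determines that the failed task supports failover, it interrupts the task thread of the failed task and clears the statistics. Releasing the resources of the task thread means stopping that thread's external read and write operations. The statistics of the data fragment corresponding to the failed task are also deleted, for example the number of records synchronized and the amount of data synchronized.
In an optional embodiment of the present application, releasing the resources of the task thread corresponding to the failed task includes: clearing the synchronized data stored in the data buffer corresponding to the read thread and the write thread; and revoking the occupation of the read thread and the write thread by the data fragment corresponding to the failed task.
The task thread uses the read thread to read the synchronized data and the write thread to write it. When the resources of the task thread are released, the current read and write operations of the two threads can be stopped and the synchronized data stored in their data buffer cleared. The occupation of the read thread and the write thread by the data fragment of the failed task is then revoked, so that the data fragment is no longer processed by this task thread.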
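The release sequence above can be sketched with Python threads. This is an illustration under assumptions rather than the DataX implementation: the `FailedTask` class, the stop event used in place of a thread interrupt, and the timeouts are hypothetical. `release` stops both threads, clears the buffer and statistics, revokes the fragment, and reports whether both threads have really ended.

```python
import queue
import threading
import time


class FailedTask:
    def __init__(self, fragment_id):
        self.fragment_id = fragment_id            # fragment occupying the threads
        self.buffer = queue.Queue(maxsize=100)    # data buffer of this task
        self.stats = {"records": 0}               # per-fragment statistics
        self._stop = threading.Event()
        self.reader = threading.Thread(target=self._read, daemon=True)
        self.writer = threading.Thread(target=self._write, daemon=True)

    def _read(self):
        while not self._stop.is_set():
            try:
                self.buffer.put("row", timeout=0.01)
            except queue.Full:
                pass

    def _write(self):
        while not self._stop.is_set():
            try:
                self.buffer.get(timeout=0.01)
                self.stats["records"] += 1
            except queue.Empty:
                pass

    def start(self):
        self.reader.start()
        self.writer.start()

    def release(self):
        self._stop.set()                  # stop external reads and writes
        self.reader.join(timeout=1)
        self.writer.join(timeout=1)
        while True:                       # clear data left in the buffer
            try:
                self.buffer.get_nowait()
            except queue.Empty:
                break
        self.stats.clear()                # delete the fragment's statistics
        self.fragment_id = None           # revoke the fragment's occupation
        # Report to the taskGroup whether both threads have ended.
        return not (self.reader.is_alive() or self.writer.is_alive())


task = FailedTask(fragment_id=7)
task.start()
time.sleep(0.05)
released = task.release()
```

The boolean returned by `release` plays the role of the feedback the taskGroup waits for before reallocating the fragment.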
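Step 218: Determine whether the processing resources of the data fragment corresponding to the failed task have been cleaned up.
In practice, when a task fails to synchronize, all of its processing resources must be released. This ensures that when failover is executed and the reassigned task performs synchronization, the previously failed task has already terminated, so two tasks never process the same data fragment at the same time; and once the statistics are cleared, the reassigned task can collect statistics afresh. The resources of the failed task must therefore be released completely, so that the data finally written to the destination end is neither lost nor duplicated. Resources can be cleaned up by interrupting the read and write threads of the failed task and by invalidating the in-memory channel that both threads operate on. The taskGroup reassigns a task to the data fragment, and starts the reassigned task to perform data synchronization, only after confirming that the failed task has completely stopped.
After cleaning up its processing resources, the failed task therefore reports to the taskGroup whether its read and write threads have ended and its memory resources have been released, and the taskGroup determines from this feedback whether the processing resources have been cleaned up.
If yes, that is, the processing resources have been cleaned up, step 204 is performed; if not, step 216 is performed to continue cleaning up the resources.
After the processing resources have been cleaned up, failover can be performed for the failed task. The process therefore returns to step 204: a task is reallocated to the data fragment corresponding to the failed task, and the reallocated task synchronizes the data fragment whose synchronization failed, until the data is synchronized successfully and the task ends.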
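The overall loop of steps 204 to 220 can be sketched as a sequential driver. This is a simplified illustration: real task groups run tasks concurrently, and the `max_attempts` bound is an assumption added so the sketch always terminates (the text simply retries until synchronization succeeds). All names are hypothetical.

```python
def run_task_group(fragments, sync_fragment, supports_failover, max_attempts=3):
    """Synchronize each fragment; rerun only a failed fragment on failover."""
    attempts_used = {}
    for fragment in fragments:
        for attempt in range(1, max_attempts + 1):
            try:
                sync_fragment(fragment)           # step 208
                attempts_used[fragment] = attempt
                break                             # step 220: task ends
            except Exception:
                # Steps 212 and 214: the task failed; check failover support.
                if not supports_failover(fragment) or attempt == max_attempts:
                    raise
                # Steps 216 and 218 (resource cleanup) would run here before
                # the fragment is handed to a newly allocated task.
    return attempts_used


calls = []


def flaky_sync(fragment):
    """Hypothetical sync that fails once for fragment "b"."""
    calls.append(fragment)
    if fragment == "b" and calls.count("b") == 1:
        raise ConnectionError("network interrupted")


results = run_task_group(["a", "b", "c"], flaky_sync, lambda fragment: True)
```

Only fragment "b" is processed twice; fragments "a" and "c" are each synchronized once, which is the saving that task-level failover is meant to deliver.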
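Step 220: The data of the task has been synchronized successfully, and the task ends.
When the status information indicates that synchronization succeeded, the data synchronization of the task is complete and the task ends.
Thus, when the task for a data fragment fails, failover can be executed once the failed task is determined, based on the read/write feature of the destination end, to support it; that is, a task is reallocated to the data fragment and synchronization is run again. Failover is performed at the task level, so the entire data set to be processed does not need to be resynchronized, which improves synchronization efficiency.
Regarding the problem that plugins in offline synchronization cannot resume from a breakpoint: with a typical relational database, for example, the source-side data store does not support setting a position. If reading a data fragment fails partway through, the data cannot simply be pulled again starting from the point of failure. The task-level failover of this embodiment pulls the data again from the beginning of the source, which resolves the position problem.
Regarding failures not covered by a plugin's own retries: existing plugins retry at a fine granularity, generally on an exception caught for a single record or a single batch commit. Over a whole task life cycle, which involves many steps, some points are likely to be missed and never retried. Task-level failover resynchronizes the data fragment and so resolves this problem.
With task-level failover, a data fragment can be rescheduled onto a different machine and a new task allocated, so that data synchronization recovers automatically.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the embodiments of the present application are not limited by the order of actions described, because according to the embodiments of the present application some steps may be performed in another order or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.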
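Embodiment 3
On the basis of the above embodiments, this embodiment also discloses a data synchronization system.
Referring to FIG. 3, a structural block diagram of an embodiment of a data synchronization system of the present application is shown, which may specifically include the following modules:
A task allocation module 302, configured to allocate one task to each data fragment of the data set to be processed, and to reallocate a task to the data fragment corresponding to a failed task.
A data synchronization module 304, configured to start the task thread of the task and perform offline data synchronization of the corresponding data fragment between the source end and the destination end, and to start the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.
A failover module 306, configured to, after determining that the task for any data fragment has failed to synchronize, clean up the processing resources of the data fragment corresponding to the failed task if the failed task is determined to support the failover operation, and to trigger the task allocation module.
That is, the task allocation module 302 allocates one task to each data fragment of the data set to be processed, and the data synchronization module 304 then starts the task thread of the task and performs offline data synchronization of the corresponding data fragment between the source end and the destination end. If the task for any data fragment fails to synchronize and the failover module 306 determines that the failed task supports the failover operation, the failover module cleans up the processing resources of the data fragment corresponding to the failed task and triggers the task allocation module 302 to reallocate a task to that data fragment; the data synchronization module 304 starts the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end. This continues until the data fragment is synchronized successfully and offline data synchronization of the data set to be processed is complete.
In summary, one task is allocated to each data fragment of the data set to be processed, the task thread of each task is started, and offline data synchronization of the corresponding data fragment is performed between the source end and the destination end. If, during synchronization, the task for any data fragment is determined to have failed and the failed task is determined to support the failover operation, task-level failover can be executed: the processing resources of the data fragment corresponding to the failed task are cleaned up, a task is reallocated to that data fragment, and the task thread of the reallocated task is started to perform offline data synchronization of the data fragment between the source end and the destination end. The data fragment of the failed task is thus resynchronized directly, without reprocessing the entire data set to be processed, which saves resources and shortens synchronization time.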
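Referring to FIG. 4, a structural block diagram of another embodiment of a data synchronization system of the present application is shown, which may specifically include the following modules:
The task allocation module 402 allocates one task to each data fragment of the data set to be processed; the data synchronization module 404 starts the task thread of the task and performs offline data synchronization of the corresponding data fragment between the source end and the destination end; the failover module 406 is configured to, after determining that the task for any data fragment has failed to synchronize, clean up the processing resources of the data fragment corresponding to the failed task if the failed task is determined to support the failover operation, and to trigger the task allocation module 402 to reallocate a task to the data fragment corresponding to the failed task. The data synchronization module 404 starts the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.
In an optional embodiment of the present application, the failover module 406 includes a failover support determination sub-module 40602 and a resource cleaning sub-module 40604.
The failover support determination sub-module 40602 is configured to determine that the failed task supports the failover operation when the read/write feature of the destination end is determined to meet a failover condition. It is further configured to determine that the read/write feature of the destination end meets the failover condition when that feature is a temporary synchronization feature or an idempotent feature; wherein the temporary synchronization feature is the feature that synchronized data is written to a temporary area during synchronization and, after synchronization completes, takes effect once an operation instruction has moved it from the temporary area to a fixed storage area; and the idempotent feature is that the data write operation supports idempotent operation.
The resource cleaning sub-module 40604 is configured to release the resources of the task thread corresponding to the failed task, and to delete the statistics of the data fragment corresponding to the failed task. It is configured to clear the synchronized data stored in the data buffer corresponding to the read thread and the write thread, and to revoke the occupation of the read thread and the write thread by the data fragment corresponding to the failed task. It is further configured to make the task thread stop performing offline data synchronization between the source end and the destination end.
In another optional embodiment of the present application, the data synchronization system further includes a failure determination module 408, configured to feed back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information; and to determine, according to the processing failure information, that the task corresponding to the abnormality has failed to synchronize.
That is, the task allocation module 402 allocates one task to each data fragment of the data set to be processed; the data synchronization module 404 starts the task thread of the task and performs offline data synchronization of the corresponding data fragment between the source end and the destination end; the failure determination module 408 feeds back processing failure information when any abnormality information exists, where the abnormality information includes source end abnormality information, destination end abnormality information, network abnormality information, and task thread abnormality information, and determines according to the processing failure information that the task corresponding to the abnormality has failed to synchronize. The failover module 406, after determining that the task for any data fragment has failed to synchronize, cleans up the processing resources of the data fragment corresponding to the failed task if the failed task is determined to support the failover operation, and triggers the task allocation module 402 to reallocate a task to the data fragment corresponding to the failed task. The data synchronization module 404 starts the task thread of the reallocated task to perform offline data synchronization of the data fragment between the source end and the destination end.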
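Thus, when the task for a data fragment fails, failover can be executed once the failed task is determined, based on the read/write feature of the destination end, to support it; that is, a task is reallocated to the data fragment and synchronization is run again. Failover is performed at the task level, so the entire data set to be processed does not need to be resynchronized, which improves synchronization efficiency.
Regarding the problem that plugins in offline synchronization cannot resume from a breakpoint: with a typical relational database, for example, the source-side data store does not support setting a position. If reading a data fragment fails partway through, the data cannot simply be pulled again starting from the point of failure. The task-level failover of this embodiment pulls the data again from the beginning of the source, which resolves the position problem.
Regarding failures not covered by a plugin's own retries: existing plugins retry at a fine granularity, generally on an exception caught for a single record or a single batch commit. Over a whole task life cycle, which involves many steps, some points are likely to be missed and never retried. Task-level failover resynchronizes the data fragment and so resolves this problem.
With task-level failover, a data fragment can be rescheduled onto a different machine and a new task allocated, so that data synchronization recovers automatically.
For the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively brief; for related details, refer to the corresponding description of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.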
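Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include non-permanent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.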
本申请实施例是参照根据本申请实施例的方法、终端设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上,使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程终端设备上执行的指令提供用 于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。
以上对本申请所提供的一种数据同步方法和一种数据同步系统,进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (14)

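  1. A data synchronization method, characterized by comprising:
    assigning one task to each data shard of a data set to be processed;
    starting the task thread of the task, and performing offline data synchronization of the corresponding data shard between a source end and a destination end;
    after it is judged that synchronization of the task corresponding to any data shard has failed, if it is determined that the failed task supports a failover operation, cleaning up the processing resources of the data shard corresponding to the failed task; and
    reassigning a task to the data shard corresponding to the failed task, and starting the task thread of the reassigned task to perform offline data synchronization of the data shard between the source end and the destination end.
  2. The method according to claim 1, characterized in that determining that the failed task supports the failover operation comprises:
    determining that the failed task supports the failover operation when the read/write characteristic of the destination end is judged to satisfy a failover condition.
  3. The method according to claim 2, characterized by further comprising:
    judging that the read/write characteristic of the destination end satisfies the failover condition when the read/write characteristic of the destination end is a temporary synchronization characteristic or an idempotent characteristic;
    wherein the temporary synchronization characteristic comprises: writing synchronized data to a temporary area during synchronization, the synchronized data taking effect after synchronization completes and an operation instruction has moved it from the temporary area to a permanent storage area; and
    the idempotent characteristic comprises the data write operation supporting idempotent operation.
  4. The method according to claim 1, characterized in that cleaning up the processing resources of the data shard corresponding to the failed task comprises:
    releasing the resources of the task thread corresponding to the failed task, and deleting the statistical data of the data shard corresponding to the failed task.
  5. The method according to claim 4, characterized in that the task thread comprises a read thread and a write thread, and releasing the resources of the task thread corresponding to the failed task comprises:
    clearing the synchronized data stored in the data buffers corresponding to the read thread and the write thread; and
    revoking the occupation of the read thread and the write thread by the data shard corresponding to the failed task.
  6. The method according to claim 4, characterized in that, before cleaning up the processing resources of the data shard corresponding to the failed task, the method further comprises:
    stopping, by the task thread, the offline data synchronization between the source end and the destination end.
  7. The method according to claim 1, characterized by further comprising:
    feeding back processing failure information when any abnormal information exists, wherein the abnormal information comprises: source-end abnormal information, destination-end abnormal information, network abnormal information, and task thread abnormal information; and
    determining, according to the processing failure information, that synchronization of the task corresponding to the abnormal condition has failed.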
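  8. A data synchronization system, characterized by comprising:
    a task assignment module, configured to assign one task to each data shard of a data set to be processed, and to reassign a task to the data shard corresponding to a failed task;
    a data synchronization module, configured to start the task thread of the task and perform offline data synchronization of the corresponding data shard between a source end and a destination end, and to start the task thread of the reassigned task to perform offline data synchronization of the data shard between the source end and the destination end; and
    a failover module, configured to, after it is judged that synchronization of the task corresponding to any data shard has failed, clean up the processing resources of the data shard corresponding to the failed task if it is determined that the failed task supports a failover operation, and to trigger the task assignment module.
  9. The system according to claim 8, characterized in that the failover module comprises:
    a failover support determination submodule, configured to determine that the failed task supports the failover operation when the read/write characteristic of the destination end is judged to satisfy a failover condition.
  10. The system according to claim 9, characterized in that
    the failover support determination submodule is further configured to judge that the read/write characteristic of the destination end satisfies the failover condition when the read/write characteristic of the destination end is a temporary synchronization characteristic or an idempotent characteristic; wherein the temporary synchronization characteristic comprises: writing synchronized data to a temporary area during synchronization, the synchronized data taking effect after synchronization completes and an operation instruction has moved it from the temporary area to a permanent storage area; and the idempotent characteristic comprises the data write operation supporting idempotent operation.
  11. The system according to claim 8, characterized in that the failover module comprises:
    a resource cleanup submodule, configured to release the resources of the task thread corresponding to the failed task, and to delete the statistical data of the data shard corresponding to the failed task.
  12. The system according to claim 11, characterized in that
    the resource cleanup submodule is configured to clear the synchronized data stored in the data buffers corresponding to the read thread and the write thread, and to revoke the occupation of the read thread and the write thread by the data shard corresponding to the failed task.
  13. The system according to claim 11, characterized in that
    the resource cleanup submodule is further configured to have the task thread stop performing offline data synchronization between the source end and the destination end.
  14. The system according to claim 8, characterized by further comprising:
    a failure determination module, configured to feed back processing failure information when any abnormal information exists, wherein the abnormal information comprises: source-end abnormal information, destination-end abnormal information, network abnormal information, and task thread abnormal information; and to determine, according to the processing failure information, that synchronization of the task corresponding to the abnormal condition has failed.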
PCT/CN2016/098960 2015-09-24 2016-09-14 Data synchronization method and system Ceased WO2017050165A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP16848047.3A EP3355189A4 (en) 2015-09-24 2016-09-14 Data synchronization method and system
JP2018512556A JP6832917B2 (ja) 2015-09-24 2016-09-14 データ同期の方法及びシステム
US15/936,313 US20180218058A1 (en) 2015-09-24 2018-03-26 Data synchronization method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510617820.XA CN106557364A (zh) 2015-09-24 2015-09-24 Data synchronization method and system
CN201510617820.X 2015-09-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/936,313 Continuation US20180218058A1 (en) 2015-09-24 2018-03-26 Data synchronization method and system

Publications (1)

Publication Number Publication Date
WO2017050165A1 true WO2017050165A1 (zh) 2017-03-30

Family

ID=58385600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/098960 Ceased WO2017050165A1 (zh) 2015-09-24 2016-09-14 Data synchronization method and system

Country Status (5)

Country Link
US (1) US20180218058A1 (zh)
EP (1) EP3355189A4 (zh)
JP (1) JP6832917B2 (zh)
CN (1) CN106557364A (zh)
WO (1) WO2017050165A1 (zh)

Also Published As

Publication number Publication date
JP6832917B2 (ja) 2021-02-24
EP3355189A4 (en) 2018-10-10
CN106557364A (zh) 2017-04-05
JP2018530060A (ja) 2018-10-11
US20180218058A1 (en) 2018-08-02
EP3355189A1 (en) 2018-08-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16848047

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2018512556

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016848047

Country of ref document: EP