CN120909856A - High-reliability embedded processor redundancy method based on dynamic dual-mode redundancy

Info

Publication number: CN120909856A
Application number: CN202511049113.5A
Authority: CN
Inventors: 俞洋; 康玮; 王慎航; 屈辰; 李�浩; 杨智明; 于冰; 向刚; 彭宇; 林瑞仕; 禹春梅
Original assignee: Harbin Institute of Technology Shenzhen; Beijing Aerospace Automatic Control Research Institute
Current assignee: Harbin Institute of Technology Shenzhen; Beijing Aerospace Automatic Control Research Institute
Priority date: 2025-07-29
Filing date: 2025-07-29
Publication date: 2025-11-07

Abstract

The invention relates to the technical field of dual-machine hot standby, and in particular to a high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy. The dynamic dual-mode redundancy architecture breaks the limitation that the backup mode is fixed in a traditional dual-machine hot-standby architecture, and constructs a flexibly adjustable redundancy system by introducing a warm-standby processor unit. The method automatically identifies and replaces faulty nodes while the system is running, which greatly improves fault tolerance and reconfiguration flexibility in multi-node fault scenarios. In addition, unlike triple modular redundancy and its improved schemes, the architecture does not need to keep three nodes running in parallel continuously, so it is better suited to cost- and resource-sensitive applications.

Description

High-reliability embedded processor redundancy method based on dynamic dual-mode redundancy

Technical Field

The invention relates to the technical field of dual-computer hot standby, in particular to a high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy.

Background

In the fields of industrial automation, cloud computing, communication networks and the like, the high reliability and service continuity of the system are important. With the increase of the complexity of the system, problems such as software faults, data transmission delay, multi-node faults and the like frequently occur, and the prior art is difficult to meet the severe requirements in a complex scene. Under these application scenarios, it is important to build an embedded system with high reliability. Dual hot standby redundancy is a common technique that effectively improves system reliability and maintains reliable operation under system fault conditions.

In terms of fault detection and handling, dual-machine hot standby technology adopts a node-level detection mechanism. This coarse-grained detection leaves many blind spots, so an abnormal software process cannot be found and handled in time, which easily degrades overall system performance or even crashes the system. A single fault-handling strategy also lacks flexibility and is difficult to adapt to diverse fault scenarios. The active-standby switching mechanism is a key technology of dual-machine hot standby, but the traditional single-jumper heartbeat architecture is a potential single point of failure. Once the heartbeat link fails, the states of the host and the standby machine may be misjudged, causing the split-brain problem, in which the host and the standby machine each consider themselves the host, resulting in data conflicts and service interruption. In terms of backup mode, the dual-machine hot-standby architecture lacks elastic reconfiguration capability and has difficulty quickly restoring normal operation. Triple modular redundancy and its improved schemes improve fault tolerance, but suffer from high power consumption, large volume and serious resource waste, and are not suited to cost- and resource-sensitive applications.

Disclosure of Invention

In view of the defects of the prior art and the demand for improvement, the invention provides a high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy, which combines a daemon process that collects the running state of the processor with a dual-redundant heartbeat link technique, and aims to improve the reliability and stability of long-term uninterrupted operation of a dual-machine hot standby system.

The invention provides the following technical scheme:

A high-reliability embedded processor redundancy architecture based on dynamic dual-mode redundancy comprises a host, a hot standby machine and a warm standby machine, wherein a priority is assigned to each machine, the priorities from high to low being the host, the hot standby machine and the warm standby machine;

After the system is powered on, the host and the hot standby machine enter a hot standby running state, while the warm standby machine enters a low-power standby mode in which only the basic state monitoring module and the data receiving module are started;

A daemon process is started independently in the host and in the hot standby machine, a fault symptom database is initialized, and the heartbeat signal sending period is set to 50 ms;

The dual-redundant heartbeat links complete self-checking and establish connection, and the host and the hot standby machine synchronize the initial system configuration parameters and the initial reliability weight values.

The high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy is operated based on a high-reliability embedded processor redundancy architecture based on dynamic dual-mode redundancy, and is characterized by comprising the following steps of:

Step 1, the host and the hot standby machine execute the industrial control algorithm in real time and perform data synchronization;

Step 2, state detection, wherein the dual-redundant heartbeat links alternately transmit heartbeat signals according to the set heartbeat signal sending period;

Step 3, three-machine data synchronization, wherein the redundant system shares data periodically during operation: the host and the hot standby machine send data to each other, and the warm standby machine receives the data so that it can synchronize quickly with the running state of the other machines when woken up;

Step 4, data synchronization under fault, wherein when the host fails and the hot standby machine is switched to host, it immediately sends a broadcast frame carrying a master-node switching flag, and the warm standby machine enables the synchronization mode after receiving the broadcast frame and exchanges data with the new host;

The host and the hot standby machine perform majority voting on the three-machine data, and the voting result is synchronized to the three machines through a broadcast frame;

Step 5, handling of dual-machine data inconsistency, wherein the data of the host and the hot standby machine are synchronized and compared using the flow of step 3;

Step 6, host fault handling, wherein the host triggers a hardware watchdog reset due to CPU core damage, and after the hot standby machine has received no valid heartbeat signal from the host for 3 consecutive heartbeat periods, it judges that the host node has failed and triggers the dynamic dual-mode redundancy node-level fault tolerance flow;

Step 7, hot standby machine fault handling, wherein the hot standby machine triggers a hardware watchdog reset due to CPU core damage, and after the host has received no valid heartbeat signal from the hot standby machine for 3 consecutive heartbeat periods, it judges that the hot standby node has failed and triggers the dynamic dual-mode redundancy node-level fault tolerance flow;

Step 8, software process fault handling;

Step 9, redundant system output control, wherein the host and the hot standby machine in the dynamic dual-mode redundant system simultaneously receive data transmitted from outside the system, process the data, and simultaneously transmit data to the outside.

Preferably, the step 1 specifically includes:

The daemon process periodically sends heartbeat signals to the software processes in the host and the hot standby machine, collects indicators such as process response time and memory occupancy, and updates the fault symptom database.

Preferably, the step 2 specifically includes:

The dual-redundant heartbeat links alternately transmit heartbeat signals according to the set sending period, with an interval of 10 ms between the heartbeat signals on the two links. The heartbeat signal contains the local running state code, the reliability weight value and a check code. The receiving end sets the heartbeat timeout to three heartbeat periods; if the timeout is exceeded, the dynamic dual-mode redundancy node-level fault tolerance flow is triggered, the warm standby machine is woken up, and it rebuilds the dual-machine hot standby redundant system together with the surviving machine;

When a heartbeat signal is received within 60 ms, the heartbeat communication link between the two machines has not been damaged by external interference and the peer node is alive; the receiving end decodes and checks the heartbeat signal and updates the peer state table.
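The heartbeat contents listed above (running state code, reliability weight, check code) can be illustrated with a minimal Python sketch. The source does not specify a frame format, so the byte layout, the use of CRC-32 as the check code, the added timestamp field and the names `encode_heartbeat` and `decode_heartbeat` are all assumptions made for illustration.

```python
import struct
import zlib

# Assumed layout (not from the source): 1-byte state code,
# 2-byte weight in thousandths, 4-byte timestamp in ms, little-endian.
_BODY = struct.Struct("<BHI")
_CRC = struct.Struct("<I")


def encode_heartbeat(state_code: int, weight: float, timestamp_ms: int) -> bytes:
    """Pack a heartbeat body and append a CRC-32 check code."""
    body = _BODY.pack(state_code, round(weight * 1000), timestamp_ms)
    return body + _CRC.pack(zlib.crc32(body))


def decode_heartbeat(frame: bytes):
    """Return (state_code, weight, timestamp_ms), or None if the check fails."""
    if len(frame) != _BODY.size + _CRC.size:
        return None
    body = frame[:_BODY.size]
    (crc,) = _CRC.unpack(frame[_BODY.size:])
    if zlib.crc32(body) != crc:
        return None  # corrupted frame: do not update the peer state table
    state_code, weight_milli, timestamp_ms = _BODY.unpack(body)
    return state_code, weight_milli / 1000, timestamp_ms
```

A receiver would update its peer state table only when `decode_heartbeat` returns a tuple; a `None` result is treated like a missing heartbeat on that link.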

Preferably, the step 3 specifically includes:

The redundant system shares data periodically during operation: the host and the hot standby machine send data to each other, and the warm standby machine receives the data so that it can synchronize quickly with the running state of the other machines when woken up. The host sends a control command frame and a sensor data frame once every 10 ms, the frame ID contains the host identifier, and the hot standby machine monitors and parses the frames in real time;

After receiving the data, the hot standby machine compares its local operation result with the synchronized data and, if they are consistent, sends a confirmation frame containing its own state weight every 20 ms;

The warm standby machine is in a low-power listening mode: it only receives the state frames of the host and the hot standby machine, updates its local backup data once every 100 ms, and does not actively send data.

Preferably, the step 6 specifically includes:

The warm standby machine is activated automatically, completes system image loading and data synchronization within 50 ms, and forms a new dual-machine hot standby system with the hot standby machine: the hot standby machine is switched to host and the warm standby machine serves as the hot standby machine. The new host sends a failover message to the control bus and records the fault time and the last operating parameters of the old host, and the daemon process marks the fault type of the old host as a permanent hardware fault and writes it to the log.

Preferably, the step 7 specifically includes:

The warm standby machine is activated automatically, completes system image loading and data synchronization within 50 ms, and forms a new dual-machine hot standby system with the host, with the warm standby machine serving as the hot standby machine; the daemon process marks the fault type of the hot standby machine as a permanent hardware fault and writes it to the log.
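The role changes described for steps 6 and 7 can be sketched as a small function. This is an illustrative sketch only: the role names, the dictionary representation and the function `reconfigure` are assumptions, and the sketch covers just the two single-fault cases described in the text (host failure and hot standby failure).

```python
def reconfigure(roles: dict, failed: str) -> dict:
    """Return the new role assignment after `failed` is declared dead.

    `roles` maps a machine name to one of the (assumed) role strings
    "host", "hot_standby", "warm_standby" or "failed".
    """
    new_roles = dict(roles)
    old_role = roles[failed]
    new_roles[failed] = "failed"
    if old_role not in ("host", "hot_standby"):
        return new_roles  # a warm standby failure needs no promotion

    by_role = {role: name for name, role in roles.items()}
    # Host failure: the hot standby machine takes over as host.
    if old_role == "host" and "hot_standby" in by_role:
        new_roles[by_role["hot_standby"]] = "host"
    # In both cases the warm standby machine is woken up as the new hot standby.
    if "warm_standby" in by_role:
        new_roles[by_role["warm_standby"]] = "hot_standby"
    return new_roles
```

For example, with machines A (host), B (hot standby) and C (warm standby), a failure of A yields B as host and C as hot standby, while a failure of B leaves A as host and promotes C to hot standby.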

Preferably, the step 8 specifically includes:

When a process running on one machine in the system deadlocks, the node-level detection alarm mechanism is not triggered. After the daemon process has failed to receive a heartbeat response from the process 2 consecutive times, it calls the fault analysis module, which combines historical data to identify a process-level permanent fault and triggers the process restart mechanism. The reliability of the machine hosting the faulty process is reduced by 0.5, and the reset and initialization of the process are completed within 100 ms; if the process restarts successfully, the reliability of the machine is restored.
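The process-level detection in step 8 can be sketched as a per-process counter of missed heartbeat responses. This is a minimal sketch under assumptions: the class `ProcessWatch`, its fields and the idea of passing the penalty as a parameter are hypothetical, and the fault analysis module is reduced to the threshold of 2 consecutive misses given in the text.

```python
from dataclasses import dataclass

MISS_LIMIT = 2  # consecutive missed responses before a restart, per the text


@dataclass
class ProcessWatch:
    """Hypothetical daemon-side record for one monitored process."""
    reliability: float      # reliability weight of the machine
    penalty: float          # amount deducted while the process is faulty
    misses: int = 0
    penalised: bool = False

    def on_probe(self, responded: bool) -> bool:
        """Record one heartbeat probe; return True if a restart is required."""
        if responded:
            self.misses = 0
            return False
        self.misses += 1
        if self.misses >= MISS_LIMIT and not self.penalised:
            self.reliability -= self.penalty
            self.penalised = True
            return True
        return False

    def on_restart(self, succeeded: bool) -> None:
        """Restore the reliability weight after a successful restart."""
        if succeeded and self.penalised:
            self.reliability += self.penalty
            self.penalised = False
            self.misses = 0
```

A single missed response does not trigger anything; only the second consecutive miss lowers the weight and requests a restart, which keeps transient delays from causing unnecessary restarts.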

A computer readable storage medium having stored thereon a computer program for execution by a processor for implementing a high reliability embedded processor redundancy method based on dynamic dual mode redundancy.

A computer device comprising a memory storing a computer program and a processor implementing a high reliability embedded processor redundancy method based on dynamic dual mode redundancy when executing the computer program.

The invention has the following beneficial effects:

The dynamic dual-mode redundancy architecture provided by the invention breaks the limitation that the backup mode is fixed in the traditional dual-machine hot-standby architecture, and constructs a flexibly adjustable redundancy system by introducing a warm-standby processor unit. The architecture automatically identifies and replaces faulty nodes while the system is running, which greatly improves fault tolerance and reconfiguration flexibility in multi-node fault scenarios. In addition, unlike triple modular redundancy and its improved schemes, the architecture does not need to keep three nodes running in parallel continuously, so it is better suited to cost- and resource-sensitive applications.

The daemon process independent of the main and standby systems is designed to construct a set of fine-granularity fault monitoring system. The process runs in an independent safe execution environment, is free from the involvement of the software faults of the main and standby systems, and can periodically send heartbeat signals to each software process of the main and standby systems and carefully analyze response information. Through establishing a fault symptom database and applying preset rules, the framework improves the fault positioning accuracy from the traditional node level to the process level, can accurately detect internal faults which are difficult to identify by traditional systems, such as software process blocking, resource leakage and the like, effectively eliminates fault detection blind areas, and improves the response speed of the system to software faults.

The dual-redundant physical-layer jumper architecture designed by the invention avoids, at the physical level, the single-point-of-failure risk of the traditional single-jumper architecture. When one of the links fails, the system switches seamlessly to the other link and continues to transmit heartbeat signals, so that the host and the standby machine can each obtain accurate state information about the other, which resolves the misjudgment of host and standby states caused by the failure of a single link.

The invention assigns a reliability weight to each machine, and the weight is adjusted dynamically according to the real-time running state of the processor, thereby reflecting the health of the device. When the host and the hot standby machine synchronize data, they share reliability weight information, and output permission is decided according to the weights. When the data of the two machines are consistent, the host outputs by default if the weights are equal, and the machine with the higher weight is selected to output if the weights are unequal. This mechanism avoids the split-brain problem at the logic level and improves the service continuity of the system.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a dynamic dual-mode redundancy topology of the present invention;

FIG. 2 is a flow chart of the dynamic dual-mode redundant single machine transient fault tolerance of the present invention;

FIG. 3 is a flow chart of the dynamic dual-mode redundant single machine node level fault tolerance of the present invention;

FIG. 4 is a flow chart of the dynamic dual mode redundant output control of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and fully below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

The present invention will be described in detail with reference to specific examples.

Specific embodiment one:

According to the specific technical solution adopted by the invention to solve the above technical problems, and as shown in FIGS. 1 to 4, the invention relates to a high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy.

The invention provides a high-reliability embedded processor redundancy architecture based on dynamic dual-mode redundancy, which comprises a host, a hot standby machine and a warm standby machine, wherein a priority is assigned to each machine, the priorities from high to low being the host, the hot standby machine and the warm standby machine;

Specific embodiment two:

The second embodiment of the present application differs from the first embodiment only in that:

The invention provides a high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy, which is operated based on a high-reliability embedded processor redundancy architecture based on dynamic dual-mode redundancy, and comprises the following steps:

Step 8, software process fault processing;

Compared with the prior art, the invention has the technical effects and advantages that:

1. The dynamic dual-mode redundancy architecture provided by the invention breaks the limitation that the backup mode is fixed in the traditional dual-machine hot-standby architecture, and constructs a flexibly adjustable redundancy system by introducing a warm-standby processor unit. The architecture automatically identifies and replaces faulty nodes while the system is running, which greatly improves fault tolerance and reconfiguration flexibility in multi-node fault scenarios. In addition, unlike triple modular redundancy and its improved schemes, the architecture does not need to keep three nodes running in parallel continuously, so it is better suited to cost- and resource-sensitive applications.

2. The daemon process independent of the main and standby systems is designed to construct a set of fine-granularity fault monitoring system. The process runs in an independent safe execution environment, is free from the involvement of the software faults of the main and standby systems, and can periodically send heartbeat signals to each software process of the main and standby systems and carefully analyze response information. Through establishing a fault symptom database and applying preset rules, the framework improves the fault positioning accuracy from the traditional node level to the process level, can accurately detect internal faults which are difficult to identify by traditional systems, such as software process blocking, resource leakage and the like, effectively eliminates fault detection blind areas, and improves the response speed of the system to software faults.

3. The dual-redundant physical-layer jumper architecture designed by the invention avoids, at the physical level, the single-point-of-failure risk of the traditional single-jumper architecture. When one of the links fails, the system switches seamlessly to the other link and continues to transmit heartbeat signals, so that the host and the standby machine can each obtain accurate state information about the other, which resolves the misjudgment of host and standby states caused by the failure of a single link.

4. The invention assigns a reliability weight to each machine, and the weight is adjusted dynamically according to the real-time running state of the processor, thereby reflecting the health of the device. When the host and the hot standby machine synchronize data, they share reliability weight information, and output permission is decided according to the weights. When the data of the two machines are consistent, the host outputs by default if the weights are equal, and the machine with the higher weight is selected to output if the weights are unequal. This mechanism avoids the split-brain problem at the logic level and improves the service continuity of the system.

Specific embodiment three:

The difference between the third embodiment and the second embodiment of the present application is that:

the step 1 specifically comprises the following steps:

Specific embodiment four:

The fourth embodiment of the present application differs from the third embodiment only in that:

The step 2 specifically comprises the following steps:

Specific embodiment five:

the fifth embodiment of the present invention differs from the fourth embodiment only in that:

the step 3 specifically comprises the following steps:

Specific embodiment six:

the difference between the sixth embodiment and the fifth embodiment of the present invention is that:

The step 6 specifically comprises the following steps:

Specific embodiment seven:

the seventh embodiment of the present invention differs from the sixth embodiment only in that:

the step 7 specifically comprises the following steps:

Specific embodiment eight:

the eighth embodiment of the present invention differs from the seventh embodiment only in that:

the step 8 specifically comprises the following steps:

Specific embodiment nine:

The difference between the embodiment nine and the embodiment eight of the present invention is that:

The present invention provides a computer readable storage medium having stored thereon a computer program for execution by a processor for implementing a high reliability embedded processor redundancy method based on dynamic dual mode redundancy.

The method comprises the following steps:

Step 8, software process fault processing;

Specific embodiment ten:

the tenth embodiment of the present invention differs from the ninth embodiment only in that:

The invention provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes a high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy when executing the computer program.

The method comprises the following steps:

Step 8, software process fault processing;

Specific embodiment eleven:

the eleventh embodiment of the present invention differs from the tenth embodiment only in that:

In view of the defects of the prior art and the demand for improvement, the invention provides a dynamic dual-mode redundancy method, which combines a daemon process that collects the running state of the processor with a dual-redundant heartbeat link technique, and aims to improve the reliability and stability of long-term uninterrupted operation of a dual-machine hot standby system.

In order to achieve the above purpose, the present invention provides the following technical solutions:

The method comprises the following steps:

Step 8, software process fault processing;

In a first aspect, on the basis of the traditional dual-machine hot-standby architecture, a warm-standby processor unit is introduced to construct an elastic redundancy architecture of "dual-machine hot standby plus one warm standby". In this architecture, the two machines (the host and the hot standby machine) remain in the hot standby running state and perform data synchronization and state interaction in real time, while the warm standby processor is in a low-power standby mode and maintains only basic state monitoring and backup of the host operating data. When the two machines disagree during data processing or instruction output, the warm standby node joins the system and calls a preset consistency check algorithm to cross-compare the data of the three parties; the disagreement is resolved by a majority voting mechanism, ensuring the accuracy of the output. When either the host or the hot standby machine suffers an unrecoverable hardware fault or a core software crash, the warm standby node is activated automatically, quickly loads the key data, and forms a new dual-machine hot standby system with the surviving machine. The whole switchover completes within milliseconds, which effectively shortens the system recovery time.

In a second aspect, a daemon process independent of the host and standby systems is designed. The daemon process periodically sends heartbeat signals to each software process of the host and standby systems, collects the response information, builds a fault symptom database, and analyzes the fault type and severity using preset rules. When it detects software faults that node-level detection cannot recognize, such as process hangs and resource leaks, it actively triggers active-standby switching or a process restart, achieving multi-level fault diagnosis from the node level down to the process level.

In a third aspect, a dual-redundant physical-layer jumper architecture is designed to address the single-point-of-failure risk of the traditional single-jumper architecture. The architecture uses two independent physical communication links on which heartbeat signals are transmitted alternately; the receiving end performs timestamp comparison and data integrity checks on the two signals, and the accuracy of the state information is ensured through cross verification. When one link fails, the system can still use the other link to maintain communication, which effectively avoids misjudgment of the host and standby states caused by the failure of a single link and removes the trigger condition of the split-brain problem at the physical layer.

In a fourth aspect, the daemon monitoring result is added to the data synchronized between the two machines as a local reliability weight; the healthier the running state of a processor, the higher its weight. The two machines share data with each other and compare it synchronously. When the data of the two machines are consistent and the weights of the host and the hot standby machine are equal, the host outputs by default and the hot standby machine does not output; when the weights are unequal, the processor unit with the higher weight is selected to output. This effectively solves the split-brain problem and reduces the active-standby switching time.
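The weight-based output rule of the fourth aspect can be written as a short decision function. This is a sketch under assumptions: the function name `select_output` and the returned labels are hypothetical, and the inconsistent-data case is only signalled here, since the text handles it separately through three-machine voting.

```python
def select_output(host_weight: float, standby_weight: float,
                  data_consistent: bool) -> str:
    """Decide which machine drives the output for one synchronization cycle.

    Returns "host", "hot_standby", or "arbitrate" (assumed labels).
    """
    if not data_consistent:
        # Inconsistent data is resolved by waking the warm standby machine.
        return "arbitrate"
    if standby_weight > host_weight:
        return "hot_standby"
    # Equal weights, or a healthier host: the host outputs by default.
    return "host"
```

Because both machines evaluate the same shared weights with the same rule, they reach the same decision, so exactly one of them outputs in each cycle.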

The invention provides a high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy, which comprises the following steps:

Step one, system initialization: a "dual-machine hot standby plus one warm standby" architecture is deployed in an industrial automation control scenario, comprising host A, hot standby machine B and warm standby machine C. Each machine is assigned a priority, the priorities from high to low being host A, hot standby machine B and warm standby machine C. After the system is powered on, host A and hot standby machine B enter the hot standby running state, while warm standby machine C enters a low-power standby mode in which only the basic state monitoring module and the data receiving module are started.

A daemon process is started independently in host A and in hot standby machine B, the fault symptom database is initialized, and the heartbeat signal sending period is set to 50 ms. The dual-redundant heartbeat links complete self-checking and establish connection, and the host and the hot standby machine synchronize the initial system configuration parameters and the initial reliability weight values (both set to 0.5).

Step two, normal operation: host A and hot standby machine B execute the industrial control algorithm in real time and synchronize data once every 10 ms; the synchronized content includes the data acquired by the sensors, intermediate operation results and control instructions. The daemon process periodically sends heartbeat signals to the software processes in host A and hot standby machine B, collects indicators such as process response time and memory occupancy, and updates the fault symptom database.

Step three, state detection: the dual-redundant heartbeat links alternately transmit heartbeat signals according to the set sending period, with an interval of 10 ms between the heartbeat signals on the two links. The heartbeat signal contains the local running state code, the reliability weight value and a check code. The receiving end sets the heartbeat timeout to three heartbeat periods; if the timeout is exceeded, the dynamic dual-mode redundancy node-level fault tolerance flow is triggered, the warm standby machine is woken up, and it rebuilds the dual-machine hot standby redundant system together with the surviving machine.

If a heartbeat signal is received within 60 ms, the heartbeat communication link between the two machines has not been damaged by external interference and the peer node is alive; the receiving end decodes and checks the heartbeat signal and updates the peer state table.
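The receiver-side timing in step three can be sketched as follows. The 50 ms period, the three-period timeout and the 60 ms window come from the text; the class `PeerMonitor`, its method names and the choice to judge the peer by the most recent heartbeat on either link are assumptions made for illustration.

```python
from dataclasses import dataclass, field

HEARTBEAT_PERIOD_MS = 50               # sending period given in the text
TIMEOUT_MS = 3 * HEARTBEAT_PERIOD_MS   # timeout of three heartbeat periods
LINK_OK_WINDOW_MS = 60                 # window given in the text


@dataclass
class PeerMonitor:
    """Hypothetical receiver-side tracker for the two heartbeat links."""
    # Time (ms) of the last valid heartbeat seen on link 0 and on link 1.
    last_seen_ms: dict = field(default_factory=lambda: {0: 0, 1: 0})

    def on_heartbeat(self, link: int, now_ms: int) -> None:
        """Record a heartbeat that already passed the check-code test."""
        self.last_seen_ms[link] = now_ms

    def _silence_ms(self, now_ms: int) -> int:
        # Either link is enough, so only the newest heartbeat matters.
        return now_ms - max(self.last_seen_ms.values())

    def link_ok(self, now_ms: int) -> bool:
        """True if some heartbeat arrived within the 60 ms window."""
        return self._silence_ms(now_ms) <= LINK_OK_WINDOW_MS

    def peer_failed(self, now_ms: int) -> bool:
        """True once the silence exceeds three heartbeat periods."""
        return self._silence_ms(now_ms) > TIMEOUT_MS
```

When `peer_failed` becomes true, the node-level fault tolerance flow would be started and the warm standby machine woken up; the loss of a single link alone never triggers it, because the other link keeps refreshing the newest timestamp.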

Step four, three-machine data synchronization: the redundant system shares data periodically during operation. Host A and hot standby machine B send data to each other, and the warm standby machine receives the data so that it can synchronize quickly with the running state of the other machines when woken up. Host A sends a control command frame and a sensor data frame once every 10 ms, the frame ID contains the host identifier (0x01), and hot standby machine B and warm standby machine C monitor and parse the frames in real time.

After receiving the data, hot standby machine B compares its local operation result with the synchronized data. If they are consistent, it sends a confirmation frame (containing its own state weight) every 20 ms; if they are inconsistent, it immediately sends a difference frame with a priority flag (marking the inconsistent fields).

Warm standby machine C is in a low-power listening mode: it only receives the state frames of host A and hot standby machine B (once every 100 ms), updates its local backup data, and does not actively send data.
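The comparison performed by hot standby machine B in step four can be sketched as a function that builds its reply frame. The dictionary-based frame representation, the field names and the function `build_reply` are assumptions for illustration; the source gives no frame ID or layout for the confirmation and difference frames, so none is invented here.

```python
def build_reply(local: dict, synced: dict, weight: float) -> dict:
    """Compare the local result with the host's synchronized data.

    Returns a confirmation frame carrying the sender's state weight when all
    fields match, otherwise a prioritized difference frame listing the
    fields that disagree.
    """
    mismatched = sorted(
        key for key in synced.keys() | local.keys()
        if local.get(key) != synced.get(key)
    )
    if not mismatched:
        return {"kind": "confirm", "weight": weight}
    return {"kind": "difference", "priority": True, "fields": mismatched}
```

In this sketch a confirmation frame would be sent on the 20 ms schedule, while a difference frame would be sent immediately so that arbitration can start without waiting for the next cycle.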

And fifthly, synchronizing data under faults, when the host A and the standby machine B are switched to the host, immediately sending a broadcast frame (ID=0x02, highest priority) with a main node switching mark, enabling the Wen Bei machine C to activate a synchronization mode after receiving the broadcast frame, and interacting data with the new host B every 10 ms.

When an inconsistency between the data of the two machines triggers warm standby machine C to take part in arbitration, C sends an independent operation result frame (ID=0x03). The host and the hot standby machine perform majority voting on the data of the three machines, and the voting result is synchronized to all three machines through a broadcast frame.

Step six, handling of dual-machine data inconsistency. The data of the host and the hot standby machine are synchronized and compared using the flow of step four. When the output of host A is inconsistent with the output of hot standby machine B, it is known that one of the single machines is faulty, but the fault cannot be localized. The system then triggers the data consistency check mechanism: the warm standby machine is awakened, loads the same external input data and performs the same operation, and its output is compared with the results of host A and hot standby machine B in a three-party vote (2:1). If the data of machine A is judged valid, the data of machine A is output; meanwhile the daemon records the operation deviation of hot standby machine B and adjusts the reliability weight of B from 0.5 to 0.45.
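The 2:1 vote and the weight adjustment can be illustrated with a short sketch. The function name `vote` and the constant `WEIGHT_PENALTY` are hypothetical; the 0.05 step is inferred from the single 0.5 to 0.45 example in the text and may not be the general rule.

```python
from collections import Counter

WEIGHT_PENALTY = 0.05  # assumption: inferred from the 0.5 -> 0.45 example


def vote(outputs: dict, weights: dict):
    """2-of-3 majority vote over machine outputs.

    Returns (winning value, dissenting machine ids). If no two machines agree,
    returns (None, all machine ids) and leaves the weights untouched.
    """
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count < 2:
        return None, sorted(outputs)
    dissenters = sorted(m for m, v in outputs.items() if v != value)
    for m in dissenters:
        # Record the deviation by lowering the dissenter's reliability weight.
        weights[m] = round(weights[m] - WEIGHT_PENALTY, 2)
    return value, dissenters
```

For example, with outputs `{"A": 42, "B": 41, "C": 42}` and all weights at 0.5, the vote selects 42, names B as the dissenter, and lowers the weight of B to 0.45.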

Step seven, host fault handling. Host A triggers a hardware watchdog reset because of CPU core damage. After receiving no valid heartbeat signal from host A for 3 consecutive heartbeat periods, hot standby machine B determines that the host node has failed and triggers the dynamic dual-mode redundancy single-machine node-level fault tolerance flow shown in fig. 3. Warm standby machine C is automatically activated, completes system image loading and data synchronization within 50 ms, and forms a new dual-machine hot standby system with hot standby machine B (B switches to host and C serves as the hot standby machine). The new host B sends a failover message to the control bus and records the fault time and the last operating parameters of host A; the daemon marks the fault type of host A as a permanent hardware fault and writes it to the log.

Step eight, hot standby machine fault handling. Hot standby machine B triggers a hardware watchdog reset because of CPU core damage. After receiving no valid heartbeat signal from hot standby machine B for 3 consecutive heartbeat periods, host A determines that the hot standby node has failed and triggers the dynamic dual-mode redundancy single-machine node-level fault tolerance flow shown in fig. 3. Warm standby machine C is automatically activated, completes system image loading and data synchronization within 50 ms, and forms a new dual-machine hot standby system with host A (C serves as the hot standby machine). The daemon marks the fault type of hot standby machine B as a permanent hardware fault and writes it to the log.
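The role changes in steps seven and eight can be summarized in a small sketch. The role strings and the function name `reconfigure` are assumptions made here, and the sketch covers only the single-failure cases described above (one of the host or hot standby machine fails while a warm standby machine is available).

```python
def reconfigure(roles: dict, failed: str) -> dict:
    """Return the new role table after `failed` is declared a failed node.

    `roles` maps a machine id to 'host', 'hot', 'warm' or 'failed'.
    """
    new = dict(roles)
    failed_role = new[failed]
    new[failed] = "failed"
    if failed_role == "host":
        # The hot standby machine takes over as host.
        hot = next((m for m, r in new.items() if r == "hot"), None)
        if hot is not None:
            new[hot] = "host"
    if failed_role in ("host", "hot"):
        # The warm standby machine is woken and becomes the hot standby.
        warm = next((m for m, r in new.items() if r == "warm"), None)
        if warm is not None:
            new[warm] = "hot"
    return new
```

Starting from `{"A": "host", "B": "hot", "C": "warm"}`, a failure of A yields B as host and C as hot standby, while a failure of B leaves A as host and promotes C to hot standby.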

Step nine, software process fault handling. A process running on a single machine in the system deadlocks, which does not trigger the node-level detection and alarm mechanism. When the daemon fails to receive the heartbeat response of the process 2 consecutive times, it calls the fault analysis module, which combines historical data to determine a process-level permanent fault and triggers the process restart mechanism. The reliability of the single machine hosting the faulty process is reduced by 0.5, and reset and initialization of the process are completed within 100 ms. If the process returns to normal operation after the restart, the reliability of the single machine is restored.
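A minimal sketch of the daemon logic in step nine follows. The class name `ProcessGuard` and its return strings are hypothetical, and the fault analysis module is reduced here to a simple miss counter, which is a simplification of the history-based judgment the text describes.

```python
MISS_LIMIT = 2               # consecutive missed heartbeat responses, per the text
PROCESS_FAULT_PENALTY = 0.5  # reliability reduction, per the text


class ProcessGuard:
    """Daemon-side view of one monitored process on a single machine."""

    def __init__(self, reliability: float):
        self.reliability = reliability
        self.misses = 0
        self.penalized = False
        self.restarts = 0

    def poll(self, responded: bool) -> str:
        if responded:
            self.misses = 0
            if self.penalized:
                # The process runs normally again: restore machine reliability.
                self.reliability += PROCESS_FAULT_PENALTY
                self.penalized = False
                return "recovered"
            return "ok"
        self.misses += 1
        if self.misses >= MISS_LIMIT and not self.penalized:
            # Process-level fault: lower reliability and request a restart.
            self.reliability -= PROCESS_FAULT_PENALTY
            self.penalized = True
            self.restarts += 1
            return "restart"
        return "missed"
```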

Step ten, redundant system output control. Host A and hot standby machine B in the dynamic dual-mode redundant system simultaneously receive data transmitted from outside the system, process it, and simultaneously send data outward. To prevent the data sent by the two processor units from interfering with each other, only the data of the machine holding the output right is output externally, and the output of the machine without the output right is masked; the masking is implemented in software.

When a single machine reaches the output node, it first checks whether a single machine with higher reliability than itself is operating normally. If so, it masks its own output. If not, it then checks whether a single machine with higher output priority than itself is operating normally; if so, it masks its own output, and if not, it obtains the output right. The redundant system output control flow is shown in fig. 4.
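The output arbitration can be sketched as below. The `Machine` fields and the function `has_output_right` are hypothetical names. One interpretive assumption is made and should be read as such: output priority is treated as a tie-break among equally reliable machines, so that exactly one healthy machine keeps the output right when a lower-priority machine has the higher reliability.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    reliability: float
    priority: int  # assumption: a larger value means higher output priority
    healthy: bool


def has_output_right(me: Machine, peers: list) -> bool:
    """Software output masking: return False when this machine must mask its output."""
    for p in peers:
        # A healthy machine with higher reliability keeps the output right.
        if p.healthy and p.reliability > me.reliability:
            return False
    for p in peers:
        # Tie-break (assumed): among equally reliable machines, priority decides.
        if p.healthy and p.reliability == me.reliability and p.priority > me.priority:
            return False
    return True
```

With equal weights the host outputs and the hot standby machine is masked; once the weight of the host drops below that of the hot standby machine, the roles at the output are reversed.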

The above is only a preferred embodiment of the high-reliability embedded processor redundancy method based on dynamic dual-mode redundancy. The protection scope of the method is not limited to the above embodiments, and all technical solutions under this concept fall within the protection scope of the present invention. It should be noted that those skilled in the art may make modifications and variations without departing from the principles of the present invention, and such modifications and variations are also considered to be within the protection scope of the present invention.

Claims

1. A high-reliability embedded processor redundancy architecture based on dynamic dual-mode redundancy, characterized by comprising a host, a hot standby machine and a warm standby machine, wherein each single machine is assigned a priority, the priorities from highest to lowest being the host, the hot standby machine and the warm standby machine;

2. A high reliability embedded processor redundancy method based on dynamic dual mode redundancy, the method operating based on the architecture of claim 1, the method comprising the steps of:

Step 8, software process fault processing;

3. The method according to claim 2, wherein the step 1 is specifically:

4. The method according to claim 3, wherein the step 2 is specifically:

5. The method according to claim 4, wherein the step 3 is specifically:

6. The method according to claim 5, wherein the step 6 is specifically:

7. The method according to claim 6, wherein the step 7 is specifically:

8. The method according to claim 7, wherein the step 8 is specifically:

9. A computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 2-8.

10. A computer device comprising a memory and a processor, said memory storing a computer program, characterized in that the processor implements the method according to any one of claims 2-8 when executing said computer program.