JPH0391841A

JPH0391841A - Abnormality recovery processing system

Info

Publication number: JPH0391841A
Application number: JP1228293A
Authority: JP
Inventors: Hiroshi Egawa; 江川　浩; Hitoshi Toyama; 遠山　均
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1989-09-05
Filing date: 1989-09-05
Publication date: 1991-04-17
Anticipated expiration: 2014-01-06
Also published as: JP2844361B2

Abstract

PURPOSE: To recover from abnormalities with high reliability, on the basis of an empirical fact, by sending a reset signal once it has been detected that all information processors are in an abnormal state. CONSTITUTION: Information processors 1(1) to 1(n) are each provided with a halt instruction generating part 1(1)a to 1(n)a. When a processor develops an abnormality, its halt instruction generating part issues a halt instruction to that processor. When all of the halt instruction generating parts have issued halt instructions, a full-halt state detecting part 2 detects this. A reset signal transmission part 3 then outputs reset signals according to the detection output of part 2, and processors 1(1) to 1(n) are reset. The result is an abnormality recovery processing system that can restart and resume normal processing with high reliability even when all information processors are in an abnormal state. This relies on the empirical fact that computer abnormalities are most often caused by temporary faults.

Description

[Detailed Description of the Invention]

[Summary]

This invention relates to an abnormality recovery processing method. The method attempts to continue normal processing when an abnormality occurs in a processing device belonging to a system that has a plurality of processing devices. Its object is to provide a reliable abnormality recovery processing method that can restart the system and continue normal processing even when all the processing devices have fallen into an abnormal state. The method is configured as follows:
- each processing device is provided with a suspension instruction generating unit, which puts that device into a suspended state when an abnormality occurs in it;
- an all-suspended state detection unit determines whether all the processing devices are in the suspended state;
- a reset signal sending unit sends a reset signal to all the processing devices when it is determined that they are all suspended.

[Industrial application field]

The present invention relates to an abnormality recovery processing method. In particular, it relates to a method that attempts to continue normal processing when an abnormality occurs in a processing device belonging to a system that has a plurality of processing devices.

[Prior art]

Systems such as the one shown in FIG. 6 have conventionally been used.

This system consists of a plurality of processing devices (computers) 6(i), i = 1, 2, .... The devices monitor one another for abnormalities. When an abnormality occurs in one processing device, the remaining processing devices continue to operate the system. This abnormality recovery method has been in use.

[Problem to be solved by the invention]

The conventional abnormality recovery method has a problem when abnormalities occur in the processing devices one after another. Once no processing device remains, the system cannot continue processing.
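A small model of the conventional scheme makes this limitation concrete. The Python sketch below is illustrative only. The function `conventional_run` and the failure sequence are hypothetical, and the model ignores how the survivors detect a failed device and take over its work. It shows that the system survives each individual failure but has no way back once the last device fails.

```python
def conventional_run(num_processors: int, failure_order: list[int]) -> list[str]:
    """Hypothetical model of the conventional scheme (FIG. 6).

    Processors monitor one another; the system keeps running while at least
    one processor is healthy. failure_order lists processor indices in the
    order they become abnormal (an illustrative assumption). Returns the
    system status after each failure.
    """
    healthy = set(range(num_processors))
    history = []
    for proc in failure_order:
        healthy.discard(proc)
        # Survivors, if any, take over the failed processor's work.
        history.append("running" if healthy else "stopped")
    return history


print(conventional_run(3, [0, 2, 1]))  # ['running', 'running', 'stopped']
```

Nothing in this model can leave the "stopped" state. The invention addresses exactly that gap.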

It is also an empirical fact that most computer abnormalities are temporary faults. A computer will generally resume normal processing once it is restarted.

The present invention was therefore made to provide a reliable abnormality recovery processing method. The method can restart the system and continue normal processing even when all the processing devices have fallen into an abnormal state.

[Means to solve the problem]

To solve the above technical problem, the present invention provides the following, as shown in FIG. 1. It is an abnormality processing method for a system having a plurality of processing devices 1(i), i = 1, 2, ..., n. The method attempts to continue normal processing in response to an abnormality occurring in one of these processing devices 1(i). It comprises:
- a suspension instruction generating unit 1(i)a, provided in each processing device, which puts that processing device 1(i) into a suspended state when an abnormality occurs in it;
- an all-suspended state detection unit 2, which determines whether all the processing devices 1(i) are in the suspended state;
- a reset signal sending unit 3, which sends a reset signal to all the processing devices when it is determined that they are all suspended.
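The three components of FIG. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `Processor`, `on_abnormality`, `all_halted`, `send_reset` and `check_and_recover` are hypothetical, and a device's suspended state is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Processing device 1(i) with its suspension instruction generating unit 1(i)a."""
    index: int
    halted: bool = False

    def on_abnormality(self) -> None:
        # Unit 1(i)a: an abnormality that prevents normal processing puts
        # this processor into the suspended (halted) state.
        self.halted = True

    def reset(self) -> None:
        # Response to the reset signal: restart and resume processing.
        self.halted = False


def all_halted(processors: list[Processor]) -> bool:
    """All-suspended state detection unit 2."""
    return all(p.halted for p in processors)


def send_reset(processors: list[Processor]) -> None:
    """Reset signal sending unit 3: reset every processor."""
    for p in processors:
        p.reset()


def check_and_recover(processors: list[Processor]) -> bool:
    """Return True if a reset was sent, i.e. every processor had halted."""
    if all_halted(processors):
        send_reset(processors)
        return True
    return False


procs = [Processor(i) for i in (1, 2, 3)]
procs[0].on_abnormality()
procs[1].on_abnormality()
print(check_and_recover(procs))   # False: processor 3 keeps the system running
procs[2].on_abnormality()
print(check_and_recover(procs))   # True: all halted, so all are reset
print([p.halted for p in procs])  # [False, False, False]
```

As in the description, no reset is sent while even one device is still running. Once none is left, all devices are reset together, including those that halted earlier.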

[Operation]

Each processing device 1(i), i = 1, 2, ..., n, is provided with a suspension instruction generating unit 1(i)a. When an abnormality occurs in a processing device, that device enters the suspended state.

Even when abnormalities occur, the system continues normal processing as long as at least one processing device is still operating normally.

Suppose, however, that abnormalities occur in all the processing devices and the suspension instruction generating units put every one of them into the suspended state. When the all-suspended state detection unit 2 detects this, the system can no longer continue processing normally, and unit 2 notifies the reset signal sending unit 3.

On receiving the notification, the reset signal sending unit 3 sends a reset signal to all the processing devices.

This is based on the empirical fact that most computer abnormalities are temporary faults. Restarting the computer often allows normal processing to resume.

[Example]

Next, an embodiment of the present invention will be described.

FIG. 2 shows an overall diagram of the system according to this embodiment.

This system has a plurality of processing devices 11(i), i = 1, 2, ..., such as CPUs. Each processing device is provided with a halt instruction generating unit 11(i)a, which corresponds to the suspension instruction generating unit 1(i)a described above. This unit puts the processing device 11(i) into the halt state when an abnormality occurs in it.

The system also includes a monitoring device 20. This is a subsystem independent of the processing devices. It performs the hardware control of the main unit needed to operate each processing device, provides a means of communicating with the operating system (OS), and monitors and diagnoses the operational status of the system.

FIG. 3 shows the monitoring device in detail.

The monitoring device is built around a CPU and has the following parts:
- a monitoring processing unit 21, corresponding to the all-suspended state detection unit 2 and the reset signal sending unit 3;
- a hardware access unit 22, which reads from and writes to the control registers of each processing device;
- a control table 23, a memory holding a table with one entry per processing device 11(i), i = 1, 2, 3, ..., that indicates whether the device is in the halt (suspended) state;
- a management unit 24, which keeps the system running smoothly by handling input/output operations with external storage devices, execution ordering, error processing and the like.

The device according to this embodiment operates as follows.

Suppose an abnormality occurs in a processing device 11(i), i = 1, 2, ..., and the device is judged unable to continue normal processing. The halt instruction generating unit 11(i)a then puts that processing device into the halt state. The operating status of each processing device, that is, whether it is in the halt state, is shown in the status register of that processing device 11(i).

As shown in the flowchart of FIG. 5, in step SJ1 the hardware access unit 22 of the monitoring device 20 periodically reads the status register of each processing device 11(i), i = 1, 2, .... The information read, that is, whether each device is in the halt state, is recorded in the control table 23 for each processing device, as shown in the upper part of FIG. 4.
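Each pass of step SJ1 amounts to one read of every status register. The Python sketch below rests on several assumptions. `read_status` is a hypothetical callback standing in for the hardware access unit 22. The halt bit at position 0 is assumed. A plain dictionary stands in for the control table 23.

```python
from typing import Callable

HALT_BIT = 0x01  # hypothetical: bit 0 of the status register means "halted"


def poll_status(read_status: Callable[[int], int],
                processor_ids: list[int],
                control_table: dict[int, bool]) -> None:
    """Step SJ1: read every processor's status register and record its
    halt flag in the control table, one entry per processor."""
    for pid in processor_ids:
        status = read_status(pid)  # access via the hardware access unit
        control_table[pid] = bool(status & HALT_BIT)


# Simulated status registers standing in for real hardware.
registers = {1: 0x01, 2: 0x00, 3: 0x01}
table: dict[int, bool] = {}
poll_status(registers.__getitem__, [1, 2, 3], table)
print(table)  # {1: True, 2: False, 3: True}
```

Passing the register reader as a callback keeps the polling logic separate from the hardware-specific access path, which the patent assigns to a dedicated unit.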

In step SJ2, the monitoring processing unit 21 periodically accesses the control table 23 and monitors its contents. It determines whether all the processing devices are in the halt state.

If not all of the processing devices 11(i), i = 1, 2, ..., are in the halt state, the devices that are not halted continue the system's processing, even if only one of them remains. In this case the monitoring device issues no instruction.
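The decision in step SJ2 reduces to a single predicate over the control table. In the Python sketch below, the function name is hypothetical. The guard against an empty table is this sketch's own design choice and is not stated in the patent.

```python
def all_processors_halted(control_table: dict[int, bool]) -> bool:
    """Step SJ2: True only if the table has entries and every processor
    is recorded as halted."""
    # all() of an empty iterable is True, so guard against an empty table;
    # otherwise a reset could be issued before any status has been read.
    return bool(control_table) and all(control_table.values())


print(all_processors_halted({1: True, 2: False, 3: True}))  # False
print(all_processors_halted({1: True, 2: True, 3: True}))   # True
print(all_processors_halted({}))                            # False
```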

Suppose instead that the monitoring processing unit 21 finds, from the table in the control table 23, that all the processing devices are in the halt state, as shown in the upper part of FIG. 4. The process then proceeds to step SJ3. The hardware access unit 22 issues an instruction to the reset register of each processing device, as shown in the middle part of FIG. 4. That is, it writes a reset instruction signal to each processing device.
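Step SJ3 can be sketched in Python as a write to every reset register. The names are assumptions: `write_reset` is a hypothetical callback for the hardware access unit 22, and the value `RESET_COMMAND` is illustrative, since the patent does not specify what is written.

```python
from typing import Callable

RESET_COMMAND = 0x01  # hypothetical value meaning "restart this processor"


def issue_reset(write_reset: Callable[[int, int], None],
                processor_ids: list[int]) -> None:
    """Step SJ3: write the reset instruction to the reset register of every
    processor, not just some of them."""
    for pid in processor_ids:
        write_reset(pid, RESET_COMMAND)


# Record the writes instead of touching real hardware.
writes: list[tuple[int, int]] = []
issue_reset(lambda pid, value: writes.append((pid, value)), [1, 2, 3])
print(writes)  # [(1, 1), (2, 1), (3, 1)]
```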

As a result, as shown in the lower part of FIG. 4, each processing device is restarted and processing continues. The table in the control table 23 reflects the status register of each processing device and now shows that none of them is in the halt state.
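Steps SJ1 to SJ3 can be combined into one monitoring cycle of the FIG. 5 flow, as in the Python sketch below. Everything hardware-related is simulated and hypothetical. `SimulatedHardware` stands in for the processors' status and reset registers, and the halt-bit and reset-command values are assumptions. Resetting a processor is modelled as simply clearing its halt flag, in line with the premise that most faults are temporary. As in the description, the control table shows the restarted processors as running only after the next periodic read.

```python
from dataclasses import dataclass, field

HALT_BIT = 0x01       # hypothetical: status-register bit 0 = halted
RESET_COMMAND = 0x01  # hypothetical reset-register value


@dataclass
class SimulatedHardware:
    """Stand-in for the processors' status and reset registers."""
    status: dict[int, int]

    def read_status(self, pid: int) -> int:
        return self.status[pid]

    def write_reset(self, pid: int, value: int) -> None:
        # Assumption: a reset restarts the processor and clears its halt flag.
        if value == RESET_COMMAND:
            self.status[pid] &= ~HALT_BIT


@dataclass
class MonitoringDevice:
    """Monitoring device 20: control table 23 plus steps SJ1 to SJ3."""
    hw: SimulatedHardware
    control_table: dict[int, bool] = field(default_factory=dict)

    def cycle(self) -> bool:
        """Run one monitoring cycle; return True if a reset was issued."""
        # Step SJ1: record each processor's halt flag in the control table.
        for pid in self.hw.status:
            self.control_table[pid] = bool(self.hw.read_status(pid) & HALT_BIT)
        # Step SJ2: do nothing while at least one processor is running.
        if not self.control_table or not all(self.control_table.values()):
            return False
        # Step SJ3: reset every processor.
        for pid in self.control_table:
            self.hw.write_reset(pid, RESET_COMMAND)
        return True


monitor = MonitoringDevice(SimulatedHardware({1: 0x01, 2: 0x01, 3: 0x00}))
print(monitor.cycle())                         # False: processor 3 is still running
monitor.hw.status[3] |= HALT_BIT               # processor 3 halts as well
print(monitor.cycle())                         # True: all halted, so all are reset
print(monitor.cycle(), monitor.control_table)  # False {1: False, 2: False, 3: False}
```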

[Effect of the invention]

As explained above, in the present invention, a processing device that develops an abnormality and cannot continue normal processing is placed in the suspended state. When it is detected that all the processing devices are in the suspended state, a reset signal is sent to all of them to restart them.

This prevents a situation in which every device is suspended and the system stops operating altogether. Reliable system operation is thereby ensured.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the principle of the present invention. FIG. 2 is an overall block diagram of the embodiment. FIG. 3 shows the monitoring device of the embodiment. FIG. 4 illustrates the function of the monitoring device of the embodiment. FIG. 5 is a processing flowchart of the monitoring device of the embodiment. FIG. 6 is a block diagram of a conventional example.

1(i), 11(i); i = 1, 2, ..., n ... processing devices
1(i)a (11(i)a); i = 1, 2, ..., n ... suspension instruction generating units (halt instruction generating units)
2 (20) ... all-suspended state detection unit (monitoring device)
3 (20) ... reset signal sending unit (monitoring device)

Claims

[Claims] An abnormality recovery processing method for a system having a plurality of processing devices {1(i), i = 1, 2, ..., n}, the method attempting to continue normal processing in response to an abnormality occurring in a processing device {1(i)} belonging to the system, characterized by comprising:
- a suspension instruction generating unit {1(i)a}, provided in each processing device, which puts the processing device {1(i)} into a suspended state when an abnormality occurs in it;
- an all-suspended state detection unit (2), which determines whether all the processing devices {1(i)} are in the suspended state;
- a reset signal sending unit (3), which sends a reset signal to all the processing devices when it is determined that all the processing devices are in the suspended state.