JPH0391841A - Abnormality recovery processing system - Google Patents

Abnormality recovery processing system

Info

Publication number
JPH0391841A
Authority
JP
Japan
Prior art keywords
processing
abnormality
halt
reset signal
processors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1228293A
Other languages
Japanese (ja)
Other versions
JP2844361B2 (en)
Inventor
Hiroshi Egawa
江川 浩
Hitoshi Toyama
遠山 均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP1228293A
Publication of JPH0391841A
Application granted
Publication of JP2844361B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Retry When Errors Occur (AREA)
  • Hardware Redundancy (AREA)

Abstract

PURPOSE: To realize processing grounded in an empirical fact and to recover from abnormalities with high reliability by transmitting a reset signal after detecting that all information processors are abnormal. CONSTITUTION: When an abnormality occurs in any of the information processors 1(1) - 1(n), the halt instruction generating part 1(1)a - 1(n)a provided for that processor issues a halt instruction to it. When halt instructions have been issued by all halt instruction generating parts, this is detected by a full halt state detecting part 2. A reset signal transmission part 3 then outputs reset signals in accordance with the detection output of the part 2, and the processors 1(1) - 1(n) are reset. This yields an abnormality recovery processing system that is restarted and can resume normal processing with high reliability even when all information processors are in an abnormal state, based on the empirical fact that most computer abnormalities are caused by temporary faults.

Description

[Detailed Description of the Invention]

[Summary]

The invention relates to an abnormality recovery processing method that seeks to continue normal processing when an abnormality occurs in a processing device belonging to a system having a plurality of processing devices. Its object is to provide a reliable abnormality recovery processing method that can restart the system and continue normal processing even when all processing devices have fallen into an abnormal state. Each processing device is provided with a suspend instruction generating unit that places the device in a suspended state when an abnormality occurs in it. The system further comprises an all-suspended-state detection unit that determines whether all processing devices are in the suspended state, and a reset signal sending unit that sends a reset signal to all processing devices when it is determined that all of them are in the suspended state.

[Field of Industrial Application]

The present invention relates to an abnormality recovery processing method, and in particular to an abnormality recovery processing method that seeks to continue normal processing when an abnormality occurs in a processing device belonging to a system having a plurality of processing devices.

[Prior Art]

Conventionally, there has been a system such as that shown in FIG. 6.

This system consists of a plurality of processing devices 6(i), i = 1, 2, ... (computers). The processing devices monitor one another for abnormalities, and an abnormality recovery method was used in which, when an abnormality occurs in one processing device, the remaining processing devices continue operating the system.

[Problem to Be Solved by the Invention]

The conventional abnormality recovery method, however, had the following problem: when abnormalities occur in the processing devices one after another until no working device remains, the system can no longer continue processing.

On the other hand, it is an empirical fact that most computer abnormalities are temporary faults and that a computer resumes normal processing once it is restarted.

The present invention was therefore made with the object of providing a reliable abnormality recovery processing method that can restart the system and continue normal processing even when all processing devices have fallen into an abnormal state.

[Means for Solving the Problem]

To solve the above technical problem, the present invention, as shown in FIG. 1, is an abnormality processing method that seeks to continue normal processing when an abnormality occurs in a processing device 1(i) belonging to a system having a plurality of processing devices 1(i), i = 1, 2, ..., n. Each processing device is provided with a suspend instruction generating unit 1(i)a that places the processing device 1(i) in a suspended state when an abnormality occurs in it. The system further has an all-suspended-state detection unit 2 that determines whether all processing devices 1(i) are in the suspended state, and a reset signal sending unit 3 that sends a reset signal to all the processing devices when it is determined that all of them are in the suspended state.
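The three elements named above can be illustrated with a minimal Python sketch. It is not taken from the patent, which describes hardware units; the class `Processor` and the functions `all_suspended` and `recover_if_needed` are hypothetical names chosen only to mirror units 1(i)a, 2, and 3.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical stand-in for one processing device 1(i)."""
    ident: int
    suspended: bool = False

    def on_abnormality(self) -> None:
        # Role of the suspend instruction generating unit 1(i)a:
        # an abnormality puts this device into the suspended state.
        self.suspended = True

    def reset(self) -> None:
        # Receiving the reset signal restarts the device.
        self.suspended = False


def all_suspended(processors: list[Processor]) -> bool:
    # Role of the all-suspended-state detection unit 2.
    # An empty list is treated as "not all suspended" (an assumption).
    return bool(processors) and all(p.suspended for p in processors)


def recover_if_needed(processors: list[Processor]) -> bool:
    # Role of the reset signal sending unit 3: the reset goes out
    # only when no device is left running.
    if not all_suspended(processors):
        return False
    for p in processors:
        p.reset()
    return True
```

In this sketch a single surviving device suppresses the reset, so the reset path is taken only in the case the conventional scheme could not handle.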

[Operation]

Each processing device 1(i), i = 1, 2, ..., n, is provided with a suspend instruction generating unit 1(i)a, i = 1, 2, ..., n, and when an abnormality occurs in a processing device, that device enters the suspended state.

Even if abnormalities occur, as long as at least one processing device is still operating normally, the system continues normal processing using that device.

However, when the all-suspended-state detection unit 2 detects that abnormalities have occurred in all the processing devices and that all of them have been placed in the suspended state by their suspend instruction generating units, the system can no longer continue processing normally, and the detection unit notifies the reset signal sending unit 3 of this fact.

On receiving the notification, the reset signal sending unit 3 sends a reset signal to all the processing devices.

This is based on the empirical fact that most computer abnormalities are temporary faults, and that normal processing can often be resumed by restarting the computer.

[Embodiment]

Next, an embodiment of the present invention will be described.

FIG. 2 shows an overall diagram of the system according to this embodiment.

The system has a plurality of processing devices 11(i), i = 1, 2, ..., such as CPUs. Each processing device is provided with a halt instruction generating unit 11(i)a, which corresponds to the suspend instruction generating unit 1(i)a and places the processing device 11(i) in a suspended state when an abnormality occurs in it.

The system further includes a monitoring device 20, a subsystem independent of the processing devices. It performs the hardware control of the main unit needed to operate the processing devices, provides a means of dialogue with the operating system (OS), monitors the operating status of the system, and carries out diagnostics and similar tasks.

FIG. 3 shows the monitoring device in detail.

The monitoring device is built around a CPU and has the following parts: a monitoring processing unit 21, which corresponds to the all-suspended-state detection unit 2 and the reset signal sending unit 3; a hard access unit 22, which performs write and read accesses to the control registers of each processing device; a control table 23, a memory that stores a table with an entry for each processing device 11(i), i = 1, 2, 3, ..., indicating whether that device is in the halt state (suspended state); and a management unit 24, which keeps the system running smoothly by handling input/output operations with external storage devices, execution ordering, error processing, and the like.

The device according to this embodiment operates as follows.

When an abnormality occurs in a processing device 11(i), i = 1, 2, ..., and it is determined that the device cannot continue normal processing, its halt instruction generating unit 11(i)a places the device in the halt state. The operating status of each processing device, that is, whether or not it is in the halt state, is indicated in the status register of that processing device 11(i).

As shown in the flowchart of FIG. 5, in step SJ1 the hard access unit 22 of the monitoring device 20 periodically issues read (access) instructions to the status registers of the processing devices 11(i), i = 1, 2, .... The information read out, that is, whether or not each device is in the halt state, is recorded in the control table 23 for each processing device, as shown in the upper part of FIG. 4.
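Step SJ1 can be sketched in Python as a polling pass that turns status register values into control table entries. The patent does not specify the register layout, so the `HALT_BIT` mask, the `read_register` callback, and the dictionary used as the control table are all assumptions made for illustration.

```python
# Hypothetical layout: bit 0 of a status register set means "halt state".
HALT_BIT = 0x01


def poll_status_registers(read_register, processor_ids):
    """Sketch of step SJ1: read each status register and record
    whether the device is halted in a control table (a dict here)."""
    control_table = {}
    for pid in processor_ids:
        status = read_register(pid)          # access via hard access unit 22
        control_table[pid] = bool(status & HALT_BIT)
    return control_table


# Simulated status registers for three processing devices.
registers = {1: 0x00, 2: 0x01, 3: 0x00}
table = poll_status_registers(registers.__getitem__, sorted(registers))
print(table)
```

In a real monitoring device this pass would run on a timer; the sketch shows a single pass only.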

In step SJ2, the monitoring processing unit 21 periodically accesses the control table 23, monitors the contents of the table, and determines whether or not all the processing devices are in the halt state.

If not all of the processing devices 11(i), i = 1, 2, ..., are in the halt state, the remaining devices that are not halted (even if only one remains) continue the system's processing, and the monitoring device issues no instruction.
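The decision in step SJ2 reduces to a single predicate over the control table. The following Python sketch assumes the table is a dictionary mapping a device number to a halted flag; the function name `all_halted` is hypothetical.

```python
def all_halted(control_table: dict[int, bool]) -> bool:
    """Sketch of step SJ2: True only when every device is halted."""
    # Python's all() returns True for an empty iterable, so an empty
    # table is rejected explicitly to avoid a spurious reset (an
    # assumption; the patent does not discuss this case).
    if not control_table:
        return False
    return all(control_table.values())


# One device still running: the monitoring device stays silent.
print(all_halted({1: True, 2: False, 3: True}))
# Every device halted: the reset path of step SJ3 would be taken.
print(all_halted({1: True, 2: True, 3: True}))
```

The explicit guard on the empty table is a design choice of this sketch, made so that a monitoring pass with no readings cannot trigger a system-wide reset.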

If, on the other hand, the monitoring processing unit 21 recognizes from the table in the control table 23 that all the processing devices are in the halt state, as shown in the upper part of FIG. 4, the process proceeds to step SJ3. Through the hard access unit 22, the instruction shown in the middle part of FIG. 4 is issued to the reset register of each processing device, and the hard access unit 22 writes a reset instruction signal to each processing device.

As a result, as shown in the lower part of FIG. 4, each processing device is started up again and processing continues. The table in the control table 23 then reflects the status registers of the processing devices and shows that none of them is in the halt state.
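One full monitoring pass, including the reset write of step SJ3 and the refreshed control table, can be sketched in Python as follows. `SimulatedProcessor` and `monitoring_cycle` are hypothetical names, and the sketch assumes that a write to the reset register restarts the device immediately, which real hardware would not guarantee.

```python
class SimulatedProcessor:
    """Hypothetical device with a status register and a reset register."""

    def __init__(self) -> None:
        self.halted = False

    def read_status(self) -> bool:
        return self.halted

    def write_reset(self) -> None:
        # Assumption: the restart completes as soon as reset is written.
        self.halted = False


def monitoring_cycle(processors: dict[int, SimulatedProcessor]):
    """Sketch of one pass through steps SJ1 to SJ3.
    Returns the control table and whether a reset was sent."""
    table = {pid: p.read_status() for pid, p in processors.items()}  # SJ1
    reset_sent = bool(table) and all(table.values())                 # SJ2
    if reset_sent:                                                   # SJ3
        for p in processors.values():
            p.write_reset()
        # Re-read so the table reflects the restarted devices.
        table = {pid: p.read_status() for pid, p in processors.items()}
    return table, reset_sent
```

With every simulated device halted, a single call writes the reset to all of them and returns a table in which no device is marked as halted, matching the state described for the lower part of FIG. 4.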

[Effects of the Invention]

As described above, in the present invention, when an abnormality occurs in any of the plurality of processing devices and that device cannot continue normal processing, the device is placed in the suspended state. When it is detected that all the processing devices are in the suspended state, a reset signal is sent to all of them to start them up again.

This prevents the situation in which all devices are suspended and the system is not operating at all, and so makes reliable system operation possible.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the principle of the present invention, FIG. 2 is an overall block diagram according to the embodiment, FIG. 3 is a diagram showing the monitoring device according to the embodiment, FIG. 4 is a functional explanatory diagram of the monitoring device according to the embodiment, FIG. 5 is a processing flowchart of the monitoring device according to the embodiment, and FIG. 6 is a block diagram of a conventional example.

1(i), 11(i); i = 1, 2, ..., n: processing devices
1(i)a (11(i)a); i = 1, 2, ..., n: suspend instruction generating units (halt instruction generating units)
2 (20): all-suspended-state detection unit (monitoring device)
3 (20): reset signal sending unit (monitoring device)

Claims (1)

[Claims]

An abnormality recovery processing method that seeks to continue normal processing when an abnormality occurs in a processing device {1(i)} belonging to a system having a plurality of processing devices {1(i), i = 1, 2, ..., n}, characterized in that:
each processing device is provided with a suspend instruction generating unit {1(i)a} that places the processing device {1(i)} in a suspended state when an abnormality occurs in it; and
the system comprises an all-suspended-state detection unit (2) that determines whether or not all the processing devices {1(i)} are in the suspended state, and a reset signal sending unit (3) that sends a reset signal to all the processing devices when it is determined that all of them are in the suspended state.
JP1228293A 1989-09-05 1989-09-05 Error recovery processing method Expired - Lifetime JP2844361B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1228293A JP2844361B2 (en) 1989-09-05 1989-09-05 Error recovery processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1228293A JP2844361B2 (en) 1989-09-05 1989-09-05 Error recovery processing method

Publications (2)

Publication Number Publication Date
JPH0391841A 1991-04-17
JP2844361B2 1999-01-06

Family

ID=16874196

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1228293A Expired - Lifetime JP2844361B2 (en) 1989-09-05 1989-09-05 Error recovery processing method

Country Status (1)

Country Link
JP (1) JP2844361B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001014290A (en) * 1999-06-28 2001-01-19 Fujitsu Ltd Multiprocessor system

Also Published As

Publication number Publication date
JP2844361B2 (en) 1999-01-06

Similar Documents

Publication Publication Date Title
US7716520B2 (en) Multi-CPU computer and method of restarting system
CN100440157C (en) System and method for logging recoverable errors
US7447934B2 (en) System and method for using hot plug configuration for PCI error recovery
US6516429B1 (en) Method and apparatus for run-time deconfiguration of a processor in a symmetrical multi-processing system
US6742139B1 (en) Service processor reset/reload
US6880113B2 (en) Conditional hardware scan dump data capture
JPS61502223A (en) Reconfigurable dual processor system
CN115061453A (en) Nuclear power plant DCS fault processing method and device, electronic equipment and storage medium
JPH09251443A (en) Information processing system processor failure recovery processing method
JP5440073B2 (en) Information processing apparatus, information processing apparatus control method, and control program
JP2000112790A (en) Computer with failure information collection function
CN115576734B (en) A multi-core heterogeneous log storage method and system
JPH0391841A (en) Abnormality recovery processing system
US7934128B2 (en) Methods, systems and computer program products for fault tolerant applications
JP2679575B2 (en) I/O channel fault handling system
JP2716537B2 (en) Down monitoring processing method in complex system
JP2002318643A (en) Information processing device
JP3340284B2 (en) Redundant system
JPS6112585B2 (en)
JPH0395634A (en) Restart control system for computer system
JPH01116739A (en) Monitor equipment for cpu
JP2008033598A (en) Dynamic substitution system, dynamic substitution method and program
JPH0149975B2 (en)
JPS5896353A (en) Malfunction detection device for information processing equipment
JPH09319603A (en) Inter-system monitoring system for multicomputer system

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071030

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081030

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091030

Year of fee payment: 11

EXPY Cancellation because of completion of term