JPH02135533A - Fault processing system - Google Patents

Fault processing system

Info

Publication number
JPH02135533A
JPH02135533A JP63289170A JP28917088A JPH02135533A JP H02135533 A JPH02135533 A JP H02135533A JP 63289170 A JP63289170 A JP 63289170A JP 28917088 A JP28917088 A JP 28917088A JP H02135533 A JPH02135533 A JP H02135533A
Authority
JP
Japan
Prior art keywords
error
arithmetic processing
processing unit
fault
processor
Prior art date
Legal status
Granted
Application number
JP63289170A
Other languages
Japanese (ja)
Other versions
JPH0792763B2 (en)
Inventor
Tatsuro Hashiguchi
橋口 達郎
Current Assignee
NEC Corp
Original Assignee
NEC Corp
Priority date
1988-11-16
Filing date
1988-11-16
Publication date
1990-05-24
Application filed by NEC Corp
Priority to JP63289170A
Publication of JPH02135533A
Publication of JPH0792763B2
Anticipated expiration
Expired - Fee Related (current status)


Landscapes

  • Retry When Errors Occur (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

PURPOSE: To prevent the entire information processing system from going down when correctable errors occur frequently, by detecting frequent correctable errors in the control storage circuit and, when the errors are judged to be frequent, separating the arithmetic processing unit from the system by processor relief.
CONSTITUTION: A fault processing unit 1 performs fault information collection and fault relief processing when a fault occurs in other units, including an arithmetic processing unit 2. A control storage circuit 3 is a memory that stores the microprogram of the arithmetic processing unit 2 and has an error correcting code. An error counter 10 counts the number of correctable errors that occur in the arithmetic processing unit 2 within a fixed time; when a preset number is reached, a fault processing control section 7 causes an error suppression circuit 9 to suppress acceptance of error occurrence reports. At the same time, an instruction stop interrupt is generated to a microprogram control section 6 of the arithmetic processing unit 2 via a communication section 11, and processing is switched to a normal arithmetic processing unit. In this way, the system is kept from shutting down even when correctable errors occur frequently.

Description

Detailed Description of the Invention

Technical Field
The present invention relates to a fault handling method, and more particularly to a fault handling method for use when correctable errors occur frequently in an arithmetic processing unit having an error-correctable control storage.

Prior Art
Conventionally, information processing devices of this type have an error correction circuit for the control storage, and when a correctable error occurs they continue processing as is, even if such errors occur frequently. In addition, when an error occurs, error information is collected by a service processor or the like; because frequent correctable errors cause this error information to overflow, some devices suppress the acceptance of error reports.

In the conventional information processing device described above, operation continues even when a correctable error occurs in the control storage, so the performance of the arithmetic processing unit degrades when correctable errors occur frequently. Furthermore, because correctable errors occur frequently, if one of them turns into an uncorrectable error, the arithmetic processing unit can no longer continue operating and the entire information processing device stops.

In some cases, even a correctable error can make instruction execution impossible, depending on the microprogram step in the control storage at which it occurs.

Object of the Invention
The present invention was made to solve these drawbacks of the prior art, and its object is to provide a fault handling method that prevents the entire information processing system from going down when correctable errors occur frequently.

Constitution of the Invention
According to the present invention, there is provided a fault handling method that collects fault information and performs fault relief processing when a fault occurs in an arithmetic processing unit, the method comprising: counting means for counting the number of occurrences of correctable errors each time the arithmetic processing unit reports one; suppression means for suppressing acceptance of correctable-error occurrence reports when the count of the counting means reaches a predetermined number within a fixed period of time; and interrupt means for generating an interrupt to the arithmetic processing unit when that number is reached. The contents of the software-visible registers in the arithmetic processing unit are frozen and transferred to another, normal arithmetic processing unit.

Embodiment
Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

In the figure, the fault processing unit 1 performs fault information collection and fault relief processing when a fault occurs in other units, including the arithmetic processing unit 2.

The control storage circuit 3 is a memory that stores the microprogram of the arithmetic processing unit 2 and also holds an error correcting code. Microprogram data read from the control storage circuit 3 is set in the microinstruction read register 5 and then executed, but beforehand the error correction circuit 4 checks the validity of the data; if a correctable error is found, the corrected data is set in the microinstruction read register 5. At this time, the error correction circuit 4 sends an error occurrence notification to the error acceptance circuit 8 of the fault processing unit 1, and the error acceptance circuit 8 generates an interrupt to the fault processing control section 7. Acceptance of this error notification can be suppressed by the error suppression circuit 9.
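The embodiment does not say which error correcting code the control storage circuit 3 uses. As a minimal sketch, assuming a Hamming(7,4) single-error-correcting code (chosen here purely for illustration, not taken from the patent), the decoder below shows what a "correctable error" amounts to: a nonzero syndrome locates one flipped bit, which is corrected before the word would be set in the microinstruction read register 5, and the returned flag stands in for the notification sent to the error acceptance circuit 8. The function names are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(data: int) -> List[int]:
    """Encode a 4-bit value as a 7-bit Hamming codeword.

    codeword[i] holds bit position i + 1; positions 1, 2 and 4 are parity bits.
    """
    d1, d2, d3, d4 = ((data >> i) & 1 for i in range(4))
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> Tuple[int, bool]:
    """Return (data, correctable_error_seen) for a 7-bit codeword.

    A nonzero syndrome is the position of a single flipped bit, which is
    flipped back before the data bits are extracted.
    """
    bits = list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    corrected = syndrome != 0
    if corrected:
        bits[syndrome - 1] ^= 1
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, corrected


# Usage: flip one bit of a stored word and read it back.
word = hamming74_encode(0b1011)
word[4] ^= 1  # simulated single-bit fault at bit position 5
value, correctable = hamming74_decode(word)
print(bin(value), correctable)  # 0b1011 True
```

Hamming(7,4) on its own cannot distinguish a double-bit error from a single-bit one and will miscorrect it; detecting that case requires an extra overall parity bit (SECDED).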

The error counter 10 counts the number of error occurrence notifications accepted by the error acceptance circuit 8 and is reset at fixed intervals. The error counter 10 therefore counts the number of correctable errors that occur in the arithmetic processing unit 2 within that fixed period. When the count in the error counter 10 reaches a preset number, the fault processing control section 7 directs the error suppression circuit 9 to suppress acceptance of error occurrence reports. At the same time, it generates an instruction stop interrupt to the microprogram control section 6 of the arithmetic processing unit 2 via the communication section 11.
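A minimal software model of the error counter 10 together with the error suppression circuit 9 might look like the following, assuming the counter is cleared at the start of each fixed interval. The class name, the `threshold` and `interval` parameters, and the callback are hypothetical stand-ins for the hardware: once the count reaches the preset number, further reports are refused, and the callback plays the role of the instruction stop interrupt to the microprogram control section 6.

```python
from typing import Callable


class PeriodicErrorCounter:
    """Hypothetical model of error counter 10 plus error suppression circuit 9."""

    def __init__(self, threshold: int, interval: float,
                 on_threshold: Callable[[], None]) -> None:
        self.threshold = threshold        # preset number of correctable errors
        self.interval = interval          # length of each counting period (seconds)
        self.on_threshold = on_threshold  # stands in for the instruction stop interrupt
        self.count = 0
        self.suppressed = False
        self._period = 0                  # index of the current counting period

    def report(self, now: float) -> bool:
        """Handle one correctable-error report at time `now`; True if accepted."""
        if self.suppressed:
            return False  # acceptance of error reports is suppressed
        period = int(now // self.interval)
        if period != self._period:
            # A new fixed interval has begun, so the counter starts from zero.
            self._period = period
            self.count = 0
        self.count += 1
        if self.count >= self.threshold:
            self.suppressed = True
            self.on_threshold()
        return True


# Usage: three errors inside one 10-second period reach the threshold.
interrupts = []
counter = PeriodicErrorCounter(threshold=3, interval=10.0,
                               on_threshold=lambda: interrupts.append("stop"))
for t in (1.0, 2.0, 11.0, 12.0, 13.0, 14.0):
    counter.report(t)
print(counter.count, counter.suppressed, interrupts)  # 3 True ['stop']
```

Resetting lazily, when a report arrives in a new interval, yields the same counts as a hardware counter cleared by a periodic timer, without modelling the timer separately.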

Next, the overall operation will be described with reference to the operation flowchart of FIG. 2. When the fault processing control section 7 receives an error occurrence report as described above, it updates the counter 10 and checks whether the counter has exceeded a predetermined value N. If it has not, operation continues as is.

Although not shown in FIG. 2, the counter 10 is reset when a predetermined time has elapsed since the first error occurred. When the counter 10 exceeds N, the error suppression circuit 9 is set to temporarily suppress acceptance of errors.

Next, it is checked whether any arithmetic processing unit other than the arithmetic processing unit 2 exists; if none exists, operation likewise continues. In this case, although not shown, the error suppression circuit 9 is released after a fixed time has elapsed since it was set, and error information is then collected. This serves as a warning to maintenance personnel and others.
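The FIG. 2 decision flow can be sketched as a small state machine. This is an illustrative model under stated assumptions, not the patent's implementation: the counting window is measured from the first error (as noted for counter 10), the count must exceed N, and `hold`, the time after which suppression is released when no other arithmetic processing unit exists, is a hypothetical parameter because the text gives no value. The class name and the returned action strings are likewise hypothetical.

```python
from typing import Optional


class Fig2FlowSketch:
    """Hypothetical model of the FIG. 2 flow run by fault processing control section 7."""

    def __init__(self, n: int, window: float, hold: float,
                 spare_available: bool) -> None:
        self.n = n                      # predetermined value N
        self.window = window            # reset time, measured from the first error
        self.hold = hold                # assumed suppression time when no spare exists
        self.spare_available = spare_available
        self.count = 0
        self.first_error_at: Optional[float] = None
        self.suppressed = False
        self.release_at = 0.0

    def on_error_report(self, now: float) -> str:
        if self.suppressed and now < self.release_at:
            return "ignored"            # error suppression circuit 9 is set
        self.suppressed = False         # suppression released (or never set)
        # Counter 10 is reset once the predetermined time has passed since the first error.
        if self.first_error_at is None or now - self.first_error_at >= self.window:
            self.first_error_at = now
            self.count = 0
        self.count += 1
        if self.count <= self.n:
            return "continue"           # N not exceeded: keep running
        self.suppressed = True          # set error suppression circuit 9
        if not self.spare_available:
            # No other arithmetic processing unit: keep running, and release the
            # suppression later so error information can be collected as a warning.
            self.release_at = now + self.hold
            return "continue-and-warn"
        self.release_at = float("inf")  # unit 2 is about to be separated
        return "processor-relief"


# Usage: with N = 2 and a spare unit, the third error in the window triggers relief.
flow = Fig2FlowSketch(n=2, window=60.0, hold=30.0, spare_available=True)
results = [flow.on_error_report(t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)  # ['continue', 'continue', 'processor-relief', 'ignored']
```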

Next, the case in which another arithmetic processing unit exists will be described. In this case, an instruction stop interrupt is sent to the microprogram control section 6 of the arithmetic processing unit 2 via the arithmetic processing unit communication section 11. The microprogram of the arithmetic processing unit 2, which is executing software instructions, accepts this interrupt at an instruction boundary, suppresses execution of the next instruction, and returns a response to the fault processing unit 1. On receiving the response, the fault processing unit 1 extracts the software-visible registers of the arithmetic processing unit 2 via the diagnostic interface and transfers them to another, normal arithmetic processing unit (not shown). The subsequent processing is the well-known processor relief operation and is not described further here.
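The processor relief step can be modelled roughly as follows. The sketch assumes a toy processor whose only software-visible state is a register dictionary that includes a program counter, and whose instructions are hypothetical (register, value) additions. The stop request is honoured only between instructions; the frozen state is then copied out (standing in for the diagnostic interface) and loaded into a spare that resumes execution. All names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Instruction = Tuple[str, int]  # hypothetical: (register name, value to add)


@dataclass
class CpuSketch:
    registers: Dict[str, int] = field(default_factory=lambda: {"PC": 0})
    stop_requested: bool = False
    frozen: bool = False

    def step(self, program: List[Instruction]) -> None:
        """Execute one whole instruction; a stop is only taken between instructions."""
        if self.frozen:
            return
        if self.stop_requested:
            # Interrupt accepted at the instruction boundary: the next
            # instruction is not executed and the register state is frozen.
            self.frozen = True
            return
        reg, value = program[self.registers["PC"]]
        self.registers[reg] = self.registers.get(reg, 0) + value
        self.registers["PC"] += 1


def processor_relief(failing: CpuSketch, spare: CpuSketch,
                     program: List[Instruction]) -> None:
    failing.stop_requested = True  # instruction stop interrupt via communication section 11
    failing.step(program)          # taken at the next instruction boundary
    if not failing.frozen:
        raise RuntimeError("failing processor did not acknowledge the stop")
    # Read out the software-visible registers (diagnostic interface, modelled as a copy).
    spare.registers = dict(failing.registers)


# Usage: unit 2 runs two instructions, is relieved, and the spare finishes the program.
program = [("R1", 5), ("R2", 7), ("R1", 1), ("R2", 1)]
cpu2, spare = CpuSketch(), CpuSketch()
cpu2.step(program)
cpu2.step(program)
processor_relief(cpu2, spare, program)
spare.step(program)
spare.step(program)
print(spare.registers)  # {'PC': 4, 'R1': 6, 'R2': 8}
```

Because the interrupt is taken only at an instruction boundary, the copied register set always describes a consistent point in the program, so the spare can resume without re-executing or skipping an instruction.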

Effect of the Invention
As described above, according to the present invention, frequent correctable errors in the control storage circuit are detected and, when they are judged to be frequent, the arithmetic processing unit can be separated from the system using the well-known processor relief processing. This prevents the system from being brought to a stop because the frequent correctable errors in the control storage turn into an uncorrectable error, or because instruction execution becomes impossible at the microprogram step where the errors occur.

Brief Description of the Drawings

FIG. 1 is a block diagram of an embodiment of the present invention, and FIG. 2 is a flowchart showing the operation of the blocks in FIG. 1.
Explanation of reference numerals of main parts:
1: Fault processing unit
2: Arithmetic processing unit
4: Error correction circuit
5: Microinstruction read register
8: Error acceptance circuit
9: Error suppression circuit
10: Error counter

Claims (1)

(1) A fault handling method that collects fault information and performs fault relief processing when a fault occurs in an arithmetic processing unit, characterized by providing: counting means for counting the number of occurrences of correctable errors each time the arithmetic processing unit reports the occurrence of a correctable error; suppression means for suppressing acceptance of reports of the occurrence of the correctable errors when the count of the counting means reaches a predetermined number within a fixed period of time; and interrupt means for generating an interrupt to the arithmetic processing unit when said number is reached; wherein the contents of software-visible registers in the arithmetic processing unit are frozen, and the contents of each of the registers are transferred under control to another, normal arithmetic processing unit.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63289170A JPH0792763B2 (en) 1988-11-16 1988-11-16 Fault handling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63289170A JPH0792763B2 (en) 1988-11-16 1988-11-16 Fault handling method

Publications (2)

Publication Number Publication Date
JPH02135533A true JPH02135533A (en) 1990-05-24
JPH0792763B2 JPH0792763B2 (en) 1995-10-09

Family

ID=17739669

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63289170A Expired - Fee Related JPH0792763B2 (en) 1988-11-16 1988-11-16 Fault handling method

Country Status (1)

Country Link
JP (1) JPH0792763B2 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5739465A (en) * 1980-08-15 1982-03-04 Nippon Signal Co Ltd:The Multisystem computer device
JPS5785151A (en) * 1980-11-17 1982-05-27 Nec Corp Error recovery system of logical device
JPS57114954A (en) * 1981-01-05 1982-07-17 Nec Corp Error recovery system for logical device
JPS57137939A (en) * 1981-02-18 1982-08-25 Univ Kyoto Parallel counting and sorting method and its circuit
JPS5971551A (en) * 1982-10-18 1984-04-23 Nec Corp Information processor

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04293130A (en) * 1991-03-20 1992-10-16 Nec Ibaraki Ltd Information processor
JPH0528005A (en) * 1991-07-19 1993-02-05 Nec Corp Malfunction detecting method
JPH09282191A (en) * 1996-04-12 1997-10-31 Nec Corp Fault processing system
JP2010170462A (en) * 2009-01-26 2010-08-05 Nec Computertechno Ltd Fault handling device and method
JP2012083992A (en) * 2010-10-13 2012-04-26 Nec Computertechno Ltd Data failure processing apparatus and data failure processing method
CN109801668A (en) * 2017-11-17 2019-05-24 慧荣科技股份有限公司 Data memory device and the operating method being applied thereon
JP2019096281A * 2017-11-17 2019-06-20 慧榮科技股份有限公司 Data storage device and associated operating method
US10915388B2 (en) 2017-11-17 2021-02-09 Silicon Motion, Inc. Data storage device and associated operating method capable of detecting errors and effectively protecting data

Also Published As

Publication number Publication date
JPH0792763B2 (en) 1995-10-09

Similar Documents

Publication Publication Date Title
WO2018103185A1 (en) Fault processing method, computer system, baseboard management controller and system
US20040019835A1 (en) System abstraction layer, processor abstraction layer, and operating system error handling
CN120723527B (en) Uncorrectable error processing method for bus equipment and server
CN114911659A (en) CE storm suppression method, device and related equipment
JPH02135533A (en) Fault processing system
US7346812B1 (en) Apparatus and method for implementing programmable levels of error severity
JPH03259349A (en) Fault recovering system
JP2870250B2 (en) Microprocessor runaway monitor
EP0113982B1 (en) A data processing system
CN115061776B (en) Virtual machine exception processing method, electronic equipment and storage medium
JPH0218506B2 (en)
JPH03265950A (en) 1-bit error processing system for control storage
JP2814988B2 (en) Failure handling method
JPS593638A (en) Information processor
JPS61226843A (en) Device for detecting interruption abnormality
JPS5971551A (en) Information processor
AU669410B2 (en) Error recovery mechanism for software visible registers in computer systems
JPH05100910A (en) Fault processing system
JPH0135369B2 (en)
JPH0365743A (en) Fault finding method for main storage device
JPS59128641A (en) Information processor
JPH03204739A (en) Microcomputer
JPH04148246A (en) Watchdog timer
JPH07111684B2 (en) Logical unit error recovery method
JPH0224731A (en) Error processing method

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees