JPS6126089B2

Info

Publication number: JPS6126089B2
Application number: JP14035078A
Authority: JP
Inventors: Norimasa Kokatsu
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1978-11-14
Filing date: 1978-11-14
Publication date: 1986-06-19
Also published as: JPS5567850A

Description

[Detailed description of the invention]

The present invention relates to an information processing apparatus that performs pipelined advance control.

To speed up information processing apparatus, arithmetic methods have been improved, buffer memory schemes have been adopted, and advance control schemes have been adopted in which subsequent instructions, and further the operands for those instructions, are read out ahead of the execution of the current instruction.

One developed form of the advance control scheme is the so-called pipeline control scheme. For this scheme, see "Computer System Technology" (edited by Tohru Moto-oka), published by Ohmsha on April 20, 1973, pages 93 to 105. In an information processing apparatus that employs pipeline control, the operations involved in executing an instruction, for example instruction fetch, instruction decode, address calculation, buffer memory read and instruction execution, are synchronized by a clock and carried out separately in the individual stages of the logic units (hereinafter, pipeline control stages). That is, when a pipeline control stage finishes its assigned operation, it passes the result to the following control stage and at the same time receives the input information for the next instruction from the preceding pipeline control stage. Under such a pipeline control scheme, the execution time of one basic instruction is not determined by the number of clocks from the moment the instruction enters the first control stage until it has been processed by the final control stage, that is, by the number of pipeline stages. Instead it is effectively close to the time needed to pass through a single pipeline control stage (one clock).

When higher performance is sought in an information processing apparatus of this kind, one option is to reduce the amount of work done in each pipeline control stage so as to shorten the clock period, and correspondingly to increase the number of pipeline control stages.

For a branch instruction, on the other hand, it is the duration of the branch operation itself that must be shortened, where the branch operation consists of "instruction fetch → instruction decode → address calculation → access to the buffer memory device with that address → fetch of the branch destination instruction". When a pipeline control scheme is adopted, the execution time of this branch instruction is expressed as

[Table: expression not reproduced]
The second term of this expression clearly depends on the number of pipeline stages. If the number of pipeline stages is increased as the clock interval is shortened in order to speed up basic arithmetic instructions, the number of pipeline stages needed to carry out a branch operation also increases, because the branch operation is likewise synchronized by the clock that advances the pipeline control stages. "Synchronization loss" therefore accumulates, and the execution time of the branch operation, expressed by the product above, actually becomes longer.
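The effect described above can be illustrated with a small numerical sketch. The source's own expression is not reproduced in this text, so the model below is an assumption: the branch logic is split evenly over the stages, and each stage adds a fixed overhead (a stand-in for the synchronization loss). The function names and the numbers are hypothetical.

```python
def clock_period(total_logic_delay: float, stages: int, overhead: float) -> float:
    """Clock period when the logic is split evenly over `stages` stages.

    `overhead` is an assumed fixed per-stage cost standing in for the
    synchronization loss; it is not a figure taken from the source.
    """
    return total_logic_delay / stages + overhead


def branch_time(total_logic_delay: float, stages: int, overhead: float) -> float:
    """Branch time as the product (clock period) x (stages traversed)."""
    return clock_period(total_logic_delay, stages, overhead) * stages


if __name__ == "__main__":
    for n in (5, 10):
        print(n, clock_period(100.0, n, 5.0), branch_time(100.0, n, 5.0))
```

With these assumed values, doubling the stage count from 5 to 10 shortens the clock period from 25 to 15 time units, but the branch time grows from 125 to 150 because the per-stage overhead is paid ten times instead of five.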
The "synchronization loss" referred to here arises for the following reasons.

(1) The pipeline period (clock) must be set according to the pipeline control stage with the largest delay, so every other pipeline control stage has idle slack within each clock period. A branch operation passes through several pipeline control stages and therefore passes through stages that contain this slack.

(2) When operations are synchronized by a clock, clock phase skew can arise from variations in component and load characteristics, and the clock period must include a delay margin to absorb it. In addition, the flip-flops that make up the registers of each pipeline control stage basically only hold information and are not essential to the logical processing of the branch operation ("instruction fetch → ... → fetch of the branch destination instruction"), yet their setup and hold delay must be included in the maximum delay used to determine the clock period. That is, passing through n pipeline control stages adds n clock skew margins and n flip-flop setup and hold times.

As described above, the conventional pipeline control scheme has the drawback that shortening the clock period to speed up basic instructions, and increasing the number of pipeline stages accordingly, slows down branch operations.

An object of the present invention is to provide an information processing apparatus that reduces the number of clocks required for a branch operation, a number that otherwise grows when the number of pipeline stages is increased to speed up the apparatus.

The apparatus of the present invention comprises: instruction decoding means for decoding an instruction supplied from outside; operand advance control means having a first buffer memory circuit and a plurality of pipeline control stages, which, when the instruction decoded by the instruction decoding means is not a branch instruction, generates an address for accessing the main memory and performs, in the plurality of pipeline control stages, a search operation to determine whether the operand specified by that address is stored in the first buffer memory circuit; instruction execution means connected to the pipeline control stages of the operand advance control means for executing instructions; branch control means having a second buffer memory circuit, which, when the instruction decoded by the instruction decoding means is a branch instruction, generates an address for accessing the main memory independently of the operand advance control means and performs a search operation to determine whether the branch destination instruction specified by that address is stored in the second buffer memory circuit; and means which, when the branch destination instruction is found in the second buffer memory circuit by the branch control means, supplies the branch destination instruction to the instruction decoding means at a timing unrelated to the synchronization clock and performs control so that the branch instruction is held in the instruction decoding means until the branch destination instruction has been supplied to the instruction decoding means.

The present invention will now be described in detail with reference to the drawings.

In FIG. 1, the instruction decode stage of the present invention comprises a 32-bit instruction register 1, an index register 2 of 32 bits × 16 words, a base register 3 of 32 bits × 16 words, and an instruction decoder 4 that decodes the instruction code (OP) in instruction register 1. The operand advance control unit 5 comprises blocks 11 to 43. Of these, the address calculation control stage comprises registers 11 to 15 and a three-input first address adder 16. The buffer memory index control stage comprises registers 21, 22 and 23 and a first directory array 24; register 23 holds the address in main memory (not shown) that specifies the operand.

The buffer memory read control stage comprises registers 31, 32 and 33 and a first buffer memory data section 34, and the data transfer control stage comprises registers 41, 42 and 43. The execution unit comprises registers 51, 52 and 53, a 32-bit arithmetic unit 55, and an execution register (ER) 54 of 32 bits × 16 words. The branch advance control unit 6 comprises a three-input second address adder 60, a second directory array 61 of the second buffer memory circuit, a second buffer memory data section 62 corresponding to that array, an instruction block selector 63 that compares directory entries and selects data, an instruction buffer register 64, an instruction selector 65, a 32-bit instruction address counter 66 with a plus-1 function, and an instruction address selector 67.

As shown inside instruction register 1 in FIG. 1, the instruction word of this embodiment consists of an 8-bit instruction code field (OP), a 4-bit execution register designation field R1, a 4-bit index register designation field X2 and a 4-bit base register designation field B2 used to specify the address of the operand in main memory, and a 12-bit displacement D2. When an instruction is set in instruction register 1, index register 2 and base register 3 are read out using the index register designation field X2 and the base register designation field B2 of instruction register 1, respectively. At the same time, the instruction code (OP) stored in instruction register 1 is decoded by the instruction decoder (DCD) 4. If the instruction is interpreted as a non-branch instruction (for example, an instruction that operates on the contents of operation register R1 and the main memory data specified by the address (B2) + (X2) + D2, and returns the result to operation register R1), the read outputs of index register 2 and base register 3 are sent to registers 14 and 15, and the displacement D2 is sent to register 13. At this time, the execution register designation field R1, which designates an execution register (ER) 54, is stored in register 12, and the designation of the type of operation to be performed in the execution unit (for example, fixed-point addition) is generated from the output of decoder (DCD) 4 and stored in register 11. The information newly set in registers 13, 14 and 15 of the address calculation control stage of the pipelined operand advance control unit 5 is added by the first address adder 16 to form a 32-bit main memory address, which is stored in register 23 of the buffer memory index control stage, the next stage of the pipeline.
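The instruction format and the three-input address addition described above can be sketched as follows. This is a minimal illustration, not the circuit itself: the field order within the 32-bit word (OP in the most significant bits, D2 in the least significant) is assumed from the order in which the text lists the fields, the wrap-around to 32 bits is an assumption, and all names are hypothetical.

```python
from typing import List, NamedTuple


class Fields(NamedTuple):
    op: int  # 8-bit instruction code
    r1: int  # 4-bit execution register designation
    x2: int  # 4-bit index register designation
    b2: int  # 4-bit base register designation
    d2: int  # 12-bit displacement


def decode(word: int) -> Fields:
    """Split a 32-bit instruction word into its fields (assumed layout)."""
    return Fields(
        op=(word >> 24) & 0xFF,
        r1=(word >> 20) & 0xF,
        x2=(word >> 16) & 0xF,
        b2=(word >> 12) & 0xF,
        d2=word & 0xFFF,
    )


def effective_address(f: Fields, index_regs: List[int], base_regs: List[int]) -> int:
    """Three-input addition (B2) + (X2) + D2, truncated to 32 bits (assumed)."""
    return (base_regs[f.b2] + index_regs[f.x2] + f.d2) & 0xFFFFFFFF


if __name__ == "__main__":
    index_regs = [0] * 16
    base_regs = [0] * 16
    index_regs[2] = 0x100
    base_regs[3] = 0x2000
    f = decode(0x5A123456)
    print(f, hex(effective_address(f, index_regs, base_regs)))
```

The same addition serves both paths in the text: the first address adder 16 forms operand addresses, and the second address adder 60 forms branch destination addresses from the same three inputs.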
At the same time, the contents of registers 11 and 12 are stored in registers 21 and 22. The main memory address stored in register 23 is input to the first directory array 24, where the address of the first buffer memory data section 34 is generated and stored in register 33 of the buffer memory read control stage. At the same time, the contents of registers 21 and 22 are stored in registers 31 and 32. In the buffer memory read control stage of the pipeline, the first buffer memory data section 34 is read using the contents of register 33, and one word (32 bits) of data is stored in register 43 of the data transfer control stage at the same time as the contents of registers 31 and 32 are sent to registers 41 and 42.
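The stage-to-stage hand-off described above can be sketched as a simple shift of latch contents on every clock. This is a simplified illustration under stated assumptions: each stage always finishes in one clock, the execution unit is always ready to accept the next instruction, and buffer memory misses are ignored. The stage names follow the text; the function name and instruction labels are hypothetical.

```python
from typing import List, Optional

# Stage names follow the text; the execution unit sits after the last stage.
STAGES = ["address calculation", "buffer index", "buffer read", "data transfer"]


def run(instructions: List[str]) -> List[List[Optional[str]]]:
    """Return which instruction occupies each stage after every clock."""
    latches: List[Optional[str]] = [None] * len(STAGES)
    pending = list(instructions)
    history: List[List[Optional[str]]] = []
    while True:
        entering = pending.pop(0) if pending else None
        # Every stage passes its contents to the next stage while accepting
        # new input from the stage before it; the contents of the last stage
        # leave for the execution unit.
        latches = [entering] + latches[:-1]
        if all(x is None for x in latches):
            break  # pipeline has drained
        history.append(list(latches))
    return history


if __name__ == "__main__":
    for clock, row in enumerate(run(["A", "B", "C"]), start=1):
        print(clock, row)
```

Once the pipeline is full, one instruction leaves the last stage per clock, which is why the effective time per basic instruction approaches one clock rather than the number of stages.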
The data transfer control stage of the pipeline transfers the combined instruction and data information from the buffer memory section to the execution unit. When the execution unit is ready to accept the next instruction, the contents of registers 41, 42 and 43 are transferred to registers 51, 52 and 53 of the execution unit. In the execution unit, the arithmetic unit 55 performs the operation designated by register 51 (for example, fixed-point addition) on the data from the execution register 54 at the address designated by register 52 and the data stored in register 53, and the result is returned to the execution register 54 designated by register 52, completing execution of the instruction.

The operation of the first buffer memory circuit, which consists of the buffer memory index control stage and the buffer memory read control stage of the pipeline, will now be described in more detail with reference to FIG. 2. The first buffer memory circuit shown in FIG. 2 is a set-associative buffer memory circuit configured to hold the relatively frequently used blocks of main memory, chiefly operands. For the operating principle of a set-associative buffer memory circuit, see "Concepts for Buffer Storage", Computer Group News, March 1969, pages 9 to 13.
Register 23 stores a 32-bit main memory address. This address is handled in the buffer memory index stage as three fields: bits 0 to 20 are called the tag number, bits 21 to 28 the set number, and bits 29 to 31 the word number. The first directory array 24 has four address arrays 2410 to 2413 and one replacement algorithm information section 2414. Each address array holds a 21-bit tag number and a 1-bit valid bit and, like the replacement algorithm information section 2414, has storage locations 0 to 255 addressed by bits 21 to 28 of register 23. That is, the first directory array 24 can store main memory address information for 256 × 4 = 1024 blocks. One block is eight words; this is both the unit in which main memory data is allocated to the first buffer memory circuit and the unit of transfer from main memory to the first buffer memory circuit.

The tag number portions of the outputs of address arrays 2410 to 2413, read out with the 8-bit set number of register 23 (bits 21 to 28), are each compared with the tag number of register 23 (bits 0 to 20) by comparators 2420 to 2423, and each match output is ANDed with the valid bit of the corresponding address array in AND gates 2430 to 2433. If the block corresponding to the main memory address specified by register 23 is held in the first buffer memory circuit, one of these four outputs is logic "1" and the others are logic "0". Encoder 244 encodes this into a 2-bit address array number, which is stored in register 33 together with the set number and word number from register 23.

If the outputs of gates 2430 to 2433 are all logic "0", output line 2451 of OR gate 245 becomes logic "0", indicating that the data block (eight words) corresponding to the main memory address specified by register 23 is not held in the first buffer memory circuit. In that case, the output of the replacement algorithm information section 2414, read from the first directory array 24 at the same time, is used to select one of the four address arrays. The tag number of register 23 is registered there, the valid bit is set to logic "1", and the corresponding block of data is read from main memory and written into the block designated by bits 19 to 28 of register 33.
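The directory lookup described above can be sketched as follows. It is a minimal software analogue of the hardware, assuming that bit 0 is the most significant bit of the 32-bit address (so the word number occupies the three least significant bits). The class and method names are hypothetical, and the replacement choice is left to the caller because the text does not specify the replacement algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

WAYS, SETS = 4, 256  # four address arrays, locations 0 to 255


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a 32-bit address into (tag, set, word), bit 0 being the MSB."""
    tag = (addr >> 11) & 0x1FFFFF  # bits 0 to 20
    set_no = (addr >> 3) & 0xFF    # bits 21 to 28
    word_no = addr & 0x7           # bits 29 to 31
    return tag, set_no, word_no


@dataclass
class Directory:
    tags: List[List[int]] = field(
        default_factory=lambda: [[0] * SETS for _ in range(WAYS)])
    valid: List[List[bool]] = field(
        default_factory=lambda: [[False] * SETS for _ in range(WAYS)])

    def lookup(self, addr: int) -> Optional[int]:
        """Return the 13-bit data-section address on a hit, None on a miss."""
        tag, set_no, word_no = split_address(addr)
        # One comparator and one AND gate per address array.
        hits = [self.valid[w][set_no] and self.tags[w][set_no] == tag
                for w in range(WAYS)]
        if not any(hits):  # corresponds to the OR gate output being "0"
            return None
        way = hits.index(True)  # corresponds to the 4-to-2 encoder
        return (way << 11) | (set_no << 3) | word_no

    def register(self, addr: int, way: int) -> None:
        """Record a block in the given address array after a miss."""
        tag, set_no, _ = split_address(addr)
        self.tags[way][set_no] = tag
        self.valid[way][set_no] = True
```

A caller would invoke `lookup` first and, on `None`, choose a way by whatever replacement policy is in use, call `register`, and fetch the eight-word block from main memory.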
These operations are the ordinary operations of a set-associative buffer memory circuit and are not directly related to the essence of the present invention, so a detailed description is omitted. For ease of understanding, the information stored in register 33 is regarded as a 10-bit block number, made up of the 2-bit address array number (bits 19 and 20) and the 8-bit set number (bits 21 to 28), followed by the word number within the block (bits 29 to 31). In this embodiment, the first buffer memory data section 34 holds 8192 words of data accessed by the 13-bit register 33; when an address is given to register 33, one word of data is transferred to register 43 of the next stage as soon as a transfer is possible.

Returning to FIG. 1, the operation of the branch advance control unit 6 will now be described in detail.

When an instruction is stored in instruction register 1, it is decoded by instruction decoder 4. If it is interpreted as an unconditional branch instruction (branch to the address given by the contents of the base register designated by B2 + the contents of the index register designated by X2 + the displacement D2), the contents of index register 2, the contents of base register 3 and the displacement D2 of instruction register 1 are supplied to the three-input second address adder 60. The second address adder 60 has the same function as the first address adder 16 and outputs a 32-bit main memory branch destination address on output line 601. In FIG. 3, the branch destination address 601 is divided into a tag number, a set number and a word number in the same way as the contents of register 23 in FIG. 2. The second directory array 61 has almost the same function as the first directory array 24 of FIGS. 1 and 2, but, as shown in FIG. 3, in this embodiment the set number occupies bits 21 to 27, one bit fewer, and the word number occupies bits 28 to 31, one bit more, giving a block size of 16 words. That is, the second directory array 61, the second buffer memory data section 62 and the instruction block selector 63 form a set-associative second buffer memory circuit of 128 sets × 4 blocks/set × 16 words/block.

The second buffer memory circuit stores the relatively frequently used blocks of the instructions in main memory. The instruction address counter 66 in FIG. 3 can be loaded with an initial value when an interrupt occurs or by other means, and can either load the branch destination address 601 incremented by one or feed its own value, incremented by one, back to its input. The instruction address selector 67 selects the branch address 601 when the instruction decoder decodes a branch instruction.
When the instruction buffer register 64, described below, has been emptied and the second buffer memory circuit is accessed using the instruction address counter 66, the instruction address selector 67 instead outputs the incremented address 661. The 16-word block accessed by the output of the instruction address selector 67 is read out on output 631 of the instruction block selector 63, which forms part of the second buffer memory circuit. If the instruction block addressed by the instruction address selector 67 is not in the second buffer memory circuit, output line 632 becomes logic "0", and the block is read from main memory (not shown) and stored in the second buffer memory circuit. The 16 words of data on output line 631 are supplied to the third selector 654, where one word is extracted according to bits 28 to 31 of the instruction address selector 67 output and sent through the first selector 652 to the instruction supply source 651. Output 631 of the second buffer memory circuit is also supplied to the 16-word instruction buffer register 64, which stores the instruction group being read at that time.

While instructions are being processed sequentially, the 16-word output of instruction buffer register 64 is supplied to the second selector 653, and the one word designated by bits 28 to 31 of instruction address counter 66 is extracted, output through the first selector 652 to the instruction supply source 651, and set in instruction register 1. At the same time, the instruction address counter 66 is incremented by one. If bits 28 to 31 of the instruction address counter 66 all become "0" as a result of the increment, all of the instructions in instruction buffer register 64 have been used up, so output 661 of the instruction address counter 66 is selected by the instruction address selector 67, the second buffer memory circuit is accessed, and the instruction buffer register 64 and the first, second and third selectors 652, 653 and 654 operate in the same way as for a branch.

As is clear from the above description of the branch advance control unit, when, for example, an unconditional branch instruction is set in instruction register 1, the contents of index register 2 and base register 3 are read out and sent, together with the displacement field D2 of the instruction, to the second address adder 60, which forms the address of the branch destination instruction. From the point where that address is selected by the instruction address selector 67 and used to access the second buffer memory circuit 61, 62 and 63, through the read-out of the 16-word instruction block 631, until the instruction is sent to instruction register 1 via the third selector 654 and the first selector 652, nothing is synchronized by the clock. That is, when the block containing the branch destination instruction is present in the second buffer memory circuit, the logic network involved in the operation from the setting of the branch instruction in instruction register 1 to the output of the branch destination instruction on the instruction supply source 651 consists solely of combinational circuits.

As described above, in the present invention no synchronization at all takes place between the decoding of a branch instruction in the instruction decode stage and the read-out of the branch destination instruction, so branch operations are faster than in apparatus using the conventional pipeline control scheme.

The present invention is not limited to the number of pipeline stages of the embodiment, and is also applicable where a function for translating virtual memory addresses into main memory addresses is provided. Furthermore, although this embodiment has been described on the premise of a set-associative scheme, the present invention is not limited to it and is obviously applicable to other buffer memory schemes such as a fully associative scheme.

By providing a branch advance control unit and reading out the branch destination instruction without clock synchronization, the present invention has the effect that a branch instruction can be executed with the shortest delay time.
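The interplay of the instruction address counter, the 16-word instruction buffer register and the branch path can be sketched in software as follows. This is a simplified model under stated assumptions: a Python list stands in for the second buffer memory circuit backed by main memory, every block read is treated as a hit, the buffer is refilled eagerly as soon as the low four bits of the counter wrap to zero, and timing is not modelled at all. The class and attribute names are hypothetical.

```python
from typing import List

BLOCK_WORDS = 16  # block size of the second buffer memory circuit


class InstructionSupply:
    def __init__(self, memory: List[int], start: int) -> None:
        self.memory = memory   # stand-in for buffer memory plus main memory
        self.block_reads = 0   # number of 16-word block accesses so far
        self.counter = start   # models the instruction address counter
        self.buffer = self._read_block(start)

    def _read_block(self, addr: int) -> List[int]:
        """Read the aligned 16-word block that contains `addr`."""
        self.block_reads += 1
        base = addr & ~(BLOCK_WORDS - 1)
        return self.memory[base:base + BLOCK_WORDS]

    def _advance(self, addr: int) -> None:
        """Set the counter to addr + 1 and refill if the low four bits wrap."""
        self.counter = addr + 1
        if self.counter & (BLOCK_WORDS - 1) == 0:
            self.buffer = self._read_block(self.counter)

    def next_instruction(self) -> int:
        """Sequential case: pick one word out of the buffer register."""
        word = self.buffer[self.counter & (BLOCK_WORDS - 1)]
        self._advance(self.counter)
        return word

    def branch(self, target: int) -> int:
        """Branch case: read the target block and pick the target word."""
        self.buffer = self._read_block(target)
        word = self.buffer[target & (BLOCK_WORDS - 1)]
        self._advance(target)
        return word
```

In this model a block is read only on a branch or when the counter crosses a 16-word boundary, which mirrors the role the text gives to the instruction buffer register; the memory list must be long enough to contain every block that is read.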

[Brief explanation of the drawing]

FIG. 1 is a diagram showing one embodiment of the present invention, FIG. 2 is a more detailed diagram of the first buffer memory circuit portion shown in FIG. 1, and FIG. 3 is a more detailed diagram of the branch advance control unit shown in FIG. 1.

In FIGS. 1 to 3: 1 ... instruction register; 2 ... index register; 3 ... base register; 4 ... instruction decoder; 5 ... operand advance control unit; 6 ... branch advance control unit; 11, 12, 13, 14, 15, 21, 22, 23, 31, 32, 33, 41, 42, 43, 51, 52, 53 ... registers; 16 ... first address adder; 24 ... first directory array; 34 ... first buffer memory data section; 54 ... execution register; 55 ... arithmetic unit; 60 ... second address adder; 61 ... second directory array; 62 ... second buffer memory data section; 63 ... instruction block selector; 64 ... instruction buffer register; 65 ... instruction selector; 66 ... instruction address counter; 67 ... instruction address selector; 2410 to 2414 ... memory elements; 2420 to 2423 ... comparators; 2430 to 2433 ... AND gates; 244 ... four-input, two-output encoder; 245 ... OR circuit; 652, 653, 654 ... first, second and third selectors.

Claims

[Claims] 1. Instruction decoding means for decoding an instruction from outside; operand advance control means having a first buffer memory circuit and a plurality of pipeline control stages, which, when the instruction decoded by the instruction decoding means is not a branch instruction, generates an address for accessing the main memory and performs, in the plurality of pipeline control stages, a search operation to determine whether the operand specified by this address is stored in the first buffer memory circuit; instruction execution means connected to the pipeline control stages of the operand advance control means for executing instructions; branch destination control means having a second buffer memory circuit, which, when the instruction decoded by the instruction decoding means is a branch instruction, generates an address for accessing the main memory independently of the operand advance control means and performs a search operation to determine whether the branch destination instruction specified by this address is stored in the second buffer memory circuit; and means which, when the branch destination instruction is retrieved from the second buffer memory circuit by the branch destination control means, supplies the branch destination instruction to the instruction decoding means at a timing unrelated to the synchronization clock and performs control so that the branch instruction is held in the instruction decoding means until the branch destination instruction is supplied to the instruction decoding means.