JPH01241645A

JPH01241645A - Arithmetic processing unit

Info

Publication number: JPH01241645A
Application number: JP63069055A
Authority: JP
Inventors: Yoichi Sato; 洋一佐藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-03-23
Filing date: 1988-03-23
Publication date: 1989-09-26
Anticipated expiration: 2010-03-06
Also published as: JPH0719224B2

Abstract

PURPOSE: To achieve high-speed access to a cache memory by connecting the cache memory directly to the LSI chips through a chip having a crossbar switch function, which serves as the data path for reading and writing the cache memory. CONSTITUTION: Cache memories 83 and 84 are divided into two or more banks, and they are connected to two or more LSI chips through a chip having a crossbar switch 70 that can connect arbitrary input and output terminals. Data is transferred simultaneously, through the chip having the crossbar switch 70, between two or more different LSI chips and two or more different banks of the cache memories 83 and 84, which improves the efficiency of data transfer.

Description

Detailed Description of the Invention

[Field of Industrial Application]

The present invention relates to an arithmetic processing unit that forms part of an information processing apparatus, and in particular to techniques for transferring data between the cache memory and the LSI chips in an arithmetic processing unit composed of a cache memory and a plurality of LSI chips.

[Conventional technology]

In recent years the integration of electronic devices has advanced remarkably, and even high-performance arithmetic processing units can now be built from a few LSI chips.

Such high-performance arithmetic processing units employ a cache memory to speed up processing further. When there are multiple LSI chips, however, the read destinations and write sources of the cache memory span multiple LSI chips, and providing a separate data path for each of them would make the pin count of the cache memory enormous. The data path is therefore generally implemented as a bus shared by all the LSI chips so that the pin count stays within its limit.

[Problem to be solved by the invention]

As described above, a conventional arithmetic processing unit reduces the pin count of the cache memory by making the data path used for cache memory access a bus. This approach, however, has drawbacks such as the following.

(1) As the number of LSI chips connected to the bus grows, the bus lines become longer, and the resulting increase in capacitance increases the signal delay on the bus, so the cache memory cannot be accessed at high speed.

(2) Because the data path is a bus, only one data transfer can take place per cycle.

In particular, in an arithmetic processing unit that pipelines cache memory accesses, a longer cache memory read time directly prevents the machine cycle from being shortened and so lowers the performance of the unit. A countermeasure for (1) was therefore an important issue. Drawback (2) was likewise an important issue for improving the efficiency of data transfer.

The present invention has been proposed in view of the above points. Its object is to provide an arithmetic processing unit that can access the cache memory at high speed and that can perform two or more data transfers at the same time, thereby improving the efficiency of data transfer.

[Means for Solving the Problems]

To achieve the above object, the present invention provides an arithmetic processing unit composed of a cache memory and a plurality of LSI chips, in which data is transferred between the cache memory and two or more of the LSI chips. The cache memory is divided into two or more banks, and the cache memory and the two or more LSI chips are connected through a chip having a crossbar switch function that can connect arbitrary input and output terminals. Through this chip, data is transferred simultaneously between two or more different LSI chips and two or more different banks of the cache memory.
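The difference between the shared bus and the crossbar arrangement can be illustrated with a small cycle-count model. This is only a sketch under stated assumptions: the function names, the chip labels, and the greedy scheduling policy are hypothetical and are not taken from the patent. It simply treats a bus as one transfer per cycle and a crossbar as any set of transfers whose chips and banks are all distinct.

```python
# Hypothetical model, not the patent's circuit: count the cycles needed to
# complete a batch of (chip, bank) transfers on a bus and on a crossbar.

def bus_cycles(transfers):
    """A shared bus carries exactly one transfer per cycle."""
    return len(transfers)


def crossbar_cycles(transfers):
    """Greedy schedule: within one cycle a transfer is accepted only if
    neither its chip nor its bank is already in use in that cycle."""
    pending = list(transfers)
    cycles = 0
    while pending:
        busy_chips, busy_banks, deferred = set(), set(), []
        for chip, bank in pending:
            if chip in busy_chips or bank in busy_banks:
                deferred.append((chip, bank))  # conflict: retry next cycle
            else:
                busy_chips.add(chip)
                busy_banks.add(bank)
        pending = deferred
        cycles += 1
    return cycles
```

With two transfers that use different chips and different banks, the model gives two cycles for the bus and one for the crossbar; two transfers that target the same bank still take two cycles on the crossbar.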

[Effect]

In the arithmetic processing unit of the present invention, data is transferred simultaneously between the divided banks of the cache memory and different LSI chips through the chip having the crossbar switch function.

[Example]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of an information processing apparatus that includes the arithmetic processing unit of the present invention. In FIG. 1, reference numeral 90 denotes the arithmetic processing unit to which the invention applies. The arithmetic processing unit 90 is connected via a system bus 94 to a main storage device 91, an input/output control device 92, and a system control device 93. Although not shown in FIG. 1, in a multiprocessor configuration several further arithmetic processing units are connected to the system bus 94, and when the main storage capacity is increased, multiple main storage devices are connected to the system bus 94.

The arithmetic processing unit 90 consists of the LSI chips that implement an instruction control circuit 10, an address conversion control circuit 20, a bus control circuit 30, an arithmetic control circuit 40, a high-speed arithmetic circuit 50, and a control storage circuit 60; a control store 85 composed of a plurality of random access memories (RAMs); cache memories 83 and 84; an address array (AA) 81; a copy address array (CAA) 82; and a crossbar switch 70 composed of a plurality of LSI chips.

Next, read operations on the cache memories 83 and 84 and the main storage device 91 are described. First, a read instruction for an instruction or an operand, together with the read address, is transferred from the instruction control circuit 10 to the address conversion control circuit 20 via connection 102. If the read address is a virtual address, it is converted to a real address within the address conversion control circuit 20. The address conversion control circuit 20 outputs the real read address on connections 201, 202, 203, and 204. The address array 81 stores the correspondence between the cache memories 83 and 84 and the main storage device 91, that is, the registration information of the cache memories 83 and 84, and determines whether an address is registered. From the signal that the address array 81 returns via connection 202, the address conversion control circuit 20 determines whether there is a cache hit (the address is registered). On a cache hit, the data read from cache memory 83 or cache memory 84 is treated as valid and returned through the crossbar switch 70 to the LSI chip that is the read destination. The destination is generally the instruction control circuit 10 for an instruction read and the arithmetic control circuit 40 for an operand read, but in special operations it may be the address conversion control circuit 20 or the high-speed arithmetic circuit 50.

If there is no cache hit (called a cache miss or NFB), the bus control circuit 30 sends a block transfer request to the main storage device 91 via the system bus 94. The data returned from the main storage device 91 passes through the bus control circuit 30 and is then written into cache memory 83 or cache memory 84 via connection 307, the crossbar switch 70, and connection 837 or connection 847. The first data returned from the main storage device 91 is also sent from the crossbar switch 70 to the read destination. The read operation is executed as described above.
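The hit and miss paths of the read operation can be sketched in software. The following is a minimal sketch, not the patent's hardware: the class name `ToyCache`, the direct-mapped organization, and `NUM_SETS` are assumptions introduced for illustration, since this part of the description does not state the cache size or associativity. The 32-byte block size is the example figure used later in the description.

```python
BLOCK_SIZE = 32  # bytes per block (example figure from the description)
NUM_SETS = 4     # hypothetical; the cache size is not given in the source


class ToyCache:
    """Direct-mapped toy model (an assumption) of the read path."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # bytearray standing in for main storage 91
        self.tags = {}                  # stands in for address array 81
        self.blocks = {}                # stands in for cache memories 83 and 84

    def _split(self, addr):
        block_no = addr // BLOCK_SIZE
        return block_no % NUM_SETS, block_no // NUM_SETS, addr % BLOCK_SIZE

    def read(self, addr, width=8):
        """Return (data, hit) for an aligned read of `width` bytes."""
        index, tag, offset = self._split(addr)
        hit = self.tags.get(index) == tag
        if not hit:
            # Miss: fetch the whole block from main storage and register it.
            base = addr - offset
            self.blocks[index] = bytes(self.main_memory[base:base + BLOCK_SIZE])
            self.tags[index] = tag
        return self.blocks[index][offset:offset + width], hit
```

A first read of an address misses and registers its block; a second read within the same block then hits.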

Next, write operations on the cache memories 83 and 84 and the main storage device 91 are described. A write instruction and a write address are generated in the instruction control circuit 10, either when it decodes an instruction that requires a write operation or when a microprogram executes a write operation, and are sent to the address conversion control circuit 20 via connection 102. If the write address is a virtual address, the address conversion control circuit 20 converts it to a real address, and it is then held in a register inside the address conversion control circuit 20 that holds write addresses. Once the write data has been prepared, for example in the high-speed arithmetic circuit 50, two actions are carried out: the write to cache memory 83 or cache memory 84, and the sending of the write instruction, write address, and write data for the main storage device 91 to the bus control circuit 30.

However, the write to cache memory 83 or cache memory 84 is performed only if the corresponding address is registered in cache memory 83 or cache memory 84.

The bus control circuit 30 then performs the write to the main storage device 91 via the system bus 94.
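The write policy described here (always write to main storage, and update the cache only when the address is already registered) can be sketched as follows. The function name and the dictionary-based cache are hypothetical, and the sketch assumes that a write does not cross a 32-byte block boundary.

```python
BLOCK = 32  # example block size in bytes


def write_through(cache_lines, main_memory, addr, data):
    """Write `data` at `addr`.

    cache_lines: dict mapping a block base address to a bytearray holding
    that registered block (hypothetical stand-in for the cache).
    main_memory: bytearray standing in for the main storage device.
    Returns True if the cache was also updated (cache hit).
    """
    base = addr - addr % BLOCK
    hit = base in cache_lines
    if hit:
        # The cache is written only when the block is already registered.
        off = addr - base
        cache_lines[base][off:off + len(data)] = data
    # Main storage is always written.
    main_memory[addr:addr + len(data)] = data
    return hit
```

Note that a miss does not register the block in the cache in this sketch, which matches the statement that the cache write happens only for registered addresses.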

The write data is prepared in the arithmetic control circuit 40, mainly under microprogram control, and sent via connection 405 to the register in the high-speed arithmetic circuit 50 that holds write data. It is then synchronized with the write address, sent to the crossbar switch 70 via connection 307, and transferred to the bus control circuit 30 and to cache memory 83 or cache memory 84. The write operation is executed as described above.

Data read and write operations on the cache memories 83 and 84 and the main storage device 91 are executed as described above. As shown in the figure, every data line over which data is transferred connects the LSI chips that make up the circuits one-to-one. Connections other than the one selected by the crossbar switch 70 have no influence, and each LSI chip can be mounted on the package so that the access path is as short as possible, so the delay caused by the data lines on the package can be reduced substantially. To put this in terms of the embodiment of FIG. 1: in a conventional unit, connections 207, 107, 407, 507, 307, 837, and 847 would be connected in parallel to form a bus, so the total line length is long, the capacitance is large, and the delay of a data transfer increases. In the present invention, only the capacitance of the connections selected by the crossbar switch 70 matters and the access path can be made as short as possible, so the delay caused by capacitance can be reduced substantially.

Next, FIG. 2 is a block diagram showing an example of the internal configuration of the crossbar switch 70 in FIG. 1.

In FIG. 2, connections 847, 837, 307, 207, 507, 407, and 107 are, as shown in FIG. 1, connected to cache memory 84, cache memory 83, the bus control circuit 30, the address conversion control circuit 20, the high-speed arithmetic circuit 50, the arithmetic control circuit 40, and the instruction control circuit 10, respectively. Although the figure is simplified, connections 847, 837, 307, 207, 507, and 107 each have a data width of, for example, 8 bytes (64 bits). Only connection 407 differs, with a data width of, for example, 4 bytes. Selectors 710 to 716 and input/output drivers are provided for connections 847, 837, 307, 207, 507, 407, and 107, respectively. Connection 205, the control line of the crossbar switch 70, carries select signals 205-30 to 205-36 for the selectors 710 to 716 and output enable signals 205-E0 to 205-E4 for the drivers, so that the address conversion control circuit 20 controls each of the selectors 710 to 716 independently. For example, to read data from cache memory 83 to the instruction control circuit 10, selector 716 connects connection 107 to connection 837.
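The idea of one independently controlled selector per port can be modeled as a mapping from each destination port to the source port it currently selects. This is a behavioral sketch only: the class and method names are hypothetical, ports are labeled with the connection numbers of FIG. 2, and the output enable signals and data widths are not modeled.

```python
PORTS = (847, 837, 307, 207, 507, 407, 107)  # connection numbers from FIG. 2


class Crossbar:
    """Toy behavioral model: each destination port has its own selector."""

    def __init__(self):
        self.select = {}  # destination port -> selected source port

    def connect(self, dst, src):
        if dst not in PORTS or src not in PORTS:
            raise ValueError("unknown port")
        if dst == src:
            raise ValueError("a port cannot select itself")
        self.select[dst] = src

    def transfer(self, inputs):
        """inputs: dict of source port -> value driven in this cycle.
        Returns a dict of destination port -> value received."""
        return {dst: inputs[src]
                for dst, src in self.select.items() if src in inputs}
```

For instance, connecting port 107 to source 837 and port 847 to source 507 lets a single `transfer` call deliver two independent values in the same cycle, which is the property the description relies on.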

Although not directly part of the present invention, the crossbar switch 70 also converts data widths, so that LSI chips with different data widths can be connected. For example, consider a read to the arithmetic control circuit 40 (as noted above, only connection 407 has a different data width, for example 4 bytes). On a cache access, selector 715 is given select signal 205-35 so that it selects the input data of connection 837 or connection 847 according to the read address, and then selects either the upper 4 bytes or the lower 4 bytes of the 8 bytes, again according to the read address. The 8-byte data can thus be returned to the arithmetic control circuit 40 as 4-byte data. For a read to another LSI chip, for example the instruction control circuit 10, connection 107 has the same 8-byte data width as the cache memories 83 and 84, so no 4-byte selection is needed.
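The 8-byte to 4-byte selection can be sketched as a single function. The description says only that the half is chosen according to the read address; which address bit is used and how "upper" maps onto byte order are assumptions made here for illustration (bit 2 of the address, with the upper half being the first four bytes).

```python
def select_4_bytes(word8, addr):
    """Pick the 4-byte half of an 8-byte word addressed by `addr`.

    Assumptions (not stated in the source): bit 2 of the address, the bit
    of weight 4, chooses the half, and the upper half is the first four
    bytes of the word.
    """
    if len(word8) != 8:
        raise ValueError("expected an 8-byte word")
    return word8[4:] if addr & 0x4 else word8[:4]
```

Under these assumptions, an address ending in 0x0 returns bytes 0 to 3 of the word and an address ending in 0x4 returns bytes 4 to 7.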

Next, FIG. 3 shows part of the internal configuration of the address conversion control circuit 20 in FIG. 1.

In FIG. 3, the request code is a code supplied by the instruction control circuit 10 that contains information specifying, for example, a read operation or a write operation. The request address is the read or write address supplied by the instruction control circuit 10 (if that address is a virtual address, the address after conversion to a real address).

The operation is described below. First, when a request code and a request address are supplied on connection 20-101 and connection 20-201, the request code is set in request code register 20-10 and the request address is set in real address register 20-20. In the normal state, when a request is accepted and the request address is set in real address register 20-20, part of the request address is also set at the same time in AA address register 20-30 and in DA address register 20-40 or DA address register 20-41. For a read or write operation, the addresses are supplied from AA address register 20-30 and DA address registers 20-40 and 20-41 on connections 202 to 204, the address array 81 and cache memory 83 or cache memory 84 are read, and the address array 81 is checked for a cache hit.

For a read operation that hits, the data read from cache memory 83 or cache memory 84 is returned to the read destination through the crossbar switch 70. Which of the two cache memories supplies the read data is decided by the value of one predetermined bit of the request address: cache memory 83 (bank #0) is selected when the bit is "0", and cache memory 84 (bank #1) when it is "1".

On a cache miss, the block transfer address for the main storage device 91 is sent from real address register 20-20 through selector 20-23 and connection 201 to the bus control circuit 30. When the first data of the block transfer read by the bus control circuit 30 is returned, it is sent to the read destination through the crossbar switch 70 and at the same time registered in cache memory 83 or cache memory 84. If the block size is 32 bytes and the data transfer width is 8 bytes, a block transfer consists of four 8-byte transfers. If, further, the banks of the cache memories 83 and 84 are divided on the fifth bit from the low-order end of the address, that is, on 16-byte boundaries, the block transfer data is written twice (16 bytes) to each of cache memory 83 and cache memory 84.
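The arithmetic in this example (a 32-byte block, 8-byte transfers, and the bank chosen by the fifth address bit) can be checked with a few lines of code. The function names are hypothetical, and the ascending transfer order is an assumption; the description does not state the order in which the four transfers arrive.

```python
BLOCK_SIZE = 32      # bytes per block
TRANSFER_WIDTH = 8   # bytes per transfer
BANK_BIT = 4         # fifth bit from the low-order end (weight 16)


def bank_of(addr):
    """Return 0 for bank #0 (cache memory 83) or 1 for bank #1 (cache memory 84)."""
    return (addr >> BANK_BIT) & 1


def block_transfer_plan(addr):
    """List (transfer address, bank) for the block containing `addr`.
    Ascending order is an assumption made for illustration."""
    base = addr - addr % BLOCK_SIZE
    return [(a, bank_of(a))
            for a in range(base, base + BLOCK_SIZE, TRANSFER_WIDTH)]
```

For an address inside the block starting at 0x40, the plan contains four transfers, two landing in bank #0 (0x40, 0x48) and two in bank #1 (0x50, 0x58), which agrees with the "twice (16 bytes) to each" statement.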

When a write operation is set in request code register 20-10, the address array 81 is consulted and cache memory 83 or cache memory 84 is read. The request address (write address) is then moved from real address register 20-20 to real address register 20-22, and the data read from cache memory 83 or cache memory 84 is set in data register 20-50. The information on whether there was a cache hit is input to decoder 20-11 and set in request code register 20-12.

For a write operation, processing thus moves from the first stage (request code register 20-10 and real address register 20-20) to the second stage (request code register 20-12 and real address register 20-22). This frees the first stage so that a subsequent request can be accepted. Such handling is possible because a write operation has to wait for its write data.

The request code and request address of the write operation set in the second-stage request code register 20-12 and real address register 20-22 wait until the write data has been prepared in the write data register in the high-speed arithmetic circuit 50, and the write is performed once the data is ready. Although not directly part of the present invention, in this embodiment a partial write on a cache hit, one that does not rewrite all the data within the data width (for example 8 bytes), is turned into a full write that rewrites all the data within the data width, which speeds up processing, in particular of the write to the main storage device 91.

That is, once the address array 81 has been consulted and cache memory 83 or cache memory 84 has been read, the data read from cache memory 83 or cache memory 84 is held in data register 20-50 via connection 207. When the write data is ready, the crossbar switch 70 receives both the write data transferred from the high-speed arithmetic circuit 50 via connection 507 and the pre-write data transferred from data register 20-50 of the address conversion control circuit 20 via selector 20-51 and connection 207, and swaps bytes between them to build the new write data. A write mask with one bit per byte (8 bits for a data width of 8 bytes) is provided, and only the bytes whose mask bit is "1" replace the pre-write data: for a byte whose mask bit is "1" the write data on connection 507 is selected, and for a byte whose mask bit is "0" the pre-write data on connection 207 is selected. The write mask is sent to the crossbar switch 70 on connection 507 together with the write data; after being received by the write mask receiving section 720, it is used to control the selectors in the same way as the control signals on connection 205.

With this operation, on a cache hit even a write operation that is not a full write can be presented to the bus control circuit 30 and the main storage device 91 as a full write. On a cache hit the contents of data register 20-50 are the pre-write data, so this processing is possible. On a cache miss the contents are undefined (only parity is guaranteed), so the write cannot be turned into a full write; a 2-byte write, for example, is sent to the bus control circuit 30 unchanged as a 2-byte partial write, and no write to the cache memories 83 and 84 is performed.
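The byte-wise merge controlled by the write mask can be sketched as follows. The function name is hypothetical, and the mask is represented as one 0/1 entry per byte so that no assumption about the bit order of the hardware mask is needed.

```python
def merge_write_data(pre_write, write_data, mask):
    """Build full-width write data from pre-write data and new data.

    mask holds one entry per byte: 1 selects the new write data byte,
    0 keeps the pre-write data byte.
    """
    if not (len(pre_write) == len(write_data) == len(mask)):
        raise ValueError("all three inputs must have the same length")
    return bytes(new if m else old
                 for old, new, m in zip(pre_write, write_data, mask))
```

A 2-byte partial write into an 8-byte word, for example, uses a mask with two ones and six zeros, and the result is a complete 8-byte word that can be sent on as a full write.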

In general, the main storage device 91 has an error correction code (ECC) for each 8 bytes and corrects single-bit errors on reads. For a write other than a full 8-byte write, such as a 2-byte partial write, it therefore has to read the corresponding 8-byte-aligned data, replace only the 2 bytes being written, regenerate the error correction code for the 8 bytes, and write it together with the data, which can take longer than a full write. To avoid this delay, the processing above is performed in advance in the cache memories 83 and 84 within the arithmetic processing unit 90, so that the main storage device 91 sees a full write and its processing time is shortened.

Next, another feature of the present invention is described: simultaneous data transfer between the LSI chips and the cache memory. In FIG. 3 the request code register and the real address register each have two stages, so that the cache memories 83 and 84, divided into two banks, can be written and read at the same time. The following describes the case in which a write operation is set in the second-stage request code register 20-12 and real address register 20-22 and a read operation is set in the first-stage request code register 20-10 and real address register 20-20. The behavior depends on which cache memory banks are written and read. As noted above, the bank is selected by the value of one predetermined bit of the request address.

(1) Same bank

In this case the second-stage write operation takes priority. Part of the write address (the contents of real address register 20-22) is set in DA address register 20-40 or DA address register 20-41 through selectors 20-23, 20-42, and 20-43, which secures the write address for cache memory 83 or cache memory 84, and the write is performed.

The first-stage read operation waits until the write operation has finished.

(2) Different banks

Suppose, for example, that the write is to bank #0 (cache memory 83) and the read is from bank #1 (cache memory 84). Part of the write address is set in DA address register 20-40, and part of the read address is set in AA address register 20-30 and DA address register 20-41. In the second stage, DA address register 20-40 secures the address of cache memory 83, the write data is built from connections 507 and 207, and the data is written to cache memory 83 via connection 837; at the same time the write data is sent to the bus control circuit 30 via connection 307 and written to the main storage device 91. In parallel, in the first stage, AA address register 20-30 and DA address register 20-41 secure the addresses of the address array 81 and cache memory 84, and the data of cache memory 84 is read via connection 847. If the read destination is the instruction control circuit 10 or the arithmetic control circuit 40, the read data can be returned. The high-speed arithmetic circuit 50 and the address conversion control circuit 20, however, are in use by the second-stage write operation, so a read to either of them is not possible.
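The two cases can be summarized as a small decision function. This is a sketch of the stated rules only: the function name and the returned labels are hypothetical, circuits are identified by their reference numerals, and the description does not say what happens to a read that is "not possible", so the sketch merely reports that outcome.

```python
# Circuits occupied by the second-stage write: address conversion control
# circuit 20 and high-speed arithmetic circuit 50.
BUSY_DURING_WRITE = {20, 50}


def overlap_outcome(write_bank, read_bank, read_dest):
    """Classify a second-stage write paired with a first-stage read.

    Returns one of three hypothetical labels:
      "read_waits_for_write": same bank, the write has priority
      "read_not_possible":    different banks, but the destination is busy
      "parallel":             write and read proceed simultaneously
    """
    if write_bank == read_bank:
        return "read_waits_for_write"
    if read_dest in BUSY_DURING_WRITE:
        return "read_not_possible"
    return "parallel"
```

For example, a write to bank #0 paired with a read from bank #1 destined for the instruction control circuit 10 is classified as "parallel", while the same read destined for the high-speed arithmetic circuit 50 is not.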

[Effect of the invention]

As described above, in the arithmetic processing unit of the present invention the data path for reading and writing the cache memory does not use a bus; instead, a chip having a crossbar switch function connects the cache memory directly to the LSI chips. The total line length of the data path over which a transfer takes place can therefore be minimized, which enables high-speed access to the cache memory.

In addition, because the cache memory is divided into two or more banks and data can be transferred to and from different LSI chips simultaneously through the chip having the crossbar switch function, the efficiency of data transfer can be improved substantially.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an information processing apparatus including the arithmetic processing unit of the present invention, FIG. 2 is a diagram of the internal configuration of the crossbar switch in FIG. 1, and FIG. 3 is a diagram showing part of the internal configuration of the address conversion control circuit in FIG. 1.

In the figures: 90, arithmetic processing unit; 91, main storage device; 92, input/output control device; 93, system control device; 94, system bus; 10, instruction control circuit; 20, address conversion control circuit; 30, bus control circuit; 40, arithmetic control circuit; 50, high-speed arithmetic circuit; 60, control storage circuit; 70, crossbar switch; 81, address array; 82, copy address array; 83, 84, cache memories; 85, control store.

Claims

[Scope of Claims]

An arithmetic processing unit comprising a cache memory and a plurality of LSI chips, in which data is transferred between the cache memory and two or more of the LSI chips, characterized in that the cache memory is divided into two or more banks; the cache memory and the two or more LSI chips are connected through a chip having a crossbar switch function that can connect arbitrary input and output terminals; and data is transferred simultaneously, through the chip having the crossbar switch function, between two or more different LSI chips and two or more different banks of the cache memory.