JPH0357497B2

Info

Publication number: JPH0357497B2
Application number: JP11191685A
Authority: JP
Inventors: Makoto Suwada
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1985-05-24
Filing date: 1985-05-24
Publication date: 1991-09-02
Also published as: JPS61269775A

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to data transfer between vector registers in a vector processing device, and in particular to the permutation of vector elements.

(Prior Art) When performing vector or matrix operations, the concepts of subvectors and submatrices are very important, and they arise frequently in scientific computation.

Conventional vector processing devices of this type had no function for transferring vector elements between the vector registers used for computation. Permutation and superposition of subvectors and submatrices were therefore performed through memory, by combining store operations and load operations.

(Problems to be Solved by the Invention) In the conventional vector processing device described above, permutation is performed via memory, so the overhead of memory input/output is large and it takes a long time to obtain the required data in the result vector registers. When permutation of vector data and computation are performed in succession, the computation is delayed considerably by the limited data supply performance of the memory.

An object of the present invention is to eliminate the above drawbacks by adding a relatively simple circuit to a vector processing device that uses vector registers, and thereby to provide a vector processing device configured to execute vector operations at high speed.

(Means for Solving the Problems) A vector processing device according to the present invention comprises n operand vector registers (n being a positive integer greater than 1), n result vector registers, an align circuit, a first mod n circuit, a first decoder circuit, a second decoder circuit, and second to fourth mod n circuits, and is configured so that vector elements can be transferred in parallel from the n operand vector registers to the n result vector registers.

The n operand vector registers are for reading n vector elements belonging to the same vector in one cycle.

The n result vector registers are for writing n vector elements belonging to the same vector in one cycle.

The align circuit selectively connects the n read data paths, provided for the n vector elements read from the operand vector registers, to the n write data paths, provided for the n vector elements written to the result vector registers.

The first mod n circuit detects the remainder i obtained by dividing by n the address data A×n+i (0 ≤ A, 0 ≤ i < n), which indicates the leading vector element of the transfer source among the vector data stored in the operand vector registers, and/or the remainder j obtained by dividing by n the address data B×n+j (0 ≤ B, 0 ≤ j < n), which indicates the leading vector element of the transfer destination among the vector data transferred from the operand vector registers to the result vector registers.
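As a minimal sketch of what the first mod n circuit detects, the following Python splits a start address into its quotient and remainder. The function name `split_address` is an assumption made for illustration and does not appear in the patent.

```python
def split_address(addr: int, n: int) -> tuple[int, int]:
    """Split a start address Q*n+r into (Q, r), with 0 <= r < n."""
    if n <= 1 or addr < 0:
        raise ValueError("n must exceed 1 and addr must be non-negative")
    return divmod(addr, n)

# Start addresses 17 and 13 with n = 4, the values used in the embodiments.
print(split_address(17, 4))  # (4, 1): A = 4, i = 1
print(split_address(13, 4))  # (3, 1): B = 3, j = 1
```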

The first decoder circuit uses the value of i detected by the first mod n circuit to give a read address of A+1 to the 0th to (i-1)th operand vector registers and a read address of A to the i-th to (n-1)th operand vector registers.

The second decoder circuit uses the value of j detected by the first mod n circuit to give a write address of B+1 to the 0th to (j-1)th result vector registers and a write address of B to the j-th to (n-1)th result vector registers.
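The two decoder circuits apply the same rule, once to the read side and once to the write side. The sketch below illustrates that rule in Python; the name `bank_addresses` is hypothetical, and the comment about which elements the registers cover assumes that element e of a vector is held in register e mod n at address e div n, a layout the worked examples imply but the text does not state in these words.

```python
def bank_addresses(start: int, n: int) -> list[int]:
    """Per-register address for a start address Q*n+r.

    Registers 0..r-1 receive Q+1 and registers r..n-1 receive Q, so that
    (under the assumed layout) the n registers together cover elements
    start .. start+n-1.
    """
    q, r = divmod(start, n)
    return [q + 1 if bank < r else q for bank in range(n)]

print(bank_addresses(17, 4))  # read side:  [5, 4, 4, 4]
print(bank_addresses(13, 4))  # write side: [4, 3, 3, 3]
```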

For the selection corresponding to the k-th (0 ≤ k < n) write data path, the second to fourth mod n circuits obtain the remainder of k+i, k-j, and k+i-j, respectively, divided by n as l_k (0 ≤ l_k < n), and give it to the align circuit as a selection signal.
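The selection signals can be illustrated with a short Python sketch. The function name `select_signals` is an assumption; setting j = 0 or i = 0 reproduces the k+i and k-j cases, so one function covers all three circuits here.

```python
def select_signals(i: int, j: int, n: int) -> list[int]:
    """l_k = (k + i - j) mod n for each write data path k."""
    # Python's % yields a non-negative result for a positive modulus,
    # so negative values of k - j wrap around as required.
    return [(k + i - j) % n for k in range(n)]

n = 4
print(select_signals(1, 0, n))  # k+i   -> [1, 2, 3, 0]
print(select_signals(0, 1, n))  # k-j   -> [3, 0, 1, 2]
print(select_signals(1, 1, n))  # k+i-j -> [0, 1, 2, 3]
```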

(Embodiment) Next, the present invention will be described with reference to the drawings.

FIGS. 1 to 3 are block diagrams showing one embodiment of a vector processing device according to the present invention. In FIGS. 1 to 3, 1 to 4 are the 0th to third operand vector registers, 5 to 8 are the 0th to third result vector registers, 9 is an align circuit, 10 and 14 are decoder circuits, and 11 to 13 and 15 are mod4 circuits: 12 is the first, 11 the second, 13 the third, and 15 the fourth.

FIG. 1 is a block diagram illustrating the operation of a system in which n = 4, only the read start address can be specified, and the write start address is always "0". FIG. 2 is a block diagram illustrating the operation of a system in which, conversely, only the write start address can be specified and the read start address is always "0". FIG. 3 is a block diagram illustrating the operation of a system in which both the read start address and the write start address can be specified.

In FIG. 1, the outputs of the 0th to third operand vector registers 1 to 4 are input to the align circuit 9 via read data paths 100 to 103. The output data, selectively rearranged inside the align circuit 9, is passed to the 0th to third result vector registers 5 to 8 via write data paths 200 to 203. The read start address (A×n+i) specified for the operand vector registers 1 to 4 is input to the first mod4 circuit 12, which outputs the remainder i of division by 4 and distributes it to the second mod4 circuit 11 and the decoder circuit 10. The mod4 circuit 11 divides 0+i, 1+i, 2+i, and 3+i by n for the write data paths 200 to 203, respectively, and supplies the remainders l₀, l₁, l₂, and l₃ to the align circuit 9 as selection signals. The decoder circuit 10 receives the read start address A×n+i and its remainder i, and supplies an address of A or A+1 to each of the 0th to third operand vector registers 1 to 4.

In the embodiment shown in FIG. 1, n = 4 and A×n+i = 17 are given, so A = 4 and i = 1. Accordingly l₀ = 1, l₁ = 2, l₂ = 3, and l₃ = 0; the read address supplied to the 0th operand vector register 1 is 5, and the read address supplied to the first to third operand vector registers 2 to 4 is 4.

The write start address given to the 0th to third result vector registers 5 to 8, on the other hand, is always 0. In the first cycle, therefore, the elements shown hatched in FIG. 1 are transferred, and vector elements #17 to #20 stored in the 0th to third operand vector registers 1 to 4 are stored as the 0th to third vector elements of the 0th to third result vector registers 5 to 8. Thereafter, advancing the read address of each operand vector register 1 to 4 and the write address of each result vector register 5 to 8 by +1 in each cycle transfers the vector elements as shown in FIG. 4. The transfer performed by this operation corresponds to the permutation used in extracting a subvector.
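The subvector extraction of FIG. 1 can be simulated in a few lines of Python. This is a sketch under stated assumptions rather than the patented circuit: `extract` is a hypothetical name, each register is modeled as a list indexed by address, and element e is assumed to sit in register e mod n at address e div n. Each stored value is its own element number so the result is easy to read.

```python
N = 4  # number of vector registers (n in the text)

def extract(operand, start, cycles):
    """Move N elements per cycle from element `start` onward to result element 0 onward.

    operand[b][a] models address a of operand vector register b.
    """
    a, i = divmod(start, N)
    result = [[None] * cycles for _ in range(N)]
    for c in range(cycles):
        for k in range(N):                         # write data path k
            src = (k + i) % N                      # selection signal l_k
            addr = a + c + (1 if src < i else 0)   # A+1 for registers below i
            result[k][c] = operand[src][addr]      # write address is simply c
    return result

# Assumed layout: element e sits in register e % N at address e // N.
operand = [[addr * N + bank for addr in range(8)] for bank in range(N)]
result = extract(operand, start=17, cycles=2)
print([result[k][0] for k in range(N)])  # [17, 18, 19, 20]
print([result[k][1] for k in range(N)])  # [21, 22, 23, 24]
```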

The configuration shown in FIG. 2 differs from that of FIG. 1 in the following respects: the output of the decoder circuit 10 of FIG. 1 is supplied to the result vector registers rather than to the operand vector registers, and a third mod4 circuit 13 is installed in place of the second mod4 circuit 11. The second mod4 circuit 11 and the third mod4 circuit 13 differ in that the second mod4 circuit 11 supplies, for the write data paths 200 to 203, the remainders of 0+i, 1+i, 2+i, and 3+i divided by n as selection signals, whereas the third mod4 circuit 13 supplies the remainders of 0-j, 1-j, 2-j, and 3-j divided by n.

In the embodiment of FIG. 2, n = 4 and B×n+j = 13 are given, so B = 3 and j = 1. Accordingly l₀ = 3, l₁ = 0, l₂ = 1, and l₃ = 2; 4 is supplied to the 0th result vector register 5 as the write start address, and 3 is supplied to the first to third result vector registers 6 to 8. In the first cycle, the elements shown hatched in FIG. 2 are thus transferred, and the 0th to third elements in the operand vector registers 1 to 4 are stored as the 13th to 16th elements of the result vector registers 5 to 8. Thereafter, advancing the addresses of the operand vector registers 1 to 4 and the result vector registers 5 to 8 in turn transfers the vector elements as shown in FIG. 5. The transfer performed by this operation corresponds to the permutation used in composing a vector.
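The write-side case of FIG. 2 can be sketched the same way. The function name `insert_at` and the list-based register model are assumptions for illustration, as is the layout in which element e sits in register e mod n at address e div n.

```python
N = 4  # number of vector registers (n in the text)

def insert_at(operand, result, start, cycles):
    """Write operand elements 0, 1, 2, ... to result elements start, start+1, ..."""
    b, j = divmod(start, N)
    for c in range(cycles):
        for k in range(N):                       # write data path k
            src = (k - j) % N                    # selection signal l_k
            addr = b + c + (1 if k < j else 0)   # B+1 for registers below j
            result[k][addr] = operand[src][c]    # read address is simply c

# Each operand location holds its own element number.
operand = [[addr * N + bank for addr in range(2)] for bank in range(N)]
result = [[None] * 8 for _ in range(N)]
insert_at(operand, result, start=13, cycles=1)

# Read the result registers back in element order.
flat = [result[e % N][e // N] for e in range(8 * N)]
print(flat[13:17])  # [0, 1, 2, 3]
```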

The configuration shown in FIG. 3 combines the functions shown in FIGS. 1 and 2, and differs only in that the second mod4 circuit 11 or the third mod4 circuit 13 is replaced by a fourth mod4 circuit 15. The second mod4 circuit 11 supplies, for the write data paths (0 to 3), the remainders of 0+i, 1+i, 2+i, and 3+i divided by n as selection signals, and the third mod4 circuit 13 supplies the remainders of 0-j, 1-j, 2-j, and 3-j divided by n; the fourth mod4 circuit 15 instead supplies the remainders of 0+i-j, 1+i-j, 2+i-j, and 3+i-j divided by n as selection signals.

In the embodiment of FIG. 3, n = 4, A×n+i = 17, and B×n+j = 13 are given. Accordingly l₀ = 0, l₁ = 1, l₂ = 2, and l₃ = 3. The 0th operand vector register 1 is supplied with 5 as its read start address and the first to third operand vector registers 2 to 4 with 4; the 0th result vector register 5 is supplied with 4 as its write start address and the first to third result vector registers 6 to 8 with 3.

In the first cycle, the elements shown hatched in FIG. 3 are thus transferred from the operand vector registers 1 to 4 to the result vector registers 5 to 8, and elements #17 to #20 in the operand vector registers 1 to 4 are stored as elements #13 to #16 of the result vector registers. Thereafter, advancing the addresses of the operand vector registers 1 to 4 and the result vector registers 5 to 8 in turn transfers the vector elements as shown in FIG. 6. Such a system can compose vectors freely, and its operation includes the operations shown in FIGS. 1 and 2.
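The combined case of FIG. 3, where both start addresses are free, can be checked with a small Python model. The names `to_banks`, `from_banks`, and `transfer` are hypothetical, and the model assumes element e is held in register e mod n at address e div n; it is a behavioral sketch, not a description of the actual hardware timing.

```python
N = 4  # number of vector registers (n in the text)

def to_banks(elements):
    """Element e goes to register e % N, address e // N (assumed layout)."""
    return [elements[bank::N] for bank in range(N)]

def from_banks(banks):
    """Inverse of to_banks: read the registers back in element order."""
    return [banks[e % N][e // N] for e in range(N * len(banks[0]))]

def transfer(operand, result, src_start, dst_start, cycles):
    """Move N elements per cycle from src_start onward to dst_start onward."""
    a, i = divmod(src_start, N)
    b, j = divmod(dst_start, N)
    for c in range(cycles):
        for k in range(N):                              # write data path k
            sel = (k + i - j) % N                       # selection signal l_k
            read_addr = a + c + (1 if sel < i else 0)   # first decoder rule
            write_addr = b + c + (1 if k < j else 0)    # second decoder rule
            result[k][write_addr] = operand[sel][read_addr]

operand = to_banks(list(range(64)))   # each element holds its own number
result = to_banks([None] * 64)
transfer(operand, result, src_start=17, dst_start=13, cycles=2)
print(from_banks(result)[13:21])  # [17, 18, 19, 20, 21, 22, 23, 24]
```

Because the selection signal depends only on i and j, it stays fixed for the whole transfer; only the read and write addresses advance from cycle to cycle.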

Next, each mod4 circuit and the decoder circuits will be described for the case where n = 4.

The first mod4 circuit 12 is provided as a distribution circuit that passes the lower two bits of the binary number given as the read start address or write start address directly to the second to fourth mod4 circuits 11, 13, and 15 and to the decoder circuits 10 and 14. The second mod4 circuit 11 can be constructed simply from 2-bit adders that add the constants 0 to 3 to the value of i distributed from the first mod4 circuit 12, or from a decoder body.
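At the bit level, the first and second mod4 circuits reduce to masking and a 2-bit addition. The following Python sketch illustrates this; the function names `low2` and `add2` are assumptions.

```python
def low2(addr: int) -> int:
    """First mod4 circuit: pass the lower two bits of the address through."""
    return addr & 0b11

def add2(i: int, k: int) -> int:
    """Second mod4 circuit: 2-bit adder; the carry out of bit 1 is dropped."""
    return (i + k) & 0b11

i = low2(17)                             # 17 = 0b10001
print(i)                                 # 1
print([add2(i, k) for k in range(4)])    # [1, 2, 3, 0]
```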

The third mod4 circuit 13 consists of 2-bit subtractors that calculate 0-j, 1-j, 2-j, and 3-j from the value of j (negative numbers being expressed in complement form), or of a decoder body. The fourth mod4 circuit 15 can similarly be constructed from adders that calculate 0+i-j, 1+i-j, 2+i-j, and 3+i-j from i and j. When either i or j is fixed at "0", the fourth mod4 circuit 15 can serve in place of the second mod4 circuit 11 or the third mod4 circuit 13.
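A bit-level sketch of the third and fourth mod4 circuits follows. It assumes that the remark about negative numbers refers to two's complement representation, which is an interpretation rather than something the text spells out; the names `sub2` and `addsub2` are hypothetical.

```python
MASK = 0b11  # keep two bits, i.e. reduce mod 4

def sub2(k: int, j: int) -> int:
    """Third mod4 circuit: k - j computed as k plus the 2-bit two's complement of j."""
    neg_j = (~j + 1) & MASK
    return (k + neg_j) & MASK

def addsub2(k: int, i: int, j: int) -> int:
    """Fourth mod4 circuit: k + i - j, kept to two bits."""
    neg_j = (~j + 1) & MASK
    return (k + i + neg_j) & MASK

print([sub2(k, 1) for k in range(4)])        # [3, 0, 1, 2]
print([addsub2(k, 1, 1) for k in range(4)])  # [0, 1, 2, 3]
```

With j = 0 the function `addsub2` returns (k + i) mod 4, and with i = 0 it matches `sub2`, which is how one circuit can stand in for the other two.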

The decoder circuits 10 and 14 receive the read start address A×n+i or the write start address B×n+j together with i or j, and supply A or A+1 to each of the operand vector registers 1 to 4, or B or B+1 to each of the result vector registers 5 to 8. They are configured as shown in FIG. 7. In FIG. 7, 71 is a 2-bit shifter, 72 is a +1 adder, 73 is a decoder body, and 74 to 77 are the 0th to third selectors. The 2-bit shifter 71 shifts out and discards the lower two bits of A×n+i or B×n+j (the value of i or j), extracting A or B. The +1 adder 72 adds "1" to its input. The decoder body 73 receives i or j and generates the selection signals that choose between A and A+1, or between B and B+1. In the embodiment, the selected state occurs when i or j is "01".
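The structure of FIG. 7 maps onto a few lines of Python, with one line per component. This is a behavioral sketch for n = 4; the function name `decoder_circuit` is an assumption, and the comparison used for the decoder body is one plausible realization of the rule that registers below i (or j) receive the incremented address.

```python
def decoder_circuit(addr: int) -> list[int]:
    """Return the address supplied to each of the four vector registers."""
    base = addr >> 2        # 2-bit shifter 71: discard i (or j), keep A (or B)
    plus_one = base + 1     # +1 adder 72
    r = addr & 0b11         # i or j, as distributed by the first mod4 circuit
    # Decoder body 73: one selection signal per register, true below r.
    pick_plus_one = [bank < r for bank in range(4)]
    # Selectors 74 to 77: choose A+1 (or B+1) when the signal is true.
    return [plus_one if sel else base for sel in pick_plus_one]

print(decoder_circuit(17))  # [5, 4, 4, 4]
print(decoder_circuit(13))  # [4, 3, 3, 3]
```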

Although this embodiment has been described for the case where n = 4, the present invention can also be configured for a general n (a positive integer). In particular, when n = 2^x (x being an integer), the circuit configuration is simplest, as shown in this embodiment.

(Effects of the Invention) As explained above, the present invention adds a relatively simple circuit to a vector processing device that uses vector registers, and thereby has the effect that the transfer and permutation of vector elements can be processed at high speed.

[Brief explanation of drawings]

FIG. 1 is a block diagram of an embodiment of the vector processing device according to the present invention in which n = 4 and A×n+i = 17, and only the read start address of the operand vector registers can be specified. FIG. 2 is a block diagram of an embodiment of the vector processing device according to the present invention in which n = 4 and B×n+j = 13, and only the write start address of the result vector registers can be specified. FIG. 3 is a block diagram of an embodiment of the vector processing device according to the present invention in which n = 4, A×n+i = 17, and B×n+j = 13, and both the read start address and the write start address can be specified for the operand vector registers and the result vector registers. FIG. 4 is an explanatory diagram showing the transfer of vector elements in the embodiment shown in FIG. 1. FIG. 5 is an explanatory diagram showing the transfer of vector elements in the embodiment shown in FIG. 2. FIG. 6 is an explanatory diagram showing the transfer of vector elements in the embodiment shown in FIG. 3. FIG. 7 is a block diagram showing the circuit configuration of the decoder circuits shown in FIGS. 1 to 3.

1 to 4: operand vector registers; 5 to 8: result vector registers; 9: align circuit; 10, 14: decoder circuits; 11 to 13, 15: mod4 circuits; 71: 2-bit shifter; 72: +1 adder; 73: decoder body; 74 to 77: selectors.

Claims

[Claims]

1. A vector processing device comprising: n operand vector registers (n being a positive integer greater than 1) for reading n vector elements belonging to the same vector in one cycle; n result vector registers for writing n vector elements belonging to the same vector in one cycle; an align circuit for selectively connecting n read data paths, provided corresponding to the n vector elements read from the operand vector registers, to n write data paths, provided corresponding to the n vector elements written to the result vector registers; a first mod n circuit for detecting the remainder i obtained by dividing by n the address data A×n+i (0 ≤ A, 0 ≤ i < n) indicating the leading vector element of the transfer source among the vector data stored in the operand vector registers, and/or the remainder j obtained by dividing by n the address data B×n+j (0 ≤ B, 0 ≤ j < n) indicating the leading vector element of the transfer destination among the vector data transferred from the operand vector registers to the result vector registers; a first decoder circuit for giving, according to the value of i detected by the first mod n circuit, a read address of A+1 to the 0th to (i-1)th operand vector registers and a read address of A to the i-th to (n-1)th operand vector registers; a second decoder circuit for giving, according to the value of j detected by the first mod n circuit, a write address of B+1 to the 0th to (j-1)th result vector registers and a write address of B to the j-th to (n-1)th result vector registers; and second to fourth mod n circuits for obtaining, for the selection corresponding to the k-th (0 ≤ k < n) write data path, the remainders of k+i, k-j, and k+i-j, respectively, divided by n as l_k (0 ≤ l_k < n) and giving them to the align circuit as selection signals; the device being configured so that vector elements can be transferred in parallel from the n operand vector registers to the n result vector registers.