JPH0561901A

JPH0561901A - Program-controlled processor

Info

Publication number: JPH0561901A
Application number: JP3218341A
Authority: JP
Inventors: Kunitoshi Aono; 邦年青野; Maki Toyokura; 真木豊蔵; Toshiyuki Araki; 敏之荒木; Akihiko Otani; 昭彦大谷; Hisashi Kodama; 久児玉; Kiyoshi Okamoto; 潔岡本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-09-03
Filing date: 1991-08-29
Publication date: 1993-03-12
Anticipated expiration: 2012-02-19
Also published as: JP2584156B2

Abstract

(57) [Abstract] (amended)

[Objective] To provide a processor that achieves high performance by implementing and controlling a pipeline arithmetic unit as a resource of a program-controlled general-purpose processor.

[Configuration] The program control circuit 1 stops its internal program counter and repeats the execution cycle of a vector pipeline instruction. In response to a start signal from the program control circuit 1, the address generator 2 generates m consecutive addresses in a preset sequence, and m consecutive memory read cycles are started on the data memory 3. The program control circuit 1 also controls the function and pipeline configuration of the data processing circuit 4. When the address generator 2 has finished generating the m addresses, it sends an end signal to the program control circuit 1. After receiving the end signal from the address generator 2, the program control circuit 1 waits two more cycles and then restarts the program counter, so that the instructions following the vector pipeline instruction at address N are executed sequentially.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a program-controlled processor, and in particular to the architecture of a digital signal processor (hereinafter, DSP) that requires high-speed arithmetic processing.

[0002]

[Prior Art] The instruction set of a conventional program-controlled general-purpose processor, for example the microinstruction set of a RISC processor, provides instruction groups for memory reads and writes, register setting, data transfer between registers, and various arithmetic and logic operations. Programs written with these instructions implement a wide range of processing. Almost all of these instructions specify a single operation; by combining such simple operations, complex and sophisticated processing can ultimately be realized while retaining general-purpose applicability.

[0003] However, because such single-operation instructions must be executed sequentially, one at a time, processing speed has long been a problem.

[0004]

[Problems to Be Solved by the Invention] As described above, a conventional program-controlled processor is programmed by combining single-operation instructions and executes them sequentially, one instruction at a time, which limits processing speed.

[0005] This is a particularly important problem for DSPs, which require high-speed arithmetic processing. Compared with general-purpose processors, DSPs incorporate various speed-up measures, such as a built-in multiplier, built-in program memory, separate and partitioned data memories, and separate and partitioned data and address buses. Nevertheless, like general-purpose processors, they are still programmed with single-operation instructions and execute them sequentially, one at a time. When a program-controlled DSP could not deliver the required processing speed, dedicated hardware specific to each process had to be developed individually.

[0006] The present invention has been made in view of the above problem. Its object is to provide a program-controlled processor that achieves high performance by implementing and controlling a pipeline arithmetic unit as a resource of a program-controlled general-purpose processor.

[0007]

[Means for Solving the Problems] The program-controlled processor of the present invention implements a plurality of instructions including a vector pipeline instruction and has a data processing circuit that executes a pipeline operation based on the vector pipeline instruction. The processor comprises: a program control circuit that includes a program memory, a program counter, and a decoder and that, when the vector pipeline instruction has been read from the program memory and decoded by the decoder, stops the program counter, outputs a start signal, and controls the operation of the data processing circuit according to the content of the vector pipeline instruction; an address generator that, in response to the start signal, generates addresses consecutively according to a preset sequence and outputs an end signal to the program control circuit when it has finished generating a preset number of addresses; and a data memory that outputs the data previously stored at each address generated by the address generator. The data processing circuit executes the pipeline operation on the data output from the data memory under the control of the program control circuit. A predetermined number of cycles after receiving the end signal, the program control circuit detects the end of the pipeline operation based on the vector pipeline instruction and sequentially executes the instructions that follow the vector pipeline instruction.

[0008]

[Operation] With the above configuration, the present invention realizes a processor in which a vector pipeline instruction for pipeline processing is added to a conventional instruction set of single-operation instructions. When a vector pipeline instruction is read, the processor reads the contents of the data memory sequentially in a preset order, independently of the program control circuit, and performs pipeline processing in which operation cycles run in parallel with these read cycles. The output of the arithmetic unit is either written sequentially, in parallel, to another data memory or accumulated sequentially by an accumulator. When the set number of data items has been processed, sequential execution resumes from the instruction following the vector pipeline instruction, as in a conventional processor. In other words, a single vector pipeline instruction executes all of the operations expressed by (Equation 1) or (Equation 2) in a pipelined, parallel manner.

[0009]

[Equation 1]

[0010]

[Equation 2]

[0011] Here, Ai and Bi are input vector data read from the data memories and supplied to the arithmetic unit. Two inputs are used here, but the number of inputs is not limited. Yi is the output vector data of the arithmetic unit and is written to another data memory. X is the result of accumulating the output data of the arithmetic unit in the accumulator. The operation performed by the arithmetic unit is denoted by the function F; the arithmetic unit is selected or reconfigured according to the content of the instruction so that it provides the required function.
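The equation images for (Equation 1) and (Equation 2) are not reproduced in this text. Judging only from the definitions given in paragraph [0011], they plausibly have the following form. The index range 1 to m (with m the number of data items, as used later in the description) and the exact notation are assumptions, not the published equations.

```latex
\begin{align}
Y_i &= F(A_i, B_i), \qquad i = 1, \dots, m \tag{1} \\
X   &= \sum_{i=1}^{m} F(A_i, B_i) \tag{2}
\end{align}
```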

[0012]

[Embodiment] FIG. 1 is a block diagram showing the concept of the processor of the present invention. The processor implements a vector pipeline instruction in addition to a conventional instruction set. In FIG. 1, reference numeral 1 denotes a program control circuit composed of a program memory, a program counter, a decoder, and the like. When a vector pipeline instruction is read from the program memory, the program control circuit stops the program counter, outputs a start signal, and controls the function and pipeline configuration of the data processing circuit according to the content of the vector pipeline instruction. Reference numeral 2 denotes an address generator that, once triggered by the start signal, generates addresses consecutively in a preset sequence independently of the program control circuit and sends an end signal to the program control circuit 1 when it has finished generating the set number of addresses. Reference numeral 3 denotes a data memory that receives these addresses and outputs data consecutively, and 4 denotes a data processing circuit that receives the data read from the data memory 3 and performs the pipeline operation. The execution unit 5 consists of the address generator 2, the data memory 3, and the data processing circuit 4.

[0013] The operation of FIG. 1 is described with reference to the timing diagram of FIG. 2. FIG. 2 shows how the program control circuit 1 of FIG. 1 sequentially reads the instructions stored in the program memory and controls their execution. The instruction at address N is the vector pipeline instruction that executes the pipeline processing of the present invention.

[0014] In FIG. 2, all instructions other than the one at address N are assumed to be conventional single-operation instructions and not branch instructions. To simplify the description, the cycles needed for instruction fetch, decode, and so on are omitted and only execution cycles are shown. A conventional single-operation instruction is executed by controlling the resources in the processor as in a conventional processor, and normally completes in one cycle; instructions are then read and executed one after another. The execution cycles for addresses N-2 and N-1 show such conventional single-operation instructions. When the vector pipeline instruction at address N is read and decoded, the program control circuit 1 of FIG. 1 first stops its internal program counter so that the execution cycle of the vector pipeline instruction is repeated. The program control circuit 1 also sends a start signal to the address generator 2. Starting from the cycle in which the start signal is given, the address generator 2 generates m consecutive addresses in a preset sequence, and m consecutive memory read cycles are started on the data memory 3, as shown in FIG. 2. In addition, the program control circuit 1 controls the function and pipeline configuration of the data processing circuit 4 according to the content of the instruction. This is readily achieved by providing the data processing circuit 4 with a plurality of arithmetic units, registers, memories, and the like, and reconfiguring the connections between their inputs and outputs with multiplexers or similar means. FIG. 2 shows the case of a three-stage pipeline consisting of memory read, processing 1, and processing 2.

[0015] With this control, the three-stage pipeline of memory read, processing 1, and processing 2 can be executed continuously on m data items, as shown in FIG. 2. When the address generator 2 has finished generating the m addresses, it sends an end signal to the program control circuit 1. The program control circuit 1 has already decoded that the current vector pipeline instruction is a three-stage pipeline of memory read, processing 1, and processing 2. Using this information, it waits two more cycles after receiving the end signal from the address generator 2, that is, until all pipeline cycles have finished, and then restarts the program counter. The instructions following the vector pipeline instruction at address N, namely those at addresses N+1, N+2, and N+3, are then executed sequentially.
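The timing described above can be sketched as a small cycle-by-cycle schedule. This is only an illustrative model, not the patented circuit: the function names are hypothetical, cycles are numbered from 0 at the first memory read, and the end signal is assumed to arrive in the cycle in which the last address is issued.

```python
def pipeline_schedule(m, stages=("read", "proc1", "proc2")):
    """Return {cycle: [(stage, item), ...]} for m items in a linear pipeline."""
    schedule = {}
    for item in range(m):
        for depth, stage in enumerate(stages):
            # An item reaches stage number `depth` that many cycles after its read.
            cycle = item + depth
            schedule.setdefault(cycle, []).append((stage, item))
    return schedule


def restart_cycle(m, num_stages=3):
    """Cycle in which the instruction after the vector instruction executes."""
    end_signal_cycle = m - 1              # assumption: end signal with the last address
    drain_cycles = num_stages - 1         # two extra cycles for a three-stage pipeline
    return end_signal_cycle + drain_cycles + 1


sched = pipeline_schedule(4)
for cycle in sorted(sched):
    print(cycle, sched[cycle])
print("next instruction executes in cycle", restart_cycle(4))
```

For m = 4 the schedule occupies cycles 0 to 5, that is m + 2 cycles, and the instruction at address N+1 runs in the cycle after that.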

[0016] As described above, the present invention realizes a processor that implements, in addition to a conventional instruction set, a new vector pipeline instruction for pipelined parallel processing. Without sacrificing the generality of program control, this instruction provides both the speed of pipelined parallel processing and a reduction in program memory size compared with using the conventional instruction set alone. In the processor of FIG. 1, the single instruction at address N controls the entire three-stage pipeline (memory read, processing 1, processing 2) for m data items, and the processing completes in (m + 2) cycles. Controlling the same processing with conventional single-operation instructions would take at least 3m cycles, so a speedup of about three times is achieved. Increasing the number of pipeline stages readily yields a further speedup.
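The cycle counts quoted above generalize directly to a pipeline of any depth. The short sketch below works through the arithmetic; the function names and the generalization from three stages to `stages` stages are illustrative assumptions.

```python
def vector_cycles(m, stages=3):
    """Cycles for one vector pipeline instruction: m reads plus pipeline drain."""
    return m + (stages - 1)


def sequential_cycles(m, stages=3):
    """Lower bound when every stage of every item is a separate one-cycle instruction."""
    return stages * m


def speedup(m, stages=3):
    """Ratio of sequential to pipelined cycle counts."""
    return sequential_cycles(m, stages) / vector_cycles(m, stages)


for m in (8, 64, 1024):
    print(m, vector_cycles(m), sequential_cycles(m), round(speedup(m), 3))
```

The ratio is 3m / (m + 2) for three stages, so it approaches 3 only for large m; for short vectors the two drain cycles take a noticeable share of the total.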

[0017] FIG. 3 is a block diagram showing the program control circuit of FIG. 1. In FIG. 3, the main components of the program control circuit 1 are a program counter 10, a program memory 11, a decoder 12, a vector instruction control circuit 13, and an instruction register circuit 14.

[0018] The instruction in the program memory 11 addressed by the program counter 10 is read out; the decoder 12 decodes this instruction via a first pipeline register 15; and the resulting control signals are sent to the execution unit 5 via a second pipeline register 16. These connections form a pipeline consisting of the read cycle of the program memory 11, the instruction decode cycle, and the instruction execution cycle.

[0019] The program counter 10 has a program counter register 17, a first multiplexer 18, and an incrementer 19. The first multiplexer 18 is connected so as to select one of the outputs of the incrementer 19, a branch address control circuit 20, and the program counter register 17 and feed it to the program counter register 17.

[0020] The instruction register circuit 14 has a second multiplexer 21 and the first pipeline register 15. The second multiplexer 21 selects either the output of the program memory 11 or the output of the first pipeline register 15 and feeds it to the first pipeline register 15.

[0021] Under the control of the decoder 12, the vector instruction control circuit 13 controls the first multiplexer 18 and the second multiplexer 21 and, at the same time, sends a start signal to the execution unit 5. It is also connected so as to receive an end signal from the execution unit 5.

[0022] While the program control circuit 1 is sequentially reading and executing instructions from the program memory 11, when a vector pipeline instruction is decoded by the decoder 12, the vector instruction control circuit 13 sends a start signal to the execution unit 5. At the same time, the vector instruction control circuit 13 makes the first multiplexer 18 select the output of the program counter register 17, so that the program counter register 17 holds its own value. It also makes the second multiplexer 21 select the output of the first pipeline register 15, so that the first pipeline register 15 holds its own value. As a result, the vector pipeline instruction is executed repeatedly over multiple consecutive cycles. As an alternative means of making the program counter register 17 and the first pipeline register 15 hold their values, the write clocks of these registers may be stopped directly. When the vector instruction control circuit 13 receives the end signal from the execution unit 5, it waits a fixed number of cycles that depends on the content of the vector pipeline instruction and then releases its control of the first multiplexer 18 and the second multiplexer 21. This ends the self-holding of the program counter register 17 and the first pipeline register 15, and the instructions following the vector pipeline instruction are executed sequentially.
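The hold mechanism can be illustrated with a minimal behavioral model. This is a sketch under simplifying assumptions: the class and attribute names are hypothetical, the fetch, decode, and execute stages are collapsed into a single step, and one `hold` flag stands in for both multiplexer control signals.

```python
from dataclasses import dataclass


@dataclass
class FetchModel:
    program: list          # stands in for program memory 11
    pc: int = 0            # stands in for program counter register 17
    ir: object = None      # stands in for first pipeline register 15

    def clock(self, hold=False, branch_target=None):
        """Advance one cycle and return the instruction presented to the decoder."""
        if hold:
            # Both multiplexers feed each register its own output: nothing changes.
            return self.ir
        # Multiplexer 21 selects the program memory output.
        self.ir = self.program[self.pc]
        # Multiplexer 18 selects the branch address or the incrementer output.
        self.pc = branch_target if branch_target is not None else self.pc + 1
        return self.ir


model = FetchModel(program=["N-1", "VECTOR", "N+1", "N+2"])
trace = [model.clock(), model.clock()]
trace += [model.clock(hold=True) for _ in range(3)]   # vector instruction repeats
trace += [model.clock(), model.clock()]
print(trace)
```

While `hold` is asserted the same instruction stays in front of the decoder, so its control signals are reissued every cycle; releasing `hold` resumes ordinary sequential fetch from the next address.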

[0023] FIG. 4 is a block diagram showing the execution unit of FIG. 1. In FIG. 4, the execution unit 5 has the address generator 2, which consists of first, second, and third address generators 30, 31, and 32; the data memory 3, which consists of first, second, and third data memories 33, 34, and 35; and the data processing circuit 4. The first and second address generators 30 and 31 generate the addresses of the first and second data memories 33 and 34, respectively, and the data read from these memories is processed by the data processing circuit 4. The third address generator 32 generates the addresses of the third data memory 35, into which the data processed by the data processing circuit 4 is written. The data processing circuit 4 consists of an arithmetic unit block 40, which contains an ALU 36, a multiplier 37, and first and second pipeline arithmetic units 38 and 39; a register 41; an accumulator 42; and a data path selection circuit 43, which switches the flow of data among the arithmetic unit block 40, the register 41, and the accumulator 42.

[0024] FIG. 5 is a block diagram showing the vector instruction control circuit in the processor of the present invention. In FIG. 5, the vector instruction control circuit 13 consists of a start signal generator 50 for the source memory address generators, a start signal generator 51 for the destination memory address generator, a first multiplexer control signal generator 52, a second multiplexer control signal generator 53, and a vector instruction delay analyzer 54.

[0025] The start signal generator 50 for the source memory address generators asserts the start signal of the source memory address generator 2 in response to the vector instruction signal, which the decoder 12 outputs to the vector instruction control circuit 13 when it decodes a vector pipeline instruction, and negates the start signal in response to the end signal output by the address generator 2. In FIG. 5, an RS flip-flop 55 is used: the start signal is set by the vector instruction signal and remains asserted until it is reset by the end signal.
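The set and reset behavior of the start signal can be sketched as follows. The class name is hypothetical, and the model assumes a synchronous flip-flop in which reset takes priority over set; the text does not state either detail.

```python
class RSFlipFlop:
    """Behavioral stand-in for RS flip-flop 55 (reset-dominant, an assumption)."""

    def __init__(self):
        self.q = 0

    def clock(self, set_, reset):
        if reset:
            self.q = 0      # end signal negates the start signal
        elif set_:
            self.q = 1      # vector instruction signal asserts the start signal
        return self.q


ff = RSFlipFlop()
vector_signal = [1, 0, 0, 0, 0, 0]   # pulse when the vector instruction is decoded
end_signal = [0, 0, 0, 1, 0, 0]      # pulse from the address generator
start = [ff.clock(s, r) for s, r in zip(vector_signal, end_signal)]
print(start)
```

A single decode pulse is thus stretched into a level that stays high for as long as the source address generators should keep running.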

[0026] Next, the start signal generator 51 for the destination memory address generator consists of a first shift register 56, to which the vector instruction signal is input, and a third multiplexer 57, which selects and outputs one of the delayed outputs of the first shift register 56. When the decoder 12 outputs the vector instruction signal, it simultaneously decodes the type of the vector pipeline instruction and outputs a vector instruction type signal. From this signal the vector instruction delay analyzer 54 determines the required delay and controls the third multiplexer 57 accordingly. The output of the third multiplexer 57 is supplied as the start signal for the destination memory address generator and the accumulator.

[0027] Next, the second multiplexer control signal generator 53 consists of a second shift register 58, to which the end signal is input; a fourth multiplexer 59, which selects and outputs one of the delayed outputs of the second shift register 58; and an AND gate 60. The fourth multiplexer 59 is controlled by the vector instruction delay analyzer 54, which determines the required delay from the vector instruction type signal. The second multiplexer control signal is asserted by the vector instruction signal and negated by the output of the fourth multiplexer 59. In FIG. 5, the AND gate 60 is used so that the second multiplexer control signal is asserted by the vector instruction signal and holds its state until it is negated by the output of the fourth multiplexer 59. As a result, while a vector pipeline instruction is being executed, the multiplexer 21 of FIG. 3 is controlled so that the first pipeline register 15 holds its own value; while any other instruction is being executed, it is controlled to select the program memory 11.

[0028] Finally, the first multiplexer control signal generator 52 consists of the second multiplexer control signal generator 53 and a first multiplexer control circuit 61. The first multiplexer control circuit 61 outputs the first multiplexer control signal under the control of the second multiplexer control signal and the address branch control signal output from the decoder 12. As a result, while a vector pipeline instruction is being executed, the multiplexer 18 of FIG. 3 is controlled so that the program counter register 17 holds its own value; while any other instruction is being executed, it is controlled to select the incrementer 19 or the branch address control circuit 20.

[0029] FIG. 6 is a block diagram showing the address generator in the processor of the present invention. In FIG. 6, the address generator 2 consists of an address calculator 67, a cycle counter 68, and an end determination circuit 69. While a vector pipeline instruction is being executed, the address calculator 67 generates addresses of the data memory 3 one after another for as long as the start signal from the vector instruction control circuit 13 of the program control circuit 1 is asserted. At the same time, the cycle counter 68 counts the number of addresses that the address calculator 67 has generated in response to the start signal. When the value of the cycle counter 68 reaches a fixed value, the end determination circuit 69 outputs the end signal. The address calculator 67 can be built from a conventional pointer, a two-dimensional address calculator, or the like. When an instruction other than a vector pipeline instruction is executed, the address calculator 67 is controlled by the control signals from the decoder 12 of the program control circuit 1 and generates addresses one at a time.
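The three parts of the address generator map naturally onto a small model. The sketch below assumes a simple base-plus-stride address sequence, which is only one possible "preset sequence"; the class name and parameters are hypothetical.

```python
class AddressGenerator:
    """Behavioral sketch of address calculator 67, cycle counter 68, and end circuit 69."""

    def __init__(self, base, stride, count):
        self.base = base        # first address (assumed linear sequence)
        self.stride = stride    # address step per cycle (assumption)
        self.count = count      # preset number of addresses, m
        self.cycle = 0          # cycle counter value

    def clock(self, start):
        """Return (address, end) for this cycle; address is None when idle."""
        if not start or self.cycle >= self.count:
            return None, False
        address = self.base + self.stride * self.cycle   # address calculator
        self.cycle += 1                                   # count generated addresses
        end = self.cycle == self.count                    # end determination
        return address, end


gen = AddressGenerator(base=0x100, stride=2, count=4)
for _ in range(5):
    print(gen.clock(start=True))
```

The end flag is raised together with the last address, which is the assumption used when relating the end signal to the two-cycle pipeline drain.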

[0030] With the above configuration, when a conventional instruction is executed, a single-operation function and data path are selected in the execution unit 5 so that execution completes in one cycle.

[0031] When a vector pipeline instruction is executed, the function and data path of the execution unit 5 are selected so that the outputs of the first and second source data memories 33 and 34 are input to the arithmetic unit block 40 and the output of the arithmetic unit block 40 is input to the third (destination) data memory 35 or to the accumulator 42.

[0032] The program control circuit 1 then gives first and second start signals to the first and second address generators 30 and 31, which generate the addresses of the first and second source data memories 33 and 34. In response, the first and second address generators 30 and 31 begin generating a series of addresses in a preset sequence, independently of the program control circuit 1. A series of vector data is read from the source data memories 33 and 34 according to the generated addresses, and the arithmetic unit block 40 processes it continuously.

[0033] The output vector data of the arithmetic unit block 40 is handled in one of two ways. In the first, the program control circuit 1 gives a third start signal to the third address generator 32, which generates the addresses of the destination data memory 35, delayed by the number of pipeline delay stages of the execution unit 5, that is, by a number of cycles equal to the number of pipeline stages of the execution unit 5 minus one; the third address generator 32 then begins generating a series of addresses, and the output data is written continuously to the destination data memory 35. In the second, the program control circuit 1 gives a fourth start signal to the accumulator 42, delayed by the same number of pipeline delay stages, and the accumulator 42 begins accumulating.

【００３４】In this way the vector pipeline operation is executed over multiple cycles, and when the first address generator 30 has finished generating the preset number of addresses, it returns an end signal to the program control circuit 1.

【００３５】The embodiment described here uses memories 33 and 34 as the source data memories and memory 35 as the destination data memory, but any other combination may of course be used. Likewise, although the first address generator 30 is described as outputting the end signal, any of the other address generators may generate it instead.

【００３６】FIGS. 7 and 8 are timing diagrams illustrating the operation of the processor of the present invention. The processor implements vector pipeline instructions in addition to a conventional instruction set, and comprises the program control circuit of FIG. 3 and the execution unit of FIG. 4. There are two types of vector pipeline instruction. The first type reads the vector data stored in the source data memories, operates on it in an arithmetic unit, and writes the output vector data to the destination data memory. The second type accumulates the output vector data of the arithmetic unit in the accumulator 42.

【００３７】The first type of vector pipeline instruction executes the pipeline parallel operation shown in (Equation 1), and the second type executes the pipeline parallel operation shown in (Equation 2). FIG. 7 is a timing diagram illustrating the operation of the first type of vector pipeline instruction, and FIG. 8 is a timing diagram illustrating the operation of the second type.

【００３８】FIG. 7 shows how the program control circuit 1 of FIG. 3 sequentially reads the instructions stored in the program memory 11 and controls execution by the execution unit 5 of FIG. 4. In FIG. 7, the instruction at address N is a vector pipeline instruction that performs the pipeline processing of the present invention. All other instructions in FIG. 7 are assumed to be conventional single-operation instructions and not branch instructions.

【００３９】Here the vector pipeline instruction at address N is assumed to be a first-type vector pipeline instruction: it reads the vector data stored in the first data memory 33 and the second data memory 34, operates on it in the first pipeline arithmetic unit 38, and writes the output vector data to the third data memory 35. Each vector is assumed to have m elements.

【００４０】FIG. 7 shows which address's instruction is being processed in each cycle of the instruction read, decode, and instruction execution stages. It also shows the operation timing of each part while the vector pipeline instruction is executed. When the program control circuit 1 of FIG. 3 controls sequential program execution, conventional single-operation instructions such as an ALU operation, a multiplication, or a load or store of one data item are executed in the same way as in a conventional processor. That is, the instruction read cycle from the program memory 11, the decode cycle, and the instruction execution cycle proceed one after another in a pipeline. In the execution unit 5 of FIG. 4, the data path selection circuit 43 selects a data path according to the content of the single-operation instruction decoded by the decoder 12, and execution completes in one cycle. The instruction execution cycles for addresses N-2 and N-1 in FIG. 7 show the execution of such conventional single-operation instructions.

【００４１】Next, the operation when a vector pipeline instruction is read and executed is described. First, in the execution unit 5, the data path selection circuit 43 selects a data path according to the content of the vector pipeline instruction decoded by the decoder 12, so that the outputs of the source data memories 33 and 34 are input to the first pipeline arithmetic unit 38 and the output of the arithmetic unit 38 is input to the destination data memory 35. The first pipeline arithmetic unit 38 is assumed here to be a two-stage pipeline, but this is not a restriction. For image processing, integrating dedicated pipeline arithmetic units such as a filter arithmetic unit or a discrete cosine transform (DCT) unit improves performance dramatically, for example by a factor of 10 to 100, depending on the degree of pipeline parallelism.

【００４２】When the vector pipeline instruction at address N is decoded by the decoder 12, the vector instruction control circuit 13 outputs the multiplexer control signal shown in FIG. 7. This causes the first multiplexer 18 to select the output of the program counter register 17, so that the program counter register 17 holds its own data. It also causes the second multiplexer 21 to select the output of the first pipeline register 15, so that the pipeline register 15 holds its own data. As a result, as shown in FIG. 7, the vector pipeline instruction at address N is executed continuously over multiple cycles.
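The self-holding behaviour can be pictured with a small software model. The sketch below is only an illustration: the class and function names are hypothetical, and it assumes the multiplexer has just two relevant inputs (the incrementer output and the register's own output), ignoring the branch address input.

```python
class HoldableRegister:
    """Register fed by a multiplexer: either a new value or its own output."""

    def __init__(self, value=0):
        self.value = value

    def clock(self, next_value, hold):
        # hold=True models the multiplexer selecting the register's own
        # output, so the stored value is written back unchanged.
        self.value = self.value if hold else next_value
        return self.value


def run_program_counter(start, hold_pattern):
    """Return the program counter value after each clock for a list of hold flags."""
    pc = HoldableRegister(start)
    trace = []
    for hold in hold_pattern:
        # The incrementer output is the current value plus one.
        trace.append(pc.clock(pc.value + 1, hold))
    return trace
```

For example, `run_program_counter(10, [False, True, True, False])` keeps the counter fixed while the hold flag is set and resumes counting once it is cleared.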

【００４３】In addition, when the vector pipeline instruction at address N is decoded by the decoder 12, the vector instruction control circuit 13 of the program control circuit 1 supplies first and second activation signals to the first and second address generators 30 and 31, respectively. The address generators 30 and 31 then each generate m addresses in consecutive cycles in a preset sequence, independently of the program control circuit 1, and m consecutive memory read cycles are started from each of the first and second data memories 33 and 34, as shown in FIG. 7. Here the two-stage pipeline arithmetic unit 38 processes the data continuously in pipeline fashion, as shown by operation 1 and operation 2 in FIG. 7, and the vector data output from the pipeline arithmetic unit 38 is input to the third data memory 35.

【００４４】The program control circuit 1 supplies a third activation signal to the third address generator 32, which generates the addresses of the destination data memory 35. This signal is delayed by the number of pipeline delay stages of the execution unit, that is, by the three cycles of operation 1, operation 2, and write. The third address generator 32 then begins generating a series of addresses, and the data is written continuously to the destination data memory 35. The program control circuit 1 has decoded that the current vector pipeline instruction at address N is a four-stage pipeline process consisting of memory read, operation 1, operation 2, and memory write. Using this information, it can supply the third activation signal to the address generator 32 three cycles later than the first and second activation signals to the first and second address generators 30 and 31.

【００４５】With the control described above, the four-stage pipeline process of memory read, operation 1, operation 2, and memory write can be executed continuously on each of the m vector data elements, as shown in FIG. 7.
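The four-stage flow can be sketched as a cycle-by-cycle software model. This is an illustration under assumptions, not the circuit itself: the function name is hypothetical, `op1` and `op2` stand in for the two stages of the pipeline arithmetic unit, and each stage is assumed to take exactly one cycle.

```python
def simulate_vector_pipeline(a, b, op1, op2):
    """Model read -> operation 1 -> operation 2 -> write for equal-length vectors."""
    m = len(a)
    stages = 4
    read_reg = op1_reg = op2_reg = None  # pipeline registers between stages
    dest = []          # contents written to the destination data memory
    write_cycles = []  # cycle index at which each element is written
    for cycle in range(m + stages - 1):
        # Stage 4: write the value that finished operation 2 in the previous cycle.
        if op2_reg is not None:
            dest.append(op2_reg)
            write_cycles.append(cycle)
        # Stages 3 and 2: advance each value by one pipeline register.
        op2_reg = op2(op1_reg) if op1_reg is not None else None
        op1_reg = op1(*read_reg) if read_reg is not None else None
        # Stage 1: read one element pair per cycle while addresses remain.
        read_reg = (a[cycle], b[cycle]) if cycle < m else None
    return dest, write_cycles
```

With the first read in cycle 0, the first write lands in cycle 3, which matches the three-cycle delay of the third activation signal, and m elements finish in m + 3 cycles.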

【００４６】When the first address generator 30 has finished generating m addresses, it gives a first end signal to the program control circuit 1. On receiving the first end signal, the vector instruction control circuit 13 waits a fixed number of cycles, determined by the processing content of the vector pipeline instruction at address N, as shown in FIG. 7. It then releases the control signals of the multiplexer 18 and the multiplexer 21, which releases the self-holding of the program counter register 17 and of the pipeline register 15 and restarts the program counter 10 and the pipeline register 15.

【００４７】For the vector pipeline instruction at address N, this fixed number of cycles is two. As described above, the program control circuit 1 has decoded that the current vector pipeline instruction at address N is a four-stage pipeline process consisting of memory read, operation 1, operation 2, and memory write. Using this information, it can release the control signal two cycles after receiving the first end signal. The delay is two cycles rather than three because, even after the multiplexer control signal is released, the pipeline structure of the program control circuit causes the instruction at address N to be executed for one more cycle. The control signal is therefore released one cycle early.
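The two delays can be written as simple functions of the pipeline depth. This is a sketch that generalizes from the cases given in the text: the function names are hypothetical, and the rule is inferred from the four-stage case here and the three-stage case described for FIG. 8, not stated as a general formula in the source.

```python
def start_delay(pipeline_stages):
    """Cycles by which the write-side or accumulator activation signal
    trails the read-side activation signals (pipeline stages minus one)."""
    return pipeline_stages - 1


def release_delay(pipeline_stages):
    """Cycles between receiving the end signal and releasing the
    multiplexer control signal."""
    # One cycle fewer than start_delay, because the instruction at address N
    # still executes for one more cycle after the hold is released.
    return start_delay(pipeline_stages) - 1
```

For the four-stage case of FIG. 7 this gives a start delay of 3 and a release delay of 2; for the three-stage case of FIG. 8 it gives 2 and 1.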

【００４８】After the program counter 10 and the pipeline register 15 have restarted, all pipeline cycles belonging to the vector pipeline instruction at address N complete, and the instructions following it, that is, the instructions at addresses N+1, N+2, and N+3, are executed sequentially as in a conventional processor.

【００４９】As described above, according to this embodiment of the present invention, the pipeline parallel operation shown in (Equation 1) can be realized with a single vector pipeline instruction, and performance improves dramatically, by a factor of 10 to 100, depending on the degree of pipeline parallelism. The number of program memory steps is also reduced to one.

【００５０】FIG. 8 is a timing diagram illustrating another operation of the processor of the present invention, which is described below with reference to FIG. 8. FIG. 8 differs from FIG. 7 in that the vector pipeline instruction at address N is a second-type vector pipeline instruction: it reads the vector data stored in the second data memory 34 and the third data memory 35, operates on it in the ALU 36, and accumulates the output vector data in the accumulator 42. Like FIG. 7, FIG. 8 shows which address's instruction is being processed in each cycle of the instruction read, decode, and instruction execution stages, together with the operation timing of each part while the vector pipeline instruction is executed.

【００５１】When the program control circuit 1 of FIG. 3 controls sequential program execution, conventional single-operation instructions such as an ALU operation, a multiplication, or a load or store of one data item are executed in the same way as in a conventional processor. The instruction execution cycles for addresses N-2 and N-1 in FIG. 8 show the execution of such conventional single-operation instructions.

【００５２】Next, the operation when the vector pipeline instruction is read and executed is described. First, in the execution unit 5, the data path selection circuit 43 selects a data path according to the content of the vector pipeline instruction decoded by the decoder 12, so that the outputs of the source data memories 34 and 35 are input to the ALU 36 and the output of the ALU 36 is input to the accumulator 42. The ALU 36 is selected as the arithmetic unit here, but this is not a restriction.

【００５３】When the vector pipeline instruction at address N is decoded by the decoder 12, the vector instruction control circuit 13 outputs the multiplexer control signal shown in FIG. 8. This causes the first multiplexer 18 to select the output of the program counter register 17, so that the program counter register 17 holds its own data. It also causes the second multiplexer 21 to select the output of the first pipeline register 15, so that the pipeline register 15 holds its own data. As a result, as shown in FIG. 8, the vector pipeline instruction at address N is executed continuously over multiple cycles.

【００５４】In addition, when the vector pipeline instruction at address N is decoded by the decoder 12, the vector instruction control circuit 13 of the program control circuit 1 supplies second and third activation signals to the second and third address generators 31 and 32, respectively. The address generators 31 and 32 then each generate m addresses in consecutive cycles in a preset sequence, independently of the program control circuit 1, and m consecutive memory read cycles are started from each of the second and third data memories 34 and 35, as shown in FIG. 8. Here the ALU 36 performs the operation, and the vector data output from the ALU 36 is input to the accumulator 42. The program control circuit 1 supplies a fourth activation signal to the accumulator 42, delayed by the number of pipeline delay stages of the execution unit, that is, by the two cycles of ALU operation and accumulation, and accumulation begins.

【００５５】The program control circuit 1 has decoded that the current vector pipeline instruction at address N is a three-stage pipeline process consisting of memory read, ALU operation, and accumulation. Using this information, it can supply the fourth activation signal to the accumulator 42 two cycles later than the second and third activation signals to the second and third address generators 31 and 32.

【００５６】With the control described above, the three-stage pipeline process of memory read, ALU operation, and accumulation can be executed continuously on each of the m vector data elements, as shown in FIG. 8. When the second address generator 31 has finished generating m addresses, it gives a second end signal to the program control circuit 1.
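The three-stage accumulate flow can be modelled in the same cycle-by-cycle way. This is a sketch under assumptions: the function name is hypothetical, `alu` is an arbitrary two-input operation standing in for the ALU 36, the accumulator is assumed to start at zero and to sum the ALU outputs, and (Equation 2) itself is not reproduced in this section.

```python
def simulate_accumulate_pipeline(a, b, alu):
    """Model read -> ALU operation -> accumulate for equal-length vectors."""
    m = len(a)
    stages = 3
    read_reg = alu_reg = None      # pipeline registers between stages
    acc = 0                        # accumulator, assumed to start at zero
    first_accumulate_cycle = None  # cycle in which accumulation begins
    for cycle in range(m + stages - 1):
        # Stage 3: accumulate the ALU result produced in the previous cycle.
        if alu_reg is not None:
            if first_accumulate_cycle is None:
                first_accumulate_cycle = cycle
            acc += alu_reg
        # Stage 2: ALU operation on the pair read in the previous cycle.
        alu_reg = alu(*read_reg) if read_reg is not None else None
        # Stage 1: read one element pair per cycle while addresses remain.
        read_reg = (a[cycle], b[cycle]) if cycle < m else None
    return acc, first_accumulate_cycle
```

With the first read in cycle 0, accumulation begins in cycle 2, which matches the two-cycle delay of the fourth activation signal.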

【００５７】On receiving the second end signal, the vector instruction control circuit 13 waits a fixed number of cycles, determined by the processing content of the vector pipeline instruction at address N, as shown in FIG. 8. It then releases the control signals of the multiplexer 18 and the multiplexer 21, which releases the self-holding of the program counter register 17 and of the pipeline register 15 and restarts the program counter 10 and the pipeline register 15. For this vector pipeline instruction at address N, the fixed number of cycles is one. As described above, the program control circuit 1 has decoded that the current vector pipeline instruction at address N is a three-stage pipeline process consisting of memory read, ALU operation, and accumulation. Using this information, it can release the control signal one cycle after receiving the second end signal. The delay is one cycle rather than two because, even after the control signal is released, the pipeline structure of the program control circuit causes the instruction at address N to be executed for one more cycle. The control signal is therefore released one cycle early.

【００５８】After the program counter 10 and the pipeline register 15 have restarted, all pipeline cycles belonging to the vector pipeline instruction at address N complete, and the instructions following it, that is, the instructions at addresses N+1, N+2, and N+3, are executed sequentially as in a conventional processor.

【００５９】As described above, according to this embodiment of the present invention, the pipeline parallel operation shown in (Equation 2) can be realized with a single vector pipeline instruction, and performance improves dramatically, by a factor of 10 to 100, depending on the degree of pipeline parallelism. The number of program memory steps is also reduced to one.

【００６０】In the embodiments of the present invention shown in FIGS. 1 to 8, each address generator may be given the function of the two-dimensional address generator described in Japanese Patent Application No. Hei 2-41424 (Two-dimensional address generator and its control method), previously proposed by the present inventors, which generates the addresses of a rectangular region of two-dimensional data. Data with a two-dimensional structure, such as image data, can then be processed very efficiently.
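Generating the addresses of a rectangular region can be sketched as follows. This is only an illustration of the general idea: the function and parameter names are hypothetical, the data is assumed to be stored row by row with a fixed row pitch, and the scheme of the referenced application is not described in this document, so the sketch should not be read as that method.

```python
def rectangle_addresses(base, row_pitch, x0, y0, width, height):
    """Yield the linear addresses of a width x height rectangle, row by row.

    base      -- address of element (0, 0) of the two-dimensional data
    row_pitch -- number of addresses between the starts of adjacent rows
    x0, y0    -- top-left corner of the rectangle
    """
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            yield base + y * row_pitch + x
```

For example, `list(rectangle_addresses(0, 8, 2, 1, 3, 2))` walks a 3 x 2 block inside data that is 8 elements wide.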

【００６１】

[Effects of the Invention] As described above, the present invention realizes a processor in which specific instructions for pipeline processing are implemented in addition to a conventional instruction set whose instructions each specify a single operation. When such a specific instruction is read, the processor reads the contents of the data memory sequentially in a preset order and performs pipeline processing in which operation cycles run in parallel with the read cycles. The output of the arithmetic unit is either written sequentially to another data memory, also in parallel, or accumulated sequentially in an accumulator. When the operations on the set number of data items are complete, sequential execution resumes from the instruction at the step following the specific instruction, as in a conventional processor.

【００６２】For the operations shown in (Equation 1) or (Equation 2), which occur frequently in signal processing, executing them with the specific instruction of the processor of the present invention is roughly 10 to 100 times faster in processing cycles than executing them with the instruction set of a conventional processor, although the figure depends on the arithmetic function. In addition, a program that would otherwise require a large number of steps and consume a large amount of program memory, depending on the amount of data handled at one time, is reduced to one instruction, that is, one step.

[Brief description of drawings]

FIG. 1 is a block diagram showing the concept of the processor of the present invention.

FIG. 2 is a timing diagram illustrating the operation of the processor of the present invention.

FIG. 3 is a block diagram showing the program control circuit of the processor of the present invention.

FIG. 4 is a block diagram showing the execution unit of the processor of the present invention.

FIG. 5 is a block diagram showing the vector instruction control circuit in the processor of the present invention.

FIG. 6 is a block diagram showing the address generator in the processor of the present invention.

FIG. 7 is a timing diagram illustrating the operation of the processor of the present invention.

FIG. 8 is a timing diagram illustrating another operation of the processor of the present invention.

[Explanation of symbols]

1 Program control circuit
2 Address generator
3 Data memory
4 Data processing circuit
5 Execution unit
10 Program counter
11 Program memory
12 Decoder
13 Vector instruction control circuit

Continuation of the front page
(72) Inventor: Akihiko Otani, c/o Matsushita Electric Industrial Co., Ltd., 1006 Kadoma, Kadoma City, Osaka Prefecture
(72) Inventor: Hisashi Kodama, c/o Matsushita Electric Industrial Co., Ltd., 1006 Kadoma, Kadoma City, Osaka Prefecture
(72) Inventor: Kiyoshi Okamoto, c/o Matsushita Electric Industrial Co., Ltd., 1006 Kadoma, Kadoma City, Osaka Prefecture

Claims

[Claims]

1. A program-controlled processor comprising: a data processing circuit that implements a plurality of instructions including a vector pipeline instruction and executes a pipeline operation based on the vector pipeline instruction; a program control circuit that includes a program counter and a decoder and that, when the vector pipeline instruction is read from a program memory and decoded by the decoder, stops the program counter, outputs an activation signal, and controls the operation of the data processing circuit according to the content of the vector pipeline instruction; an address generator that, based on the activation signal, continuously generates addresses according to a preset sequence and, on finishing generation of a preset number of addresses, outputs an end signal to the program control circuit; and a data memory that, based on the addresses generated by the address generator, outputs the data stored in advance at those addresses; wherein the data processing circuit executes the pipeline operation on the data output from the data memory under the control of the program control circuit, and the program control circuit detects the end of the pipeline operation based on the vector pipeline instruction a predetermined number of cycles after receiving the end signal, and then sequentially executes the instructions following the vector pipeline instruction.

2. The program-controlled processor according to claim 1, wherein the program control circuit further comprises: a branch address control circuit that generates a branch address based on the decoding result of the decoder; a vector instruction control circuit that controls the operation of each circuit in the program control circuit based on the decoding result of the decoder and the end signal; an instruction register circuit that temporarily stores data output from the program memory based on a second control signal output from the vector instruction control circuit; and a first pipeline register for temporarily storing the data output from the data memory and the data processing circuit; wherein the program counter comprises: a program counter register that temporarily stores the address output to the program memory; an incrementer that increments that address by one in each cycle of the operation clock and outputs the result; and a first multiplexer that, based on a first control signal output from the vector instruction control circuit, selects one of the output from the branch address control circuit, the output of the program counter register, and the output of the incrementer, and outputs the selected address to the program memory via the program counter register; wherein the instruction register circuit comprises: a second pipeline register that temporarily stores data output to the decoder; and a second multiplexer that, based on the second control signal output from the vector instruction control circuit, selects one of the output of the program memory and the output of the second pipeline register and outputs it to the decoder via the second pipeline register; and wherein the vector instruction control circuit, when the vector pipeline instruction is decoded by the decoder, outputs the activation signal to the address generator, controls the first multiplexer to select and output the output of the program counter register so that the program counter register holds its own data, and controls the second multiplexer to select and output the output of the second pipeline register so that the second pipeline register holds its own data, whereby the vector pipeline instruction is executed continuously; and thereafter, on detecting the end of the pipeline operation based on the vector pipeline instruction a predetermined number of cycles after receiving the end signal from the address generator, releases the control of the first and second multiplexers, thereby releasing the self-holding of the program counter register and of the second pipeline register, so that the instructions following the vector pipeline instruction are executed sequentially.

3. The program-controlled processor according to claim 2, wherein the address generator comprises a source memory address generator and a destination memory address generator, and the vector instruction control circuit comprises: a first start signal generator that sets the start signal of the source memory address generator based on a vector instruction signal, which is the result of the decoder decoding the vector pipeline instruction, outputs it to the source memory address generator, and resets the start signal of the source memory address generator based on the end signal output from the address generator; a second start signal generating section including a first shift register, which has a predetermined number of stages of delay circuits and delays and outputs the vector instruction signal, and a third multiplexer, which selects one of the outputs of the delay circuits of the first shift register based on a vector instruction type signal resulting from the decoder decoding the vector pipeline instruction and outputs the selected signal to the destination memory address generator; a first control signal generating section including a second shift register, which has a predetermined number of stages of delay circuits and delays and outputs the end signal, a fourth multiplexer, which selects one of the outputs of the delay circuits of the second shift register based on the vector instruction type signal and outputs the selected signal as a delay end signal, and a signal generator, which sets the second control signal based on the vector instruction signal, outputs it to the second multiplexer, and resets the second control signal based on the delay end signal output from the fourth multiplexer; and a second control signal generating section that generates the first control signal based on the second control signal and an address branch control signal, which is the result of the decoder decoding an address branch instruction, and outputs the first control signal to the first multiplexer.

4. The program-controlled processor according to claim 1, 2 or 3, wherein the address generator comprises: an address calculator that continuously generates addresses based on the start signal and outputs the addresses to the data memory; a cycle counter that counts the number of addresses generated by the address calculator; and an end determination circuit that outputs the end signal to the program control circuit when the number of addresses counted by the cycle counter reaches a predetermined value.