JPH0142019B2

Info

Publication number: JPH0142019B2
Application number: JP5548183A
Authority: JP
Inventors: Koichiro Hotsuta; Yukio Kamya; Masaaki Takiuchi; Toshihiro Hirabayashi; Masaki Aoki
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-03-31
Filing date: 1983-03-31
Publication date: 1989-09-08
Also published as: JPS59180668A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a vector processor that has a plurality of parallel operation units and processes vector instructions. More specifically, it relates to an execution-time instruction selection method for conditional instructions. When a conditional instruction is executed, the method uses the number of ON bits, counted per vector length, in the mask information that indicates for each element whether the instruction is to be executed, and selects whether to execute the instruction as a masked instruction, to execute it after shortening the vector length of the data, or to skip it.

[Conventional background and problems]

FIG. 1 is a diagram showing an example of a processing system having a vector processor. FIG. 2 is a diagram conceptually explaining the processing that corresponds to a vector instruction. FIG. 3 is a diagram showing an example configuration of a compiler that generates an object program from a given source program and supplies it to the vector processor. FIG. 4 is a diagram explaining how a source program is converted into intermediate text. FIG. 5 is a diagram explaining how a source program is vectorized. FIGS. 6 to 8 are diagrams explaining how statement masks and path masks are prepared for a source program containing IF statements so that it takes a form that can be executed in parallel.

For example, as shown in FIG. 1A, there are vector processors that execute vector instructions such as one that adds the elements a₁, a₂, ... of vector A to the corresponding elements b₁, b₂, ... of vector B to generate a vector C with elements c₁, c₂, .... In the case shown in FIG. 1A, mask elements m₁, m₂, ... indicate whether the addition of the i-th elements is to be performed, and processing as generalized in FIG. 1B is carried out.
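
The masked addition described for FIG. 1A can be sketched in a few lines of Python. This is only an illustration: the function name `masked_add` is hypothetical, and the choice that elements whose mask bit is OFF keep their previous value in C is an assumption, since the text does not say what happens to them.

```python
def masked_add(a, b, mask, c):
    """Return c with c[i] = a[i] + b[i] wherever mask[i] is ON.

    Assumption: elements whose mask bit is OFF keep their old value in c.
    """
    if not (len(a) == len(b) == len(mask) == len(c)):
        raise ValueError("all operands must have the same vector length")
    return [ai + bi if mi else ci for ai, bi, mi, ci in zip(a, b, mask, c)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
m = [True, False, True, False]   # mask elements m1, m2, ...
c = masked_add(a, b, m, [0, 0, 0, 0])
print(c)  # [11, 0, 33, 0]
```

Note that the loop still visits every element, including those whose mask bit is OFF.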

A data processing system having a vector processor that performs the above processing has, in one embodiment, the system configuration shown in FIG. 2. In the figure, reference numeral 1 denotes a main storage device, 2 a memory control device, 3 a vector processor, 4 a channel processor, 5 a mass storage device, 6 a scalar processing circuit section, and 7 a vector processing circuit section. 8-0, 8-1, ... are floating-point data registers; 9-0, 9-1, ... are vector registers, each capable of storing a plurality of data items (element data); 10-0, 10-1, ... are mask registers, each capable of storing a plurality of mask data items (mask element data); and 11 is a vector length register in which the number of elements stored in each vector register is set. 12-0 and 12-1 are memory access pipelines, 13 is an add/subtract pipeline, 14 is a multiply pipeline, 15 is a divide pipeline, and 16 is a mask processing pipeline.

Before a vector processor of this kind performs its processing, the given source program is compiled into a form suited to execution by that processor, and an object program is generated. FIG. 3 shows the configuration of the compiler that performs this compilation.

In FIG. 3, 17 is a source program stored in the mass storage device, 18 is the compiler, 19 is the object program that is compiled and stored in the mass storage device, 20 is a source interpretation section, 21 is a storage allocation section, 22 is a vectorization section, 23 is an intermediate text optimization section, 24 is a register usage determination section, and 25 is an object program output section.

The compiler 18 reads the source program 17 from the mass storage device and generates the desired object program 19. In doing so, the sections shown in the figure perform the following processing.

The source interpretation section 20 reads the source program 17 from the mass storage device, interprets its statements, and expands them into intermediate text. For example, a source program such as the one on the left of FIG. 4 is expanded into intermediate text such as that on the right. The storage allocation section 21 assigns storage addresses to the various data items that appear in the program. The vectorization section 22 detects loop structures in the program, recognizes the parts that can be executed in parallel, and modifies the intermediate text as shown in FIG. 5. The intermediate text optimization section 23 applies optimizations at the intermediate text level so that a vector processor such as the one shown in FIG. 2 is used effectively. The register usage determination section 24 assigns resources (registers) of the vector processor to the data that appears in the intermediate text. Finally, the object program output section 25 writes the machine instructions to the mass storage device and performs optimization at the instruction level.

A compiler for running a vector processor has the configuration shown in FIG. 3, and a program with no IF statement inside a loop can be put into a form that is executable in parallel and processed as shown conceptually in FIG. 5. When a source program such as that in FIG. 6 is given, however, the loop contains IF statements such as "IF (A(I).GT.B(I)) GO TO 50", and such loops have conventionally been treated as not executable in parallel. In this program, though, the jump targets of the IF statements stay inside the loop. It was found that parallel execution becomes possible if a statement mask mi is used to indicate, for each statement in the program and for each individual iteration, whether that statement is to be executed. A compiler processing method that supplies the statement mask mi has already been proposed by the same applicant as a prior invention (Japanese Patent Application No. 57-31198). It is outlined below with reference to FIGS. 6 to 8.

The program shown in FIG. 6 specifies roughly the following processing. Statements 10 through 70 are executed repeatedly as the value of I runs from 1 to N. During this, statement 20 specifies a jump to statement 50 if, for a given value of I, A(I) is greater than B(I), and statement 40 specifies a jump to statement 60 if, for a given value of I, B(I) is greater than Y. If the statement mask for statement 30, for example, specifies that statement 30 is to be executed whenever the condition "A(I).GT.B(I)" does not hold, the IF statement is eliminated as shown in FIG. 8.

FIG. 7 is an explanatory diagram showing which statement masks m₁₀ through m₇₀ are given to statements 10 through 70 of the program shown in FIG. 6.

Statement 10 must be executed for every I, regardless of the value of I. The statement mask m₁₀ is therefore φ (empty). Likewise, m₂₀ for statement 20 is φ. The path from statement 20 to statement 50 is taken when the condition "A(I).GT.B(I)" of statement 20 holds, so the path mask P₂₀,₅₀ for this path is

    P₂₀,₅₀ = A(I).GT.B(I)

Similarly, the path mask P₂₀,₃₀ is

    P₂₀,₃₀ = .NOT. P₂₀,₅₀

As a result, the statement mask m₃₀ for statement 30 is

    m₃₀ = .NOT. P₂₀,₅₀

The statement mask m₄₀ for statement 40 is the same as m₃₀. In the same way, the path masks P₄₀,₆₀ and P₄₀,₅₀ are given as shown in the figure, and the statement mask m₅₀ is the logical OR of the path masks P₂₀,₅₀ and P₄₀,₅₀. The statement masks m₆₀ and m₇₀ are φ.

When statement masks mi are given in this way, the program of FIG. 6 takes a form with no IF statements, like the program of FIG. 8, and can be executed in parallel. In FIG. 8, the items ":M₂" and ":M₅" marked with ":" can be regarded as the statement masks for the corresponding statements.

In general, then, the program is compiled as follows, so that the range that can be executed in parallel is enlarged even when IF statements are present:
(i) a statement mask is given to each statement in a loop containing IF statements, the i-th statement receiving the statement mask mi;
(ii) if the i-th statement is not an IF statement, the path to the (i+1)-th statement is given the path mask Pi,i+1 with the value mi;
(iii) if the i-th statement is an IF statement with condition Ci, the path to the k-th statement, to which control jumps when the condition holds, is given the path mask Pik = mi .AND. Ci;
(iv) if the i-th statement is an IF statement, the path to the k-th statement, to which control passes when the condition Ci does not hold, is given the path mask Pik = mi .AND. .NOT. Ci;
(v) the statement mask mi is obtained by examining the path masks and taking the logical OR of the path masks Pli for all paths leading to the i-th statement, that is, mi = ∪ Pli over all l ≠ i.
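
The rules for path masks and statement masks can be tried out with a small Python sketch. It evaluates the masks for one loop index at a time. The data layout (tuples of label, condition, jump target), the function name `statement_masks`, and the sample values of A, B, and Y are all assumptions made for illustration; only forward jumps inside the loop body are handled, the fall-through of an IF is assumed to be the next statement in program order, and the "empty" mask φ is represented as `True` (always executed).

```python
def statement_masks(statements, env):
    """Compute the statement mask m_i of each statement for one loop index.

    statements: list of (label, cond, target) in program order. cond is None
    for an ordinary statement; for an IF statement it is a function of env
    returning the condition C_i, and target is the label jumped to when
    C_i holds.
    """
    labels = [label for label, _, _ in statements]
    incoming = {label: [] for label in labels}   # path masks P_li arriving at i
    masks = {}
    for pos, (label, cond, target) in enumerate(statements):
        # Rule (v): m_i is the OR of all incoming path masks.
        # The first statement has no predecessor and is always executed.
        m = True if pos == 0 else any(incoming[label])
        masks[label] = m
        nxt = labels[pos + 1] if pos + 1 < len(labels) else None
        if cond is None:
            if nxt is not None:
                incoming[nxt].append(m)               # rule (ii)
        else:
            if labels.index(target) <= pos:
                raise ValueError("only forward jumps are supported")
            c = bool(cond(env))
            incoming[target].append(m and c)          # rule (iii)
            if nxt is not None:
                incoming[nxt].append(m and not c)     # rule (iv)
    return masks


# Control structure of the FIG. 6 loop as described in the text:
# statement 20 jumps to 50 if A(I) > B(I); statement 40 jumps to 60 if B(I) > Y.
program = [
    (10, None, None),
    (20, lambda e: e["A"] > e["B"], 50),
    (30, None, None),
    (40, lambda e: e["B"] > e["Y"], 60),
    (50, None, None),
    (60, None, None),
    (70, None, None),
]

A, B, Y = [5, 1, 1], [2, 9, 3], 4          # hypothetical sample data
rows = [statement_masks(program, {"A": a, "B": b, "Y": Y}) for a, b in zip(A, B)]
print([r[30] for r in rows])  # [False, True, True]
print([r[50] for r in rows])  # [True, False, True]
```

For these sample values, the mask of statement 30 is the complement of A(I) > B(I), and the masks of statements 60 and 70 are ON for every index, in line with the description of FIG. 7.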

Executing a conditional instruction with such mask information takes time corresponding to the full vector length, whatever the state of the mask. Yet instructions whose masks have very few ON bits, or whose mask bits are all OFF, appear with some frequency, so execution time is wasted on them.

[Purpose of the invention]

The present invention is based on the above considerations. Its object is to provide an execution-time selection method for conditional instructions that shortens the execution time of instructions with few ON bits by recognizing the number of ON bits in the mask.

[Structure of the invention]

The execution-time selection method for conditional instructions is used in a data processing system having a vector processor that has a plurality of parallel operation units and processes vector instructions. It is characterized in that the vector processor is configured as follows. When a conditional instruction is executed and the mask information indicating for each element whether the instruction is to be executed is set together with the number of ON bits of the mask per vector length, the vector processor checks the number of ON bits. If the number of ON bits is 0, execution of the instruction is skipped. If the number of ON bits is smaller than a predetermined threshold, a vector length shortening instruction is used to extract only the portions for which the mask is ON and form new vector data, the vector operation is performed on that data, and the result of the vector operation is then stored into the portions for which the mask is ON. If the number of ON bits is larger than the predetermined threshold, a masked instruction is used to perform the operation on the elements for which the mask is ON.

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 9 is a diagram showing the configuration of one embodiment of the present invention. FIG. 10 is a diagram explaining the execution methods for conditional instructions used in the present invention. FIG. 11 is a diagram showing the vector text corresponding to the execution methods of FIG. 10. FIG. 12 is a diagram showing an example of mask register settings to which the present invention is applied. FIG. 13 is a diagram explaining the flow of processing when an instruction is executed according to the present invention. In the figures, 26 is an instruction buffer register, 27 is a decoder, 28 is a mask register (MR; Mask Register), 29 is a coincidence circuit, 30 is a comparison circuit, 31 is a threshold setting section (THRESHOLD), 32 and 33 are inverters, 34 is a branch processing section, 35 is a section that branches to mask processing, and 36 is a section that branches to compression/expansion processing.

In FIG. 9, the mask register 28 holds the mask information for each vector length together with information indicating the number of ON bits in the mask (MSR; Mask Status Register). It is set by a mask definition instruction, which is stored in the instruction buffer register 26 and decoded by the decoder 27. The coincidence circuit 29, acting on an execute instruction from the decoder 27, checks whether the content of the MSR of the mask register 28 is 0. If it is 0 (Yes), the circuit activates the branch processing section 34; if it is not 0, it activates the comparison circuit 30 through the inverter 32. The branch processing section 34 performs skip processing, in which the instruction is not executed. In other words, when a conditional instruction is to be executed and the content of the MSR is 0 (an instruction whose mask bits are all OFF), the instruction is not executed. The comparison circuit 30 checks whether the content of the MSR of the mask register 28 is greater than or equal to the threshold. If it is, the circuit activates the section 35 that branches to mask processing; otherwise it activates the section 36 that branches to compression/expansion processing. Section 35 causes processing to be carried out by the masked operation method, and section 36 causes processing to be carried out by the compression/expansion method. These methods are described next.
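
The three-way decision made by the coincidence circuit 29 and the comparison circuit 30 can be modelled in software roughly as follows. This is a sketch, not the circuit itself: the names `Mode` and `select_mode` are hypothetical, and the case where the ON count equals the threshold is sent to the masked method because the description of FIG. 9 says "greater than or equal to".

```python
from enum import Enum


class Mode(Enum):
    SKIP = "skip"                        # branch processing section 34
    MASKED = "masked"                    # branch to mask processing, 35
    COMPRESS_EXPAND = "compress_expand"  # branch to compression/expansion, 36


def select_mode(msr, threshold):
    """Choose how to execute a conditional instruction from the ON count.

    msr: number of ON bits in the mask (content of the MSR).
    threshold: value supplied by the threshold setting section.
    """
    if msr == 0:            # coincidence circuit 29: all mask bits are OFF
        return Mode.SKIP
    if msr >= threshold:    # comparison circuit 30: high true rate
        return Mode.MASKED
    return Mode.COMPRESS_EXPAND


print(select_mode(0, 3))  # Mode.SKIP
print(select_mode(5, 3))  # Mode.MASKED
print(select_mode(1, 3))  # Mode.COMPRESS_EXPAND
```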

In addition to masked vector operations, a vector processor has vector length shortening instructions, which are well suited to processing conditional statements such as IF statements. The present invention is implemented with these vector length shortening instructions; the description below uses one representative kind, the compression/expansion instructions, as the example.

With a masked instruction (conditional instruction) such as

    VT1 ← VT2 OP VT3 ; mt      (mt is the mask information expressing the condition)

the content of mt (ON or OFF) determines whether the operation is performed for each element. However, an element that is not processed takes the same time as one that is. With compression/expansion instructions, on the other hand, the sequence is, for example,

    VT2' ←(comp) VT2 : mt
    VT3' ←(comp) VT3 : mt
    VT1' ← VT2' OP VT3'
    VT1  ←(exp)  VT1' : mt

where ←(comp) denotes compression of vector data and ←(exp) denotes expansion of vector data. Compression extracts only the portions for which mt is ON and forms new vector data (VT2', VT3'); expansion stores the content of VT1' into the portions for which mt is ON. The operation itself can then be executed with the number of ON bits in mt as its vector length.

FIG. 10 shows how a conditional operation such as Ai + Bi : mi, i = 1, 2, ..., 8 is handled. In FIG. 10, A shows the masked operation method, which uses a masked instruction, and B shows the compression/expansion method, which uses compression/expansion instructions. Comparing the two, the compression/expansion method needs compression before and expansion after the operation as auxiliary steps, which take time, but the operation itself takes only the time corresponding to the number of ON bits (the true rate), so the method pays off when the true rate is low. The masked operation method takes operation time corresponding to the full vector length but has no auxiliary steps, so it is effective when the true rate is high. FIG. 11 shows examples of vector text for the two methods: A is an example for the masked operation method and B is an example for the compression/expansion method.
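
The two execution methods compared in FIG. 10 can be sketched in Python to show that they produce the same result while doing different amounts of arithmetic. The helper names (`compress`, `expand`, `masked_op`, `compressed_op`) and the sample mask pattern are illustrative assumptions and are not taken from the figure; elements whose mask bit is OFF are assumed to keep their previous value.

```python
def compress(v, mask):
    """Gather the elements of v whose mask bit is ON into a shorter vector."""
    return [x for x, m in zip(v, mask) if m]


def expand(short, mask, dest):
    """Scatter short back into the ON positions; OFF positions keep dest."""
    it = iter(short)
    return [next(it) if m else d for m, d in zip(mask, dest)]


def masked_op(op, v2, v3, mask, dest):
    """Masked method: walks the full vector length, applies op where ON."""
    return [op(x, y) if m else d for x, y, m, d in zip(v2, v3, mask, dest)]


def compressed_op(op, v2, v3, mask, dest):
    """Compression/expansion method: op runs only on the ON elements."""
    v2c = compress(v2, mask)
    v3c = compress(v3, mask)
    v1c = [op(x, y) for x, y in zip(v2c, v3c)]  # length == number of ON bits
    return expand(v1c, mask, dest)


A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [10, 20, 30, 40, 50, 60, 70, 80]
m = [1, 0, 0, 1, 0, 0, 0, 1]                    # hypothetical mask, 3 of 8 ON
add = lambda x, y: x + y
print(masked_op(add, A, B, m, [0] * 8))      # [11, 0, 0, 44, 0, 0, 0, 88]
print(compressed_op(add, A, B, m, [0] * 8))  # [11, 0, 0, 44, 0, 0, 0, 88]
```

In this sketch the compressed version performs three additions instead of eight, at the price of two compress passes and one expand pass.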

FIG. 12 shows an example of mask register settings to which the present invention is applied. In FIG. 12, mask registers MR0 through MR3 illustrate the case of vector length 6, and each has associated information, MSR0 through MSR3, indicating the number of ON bits in its mask. For example, with mask register MR1 the instruction is executed using a masked instruction; with MR2 it is skipped because the content of MSR2 is 0; and with MR3 it is executed using compression/expansion instructions because the content of MSR3 is very small compared with the vector length of 6.

Next, the flow of processing when a conditional instruction is executed is described with reference to FIG. 13.

(1) Set the information in the mask register (MR, MSR), then proceed to step (2).
(2) Check whether MSR = 0.
(3) If Yes, skip execution and end processing; if No, proceed to step (4).
(4) With the threshold set to vector length × γ (γ < 1), check whether the content of the MSR is greater than the threshold.
(5) If Yes (high true rate), proceed to step (6); if No (low true rate), proceed to step (7).
(6) Execute using a masked instruction.
(7) Compress and expand the data using compression/expansion instructions and execute.

As noted earlier, execution with a masked instruction takes time proportional to the vector length, whereas execution with compression/expansion instructions takes time proportional to the true rate plus α, the time required for the auxiliary compression and expansion operations.
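
One way to see where a threshold of vector length × γ could come from is a simple linear cost model. The sketch below is an assumption for illustration, not something the patent states: it treats the per-element time `t_elem` as the same for both methods and α as a constant, and the function names are hypothetical. Under that model the two methods cost the same when n_on × t_elem + α = vector_length × t_elem, which gives γ = 1 − α / (vector_length × t_elem).

```python
def masked_time(vector_length, t_elem):
    """Masked method: time proportional to the full vector length."""
    return vector_length * t_elem


def compress_expand_time(n_on, t_elem, alpha):
    """Compression/expansion: time proportional to the ON count, plus alpha."""
    return n_on * t_elem + alpha


def break_even_gamma(vector_length, t_elem, alpha):
    """gamma at which both methods cost the same, clamped to [0, 1]."""
    gamma = 1.0 - alpha / (vector_length * t_elem)
    return max(0.0, min(1.0, gamma))


vl, t, alpha = 64, 1.0, 16.0                 # hypothetical numbers
gamma = break_even_gamma(vl, t, alpha)
print(gamma, vl * gamma)                      # 0.75 48.0
print(compress_expand_time(10, t, alpha) < masked_time(vl, t))  # True
```

With these made-up numbers, compression/expansion is cheaper below 48 ON bits out of 64. On real hardware α and the per-element times would have to be measured.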

When the mask register is set, the number of ON bits in mask register MR is placed in the MSR. On hardware that cannot set the MSR at the same time as mask register MR, an instruction that counts the ON bits of MR may be issued afterwards, so that software performs the count instead.
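
The software fallback mentioned here amounts to a population count over the mask. A minimal sketch follows, assuming the mask is available as a sequence of 0/1 values; the names `count_on_bits` and `set_mask` are hypothetical.

```python
def count_on_bits(mask_bits):
    """Return the number of ON bits in a mask given as 0/1 values."""
    return sum(1 for bit in mask_bits if bit)


def set_mask(mask_bits):
    """Model a mask definition followed by a separate counting step.

    Returns the pair (MR, MSR).
    """
    mr = list(mask_bits)
    msr = count_on_bits(mr)   # done by software when hardware cannot do it
    return mr, msr


mr, msr = set_mask([1, 0, 1, 1, 0, 0])   # vector length 6
print(msr)  # 3
```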

[Effect of the invention]

As is clear from the above description, the present invention shortens the execution time of instructions with a low true rate and of instructions whose mask bits are all OFF, and thereby improves the processing efficiency of the data processing system.

[Brief explanation of drawings]

FIG. 1 is a diagram showing an example of a processing system having a vector processor. FIG. 2 is a diagram conceptually explaining the processing that corresponds to a vector instruction. FIG. 3 is a diagram showing an example configuration of a compiler that generates an object program from a given source program and supplies it to the vector processor. FIG. 4 is a diagram explaining how a source program is converted into intermediate text. FIG. 5 is a diagram explaining how a source program is vectorized. FIGS. 6 to 8 are diagrams explaining how statement masks and path masks are prepared for a source program containing IF statements so that it takes a form that can be executed in parallel. FIG. 9 is a diagram showing the configuration of one embodiment of the present invention. FIG. 10 is a diagram explaining the execution methods for conditional instructions used in the present invention. FIG. 11 is a diagram showing the vector text corresponding to the execution methods of FIG. 10. FIG. 12 is a diagram showing an example of mask register settings to which the present invention is applied. FIG. 13 is a diagram explaining the flow of processing when an instruction is executed according to the present invention.

1: main storage device; 2: memory control device; 3: vector processor; 4: channel processor; 5: mass storage device; 6: scalar processing circuit section; 7: vector processing circuit section; 8-0 to 8-n: floating-point data registers; 9-0 to 9-n: vector registers; 10-0 to 10-n: mask registers; 11: vector length register; 12-0 and 12-1: memory access pipelines; 13: add/subtract pipeline; 14: multiply pipeline; 15: divide pipeline; 16: mask processing pipeline; 17: source program; 18: compiler; 19: object program; 20: source interpretation section; 21: storage allocation section; 22: vectorization section; 23: intermediate text optimization section; 24: register usage determination section; 25: object program output section; 26: instruction buffer register; 27: decoder; 28: mask register (MR: Mask Register); 29: coincidence circuit; 30: comparison circuit; 31: threshold setting section (THRESHOLD); 32 and 33: inverters; 34: branch processing section; 35: section that branches to mask processing; 36: section that branches to compression/expansion processing.

Claims

[Claims]

1. An execution-time instruction selection method for conditional instructions in a data processing system having a vector processor that has a plurality of parallel operation units and processes vector instructions, characterized in that the vector processor is configured so that, when a conditional instruction is executed and mask information indicating for each element whether the instruction is to be executed is set together with the number of ON bits of the mask per vector length, the vector processor checks the number of ON bits of the mask; skips execution of the instruction on condition that the number of ON bits is 0; on condition that the number of ON bits is smaller than a predetermined threshold, uses a vector length shortening instruction to extract only the portions for which the mask is ON and form new vector data, performs the vector operation, and thereafter stores the result of the vector operation into the portions for which the mask is ON; and, on condition that the number of ON bits is larger than the predetermined threshold, uses a masked instruction to perform the operation on the elements for which the mask is ON.