JPH11250035A

JPH11250035A - Editing system and its method

Info

Publication number: JPH11250035A
Application number: JP4562098A
Authority: JP
Inventors: Katsuaki Fundou; 勝昭分銅
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-02-26
Filing date: 1998-02-26
Publication date: 1999-09-17

Abstract

PROBLEM TO BE SOLVED: To enable parallelization and vectrization even for a loop jumping out of the loop and to improve execution performance. SOLUTION: An editing device 1 is provided with a syntax analysis part 2 reading a source program, analyzing a syntax and generating an intermediate text, a parallelizing and vectorizing processing part 3 outputting the intermediate text as a parallel intermediate text for executing it in parallel, and a code generation part 4 generating and outputting a target program by using the parallelizing intermediate text. The parallelizing and vectorizing processing part 3 is provided with a structure analysis part 5 analyzing a structure based on the intermediate text, a data depending relation analysis part 6 analyzing whether a contradiction occurs in a definition reference relation on respective variables, a parallelizing and vectorizing text generation part 7 generating the paralleling intermediate text for parallelizing and vectorizing and an effect simulation part 8 predicting and comparing execution time in the case of whole loop scalar and that when parallelizing and vectorizing are executed.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【発明の属する技術分野】本発明は編集方式及び方法に
関し、特にベクトル演算装置を持ち、マルチプロセッサ
を有するシステムの上で実行される目的プログラムを作
成するための編集時に、ループ外への飛び出しのあるル
ープをベクトル化及び並列化を行うための編集方式及び
方法に関する。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to an editing method and method, and more particularly to an editing method and method for jumping out of a loop during editing for creating a target program to be executed on a system having a vector processor and having a multiprocessor. The present invention relates to an editing method and method for vectorizing and parallelizing a loop.

【０００２】[0002]

【従来の技術】従来の編集方式及び方法、例えば、特開
昭６３ー１８７３６５号公報で開示されているもので
は、ループ中のＩＦ文配下で定義される変数が、ループ
外で参照されている場合、作業用ベクトル領域及びマス
クを用いることにより、ベクトル化を可能としている。
この場合，ベクトル化阻害要因として考慮されているの
は、ループ中のＩＦ文配下で定義される変数のデータ依
存関係のみであり、終値保証のみ確保できれば、ベクト
ル化が可能となるのみを考えているものである。このた
め、変数の演算結果により、不定期にループ外への飛び
出しがあるものについては、考慮されていない。一般的
に、ループ外への飛び出しがあり、ベクトル化が行われ
るケースとして考えられるものに、任意の配列の中か
ら、特定の値を持つ要素を捜し出す検索機能があるが、
これは、通常、マクロ演算機能を使用したもので、この
方法は、準備されたケースを少しでも外れると利用する
ことができないものである。2. Description of the Related Art In a conventional editing method and method, for example, disclosed in Japanese Patent Application Laid-Open No. 63-187365, variables defined under an IF statement in a loop are referenced outside the loop. In this case, vectorization is enabled by using a work vector area and a mask.
In this case, only the data dependency of variables defined under the IF statement in the loop is considered as the vectorization hindrance factor. If only the closing price guarantee can be secured, only vectorization is possible. Is what it is. For this reason, there is no consideration for a case in which the variable randomly jumps out of the loop due to the operation result of the variable. In general, there is a search function that searches for an element having a specific value from an arbitrary array, which is considered as a case where there is a jump out of the loop and vectorization is performed.
This usually uses a macro operation function, and this method cannot be used if the prepared case is slightly deviated.

【０００３】[0003]

【発明が解決しようとする課題】上述した従来の編集方
式及び方法は、例えば、変数の演算結果により不定期に
ループ外への飛び出しがあり、且つマクロ演算機能を使
用できないループをベクトル化しようとした場合、ルー
プ長が確定できないためベクトル演算が使用できないと
いう問題点がある。又、どのタイミングでループを抜け
るかわからないためループ中で定義された変数の終値の
保証ができないという問題点もある。The above-mentioned conventional editing method and method, for example, attempt to vectorize a loop in which a macro operation function cannot be used due to irregular jumps out of the loop due to a variable operation result. In this case, there is a problem that the vector operation cannot be used because the loop length cannot be determined. Further, there is also a problem that it is not possible to guarantee the closing value of a variable defined in the loop because it is not possible to know when to exit the loop.

【０００４】本発明の目的は、ループ外への飛び出しを
持つループに対しても並列化とベクトル化とを可能と
し、実行性能を向上させることができる編集方式及び方
法を提供することにある。[0004] It is an object of the present invention to provide an editing method and method capable of improving the execution performance by enabling parallelization and vectorization even for a loop having a jump outside the loop.

【０００５】[0005]

【課題を解決するための手段】本発明の編集方式は、与
えられたループを含むソースプログラムから目的プログ
ラムを生成し出力する編集方式において、高級言語でか
かれたソースプログラムを読み込み構文解析を行って中
間テキストを生成する構文解析部と、前記中間テキスト
を並列化及びベクトル化して並列実行させるための並列
化中間テキストを作成し出力する並列化ベクトル化処理
部と、前記並列化中間テキストを用いて目的プログラム
を生成し出力を行うコード生成部とを備える構成であ
る。According to the editing method of the present invention, in an editing method for generating and outputting a target program from a source program including a given loop, a source program written in a high-level language is read and analyzed. A syntactic analysis unit that generates an intermediate text, a parallelization vectorization processing unit that creates and outputs a parallelized intermediate text for parallelizing and vectorizing the intermediate text and executing the parallel execution, and using the parallelized intermediate text. And a code generation unit that generates and outputs a target program.

【０００６】本発明の編集方式は、前記並列化ベクトル
化処理部が、中間テキストを基に構造解析を行う構造解
析部と、前記中間テキストに対し分割と並列化とベクト
ル化との少なくともいずれか１つを実施した場合に各変
数に対し定義参照関係に矛盾が生じるか否かを解析する
データ依存関係解析部と、並列化及びベクトル化用の並
列化中間テキストを生成する並列／ベクトルテキスト生
成部と、全ループスカラの場合と並列化及びベクトル化
を行った場合との実行時間の予測を行い比較する効果シ
ミュレート部とを有してもよい。[0006] In the editing method according to the present invention, the parallelization vectorization processing unit may perform a structure analysis based on the intermediate text, and at least one of division, parallelization, and vectorization of the intermediate text. A data dependency analysis unit for analyzing whether or not a definition reference relationship is inconsistent for each variable when one is implemented, and a parallel / vector text generation for generating a parallelized intermediate text for parallelization and vectorization And an effect simulating unit for estimating and comparing the execution time of the case of all loop scalars and the case of performing parallelization and vectorization.

【０００７】本発明の編集方式は、前記コード生成部が
生成する目的プログラムには、前記中間テキスト及び前
記並列化中間テキストの指定するタスクを並列に実行
し、各タスクに自タスク終了時に他タスクが終了時して
いない場合に他タスクを中断する命令を組み込んでもよ
い。According to the editing method of the present invention, a task specified by the intermediate text and the parallelized intermediate text is executed in parallel in a target program generated by the code generation unit. May be incorporated to suspend other tasks when is not completed.

【０００８】本発明の編集方法は、与えられたループを
含むソースプログラムから目的プログラムを生成し出力
する編集方法において、高級言語でかかれたソースプロ
グラムを読み込み構文解析を行って中間テキストを生成
し、この中間テキストを基に構造解析を行い、前記中間
テキストに対し分割と並列化とベクトル化との少なくと
もいずれか１つを実施した場合に各変数に対し定義参照
関係に矛盾が生じるか否かを解析し、この解析の結果が
実行可能であるならば前記ループ中のブランチ毎に処理
をブロック化し、それぞれのブロックをベクトル化した
テキスト及びマスク生成のため各ＩＦ文の条件式の算出
を含む情報収集を行うテキストを作成し、並列化及びベ
クトル化したテキストを用いることによる効果を予測
し、予測の結果で効果が見込まれる場合にはそれぞれの
テキストを並列で実行させる並列化中間テキストを作成
し、この並列化中間テキストとループ全体をスカラで実
行させる前記中間テキストとを並列に実行し、各タスク
に自タスク終了時に他タスクが終了時していない場合に
他タスクを中断する命令を組み込んだ目的プログラムを
生成するようにしている。According to the editing method of the present invention, in an editing method for generating and outputting a target program from a source program including a given loop, an intermediate text is generated by reading and parsing a source program written in a high-level language, Structural analysis is performed based on the intermediate text, and if at least one of division, parallelization, and vectorization is performed on the intermediate text, it is determined whether or not inconsistency occurs in the definition reference relationship for each variable. Analyze, and if the result of the analysis is executable, block the processing for each branch in the loop, and include information including the calculation of the conditional expression of each IF statement for generating a vectorized text and mask for each block. Create text to be collected, predict the effect of using parallelized and vectorized text, and use the Is expected, a parallelized intermediate text for executing each text in parallel is created, and the parallelized intermediate text and the intermediate text for executing the entire loop in scalar are executed in parallel. At the time of termination, a target program incorporating an instruction to suspend another task when the other task is not terminated is generated.

【０００９】［作用］本発明の効果シミュレート部は、
並列化及びべクトル化処理を行う場合、必要になる準備
及び後処理等のオーバーへッドのため、分割したにもか
かわらず、単体でスカラ処理よりも遅くなってしまうこ
とを防ぐために、最大ループ長（最後までループ外へ飛
び出さなかった場合）での実行時間をシミュレートし、
単体でスカラの場合との比較を行う。本発明は、従来の
方法とくらべ、実行性能が向上しなかった場合でも、Ｃ
ＰＵや記憶領域等の資産の無駄使いが生じる危険性を低
くするためのものである。[Operation] The effect simulating unit of the present invention comprises:
When performing parallelization and vectorization processing, in order to prevent that it becomes slower than scalar processing by itself, despite division, due to overhead of necessary preparation and post-processing, etc. Simulate the execution time at the loop length (if you did not jump out of the loop to the end)
Compare with the scalar case by itself. The present invention can improve the C performance even when the execution performance is not improved as compared with the conventional method.
This is to reduce the risk of wasteful use of assets such as PUs and storage areas.

【００１０】又、並列化及びベクトル化のコードのみな
らず、ループ全体がスカラのコードをも目的プログラム
に埋め込み、処理時間の短い方を採用するようになって
いるのは、ループ外への飛び出しが非常に早い段階で生
じ、ループ全体をスカラで実行するコードのほうが、処
理時間が短くなる場合があるためであり、これによる性
能低下を防いでいる。そのため、資産の無駄使いに対
し、目をつぶることのできるシステムであるならば、効
果シミュレート部は、必ずしも必要ではない。In addition to the code for parallelization and vectorization, a scalar code for the entire loop is embedded in the target program, and the shorter processing time is adopted. This occurs at a very early stage, and the code that executes the entire loop with a scalar may have a shorter processing time, thereby preventing performance degradation due to this. Therefore, the effect simulation unit is not always necessary if the system can close eyes on wasteful use of assets.

【００１１】[0011]

【発明の実施の形態】次に、本発明の実施の形態につい
て図面を参照して説明する。Next, embodiments of the present invention will be described with reference to the drawings.

【００１２】図１は本発明の第１の実施の形態を示すブ
ロック図である。FIG. 1 is a block diagram showing a first embodiment of the present invention.

【００１３】本発明の編集装置１は、高級言語でかかれ
たソースプログラムを読み込み、構文解析を行って中間
テキストを生成する構文解析部２と、中間テキストを並
列化及びベクトル化して並列実行させるための並列化中
間テキストを作成し出力する並列化ベクトル化処理部３
と、並列化中間テキストを用いて目的プログラムを生成
し出力を行うコード生成部４とから成る。The editing apparatus 1 according to the present invention reads a source program written in a high-level language, performs a syntax analysis, and generates an intermediate text, and a parallelizing and vectorizing the intermediate text for parallel execution. Parallelization vectorization processing unit 3 that creates and outputs parallelized intermediate text
And a code generation unit 4 that generates and outputs a target program using the parallelized intermediate text.

【００１４】又、並列化ベクトル化処理部３は、中間テ
キストを基に構造解析を行う構造解析部５と、中間テキ
ストに対し分割と並列化とベクトル化との少なくともい
ずれか１つを実施した場合に、各変数に対し、定義参照
関係に矛盾が生じるか否かを解析するデータ依存関係解
析部６と、並列化及びベクトル化用の並列化中間テキス
トを生成する並列／ベクトルテキスト生成部７と、全ル
ープスカラの場合と並列化及びベクトル化を行った場合
との実行時間の予測を行い比較する効果シミュレート部
８とから構成される。The parallelized vectorization processing unit 3 performs a structure analysis unit 5 for performing a structure analysis based on the intermediate text, and performs at least one of division, parallelization, and vectorization on the intermediate text. In this case, for each variable, a data dependency analysis unit 6 that analyzes whether or not a contradiction occurs in the definition reference relationship, and a parallel / vector text generation unit 7 that generates a parallelized intermediate text for parallelization and vectorization And an effect simulating unit 8 for predicting and comparing the execution time of the case of all loop scalars and the case of performing parallelization and vectorization.

【００１５】図２は本発明の適用が予測されるソースプ
ログラムの基本形を説明する説明図である。ブランチの
個数はいくつでも良いが、ここでは、解り易くするた
め、２つとしておく。FIG. 2 is an explanatory diagram for explaining a basic form of a source program for which application of the present invention is expected. The number of branches may be any number, but here, two is set for easy understanding.

【００１６】図３は並列化ベクトル化処理部内の処理の
流れを示す流れ図である。説明には図１の名称と符号と
を用い、図１の接続及び図３の流れに従って説明する。FIG. 3 is a flowchart showing the flow of processing in the parallelized vectorization processing section. The description will be given using the names and reference numerals in FIG. 1 and according to the connection in FIG. 1 and the flow in FIG.

【００１７】まず、編集装置１は、ソースプログラムを
読み込む。ソースプログラムを読み込むと、構文解析部
２が中間テキストを生成する。次に、ステップ（以下Ｓ
と記す）１でこの情報を並列化ベクトル化処理部３に渡
す。並列化ベクトル化処理部３では、Ｓ２で構造解析部
５が中間テキストを基に構造解析を行い、Ｓ３でデータ
依存関係解析部６がデータ依存関係解析を行い、Ｓ４で
解析結果により並列化及びベクトル化の適用が可能か否
かの判断を行う。ベクトル化の解析では、ループ外ヘの
ブランチを持つＩＦ文はベクトル化不可の文として扱
い、このことだけでループ全体へのベクトル化不可要因
とはしない。並列化が不可となった段階で、ループ全体
ヘのベクトル化不可要因とする。並列化の解析対象は処
理α、処理β、処理γ及び、それぞれの間の条件分岐等
のマスク生成に必要な部分になる（図２参照）。又、処
理α，β，γは必ずしも全べクトル化されている必要は
ない。分割が可能であるならば、部分ベクトルでも、ス
カラであっても、並列化を行わない理由にはならない。
先のＳ４での解析により並列化及びベクトル化が不可で
あった場合には、並列化ベクトル化処理部３を抜け、構
文解析部２で作った中間テキストを基に、ループ全体が
スカラのみの目的プログラムになる。並列化及びベクト
ル化が可能であった場合には、処理は並列／ベクトルテ
キスト生成部７に移り、Ｓ５で以下のものに対し並列化
ベクトル化の中間テキストを作成する。（１）並列化及びベクトル化のための準備（２）マスク作成のためのデータ収集（条件式の計算、
添字式計算．ｅｔｃ）（３）処理α （４）処理β （５）処理γ （６）ストアが必要な変数に対するマスクの作成（７）並列化及びベクトル化のための後処理。（主にマ
スク付きストア処理）このとき、（２）〜（５）が並列になるようにする。そ
のため、これらの処理中にストアされる変数に対して
は、直接その変数の記憶域を変更しないように、作業領
域にストアするようにする。実際の記憶域へのストアは
（７）の処理中で、（６）で作成されたマスクを用いて
行う。（２）は全てスカラ命令になる。First, the editing device 1 reads a source program. When the source program is read, the parsing unit 2 generates an intermediate text. Next, step (hereinafter S
This information is passed to the parallelization vectorization processing unit 3 at 1). In the parallelization vectorization processing unit 3, the structure analysis unit 5 performs the structure analysis based on the intermediate text in S2, the data dependency analysis unit 6 performs the data dependency analysis in S3, and performs parallelization and It is determined whether vectorization can be applied. In the vectorization analysis, an IF statement having a branch to the outside of the loop is treated as a statement that cannot be vectorized, and this fact alone does not constitute a non-vectorizable factor for the entire loop. At the stage where the parallelization becomes impossible, it is determined that the vectorization to the entire loop cannot be performed. The analysis target of the parallelization is a process α, a process β, a process γ, and a portion necessary for generating a mask such as a conditional branch between them (see FIG. 2). Further, the processes α, β, and γ do not necessarily have to be all vectorized. If division is possible, neither a partial vector nor a scalar is a reason not to parallelize.
If parallelization and vectorization are not possible by the analysis in the previous step S4, the process exits the parallelization vectorization processing unit 3 and, based on the intermediate text created by the syntax analysis unit 2, the entire loop is a scalar only. Become a purpose program. If parallelization and vectorization are possible, the process proceeds to the parallel / vector text generation unit 7, and in S5, an intermediate text for parallelization vectorization is created for the following. (1) Preparation for parallelization and vectorization (2) Data collection for mask creation (calculation of conditional expressions,
Subscript formula calculation. etc) (3) Processing α (4) Processing β (5) Processing γ (6) Creation of a mask for variables requiring storage (7) Post-processing for parallelization and vectorization. (Mainly store processing with a mask) At this time, (2) to (5) are arranged in parallel. Therefore, variables stored during these processes are stored in the work area so as not to directly change the storage area of the variables. The actual storage in the storage area is performed using the mask created in (6) during the processing of (7). (2) are all scalar instructions.

【００１８】図４は目的プログラムの実行状態を説明す
る説明図である。図４の説明はここでは行わないが、現
在説明中の並列化及びベクトル化の実行状態を示してい
ることのみを述べておく。内容としては、ソースプログ
ラムが、本発明の手段で目的プログラムになった場合
の、各プロセッサ１１，１２，１３，１４，１５に割り
当てられる処理の内容である。FIG. 4 is an explanatory diagram for explaining the execution state of the target program. Although the description of FIG. 4 is not made here, it is only mentioned that it shows the execution state of parallelization and vectorization currently being described. The content is the content of the process assigned to each of the processors 11, 12, 13, 14, and 15 when the source program becomes the target program by the means of the present invention.

【００１９】並列／ベクトルテキスト生成部７での処理
が終了すると、処理は効果シミュレート部８に移る。も
ともとベクトル化は、ループ長は長ければ長い程有効な
方法であるため、この場合最も効果的なのは、最後まで
ループ外へ飛び出さなかった場合である。この理想的な
状態であってさえ、処理数が少なかったり、オーバーへ
ッドが大きすぎる場合には、並列化及びベクトル化が無
意味になる場合が存在し得る。あらかじめ無駄であるこ
とがわかる場合には、並列化及びベクトル化を行わない
方がよい。そのため、効果シミュレート部８は、Ｓ６で
並列化及びベクトル化でかかるコスト（実行時間）と、
ループ全体でスカラの場合のコストとをシミュレート
し、その結果を比較する。既に説明した理由によりシミ
ュレートでは、最後までループ外へ飛び出さなかったと
仮定する。並列化及びベクトル化のコストは以下のコス
トの加算になる。（１）並列化及びベクトル化のための準備（２）以下の項目で最もコストの高いもの。以下の項目
は並列に処理されるため、同期をとる必要があり、最も
高いコストが全体のコストになる。・マスク作成のためのデータ収集（条件式の計算、添字
式計算．ｅｔｃ）・処理α ・処理β ・処理γ (３)ストアが必要な変数に対するマスクの作成 (４)並列化及びべクトル化のための後処理。なお、図４にある、プロセッサ１５の処理を中断する処
理は計算に入れる必要がない。この部分はスカラ処理の
コードにも最後に埋め込まれてる。スカラの方は、プロ
セッサ１１〜１４の処理の中断である。When the processing in the parallel / vector text generation section 7 is completed, the processing shifts to the effect simulation section 8. Since vectorization is originally effective as the loop length becomes longer, the most effective method in this case is a case in which the loop is not jumped out to the end. Even in this ideal state, if the number of processes is small or the overhead is too large, parallelization and vectorization may become meaningless. If it is found that the data is wasteful, it is better not to perform parallelization and vectorization. Therefore, the effect simulating unit 8 calculates the cost (execution time) required for parallelization and vectorization in S6,
Simulate the cost of a scalar over the entire loop and compare the results. For the reasons already explained, it is assumed that the simulation did not jump out of the loop until the end. The cost of parallelization and vectorization is the addition of the following costs. (1) Preparation for parallelization and vectorization (2) The following items with the highest cost. Since the following items are processed in parallel, it is necessary to synchronize, and the highest cost becomes the total cost. -Data collection for mask creation (conditional expression calculation, subscript expression calculation. Etc)-Processing α-Processing β-Processing γ (3) Creation of masks for variables requiring storage (4) Parallelization and vectorization For post-processing. The processing for interrupting the processing of the processor 15 shown in FIG. 4 does not need to be included in the calculation. This part is also embedded at the end of the scalar processing code. In the case of the scalar, the processing of the processors 11 to 14 is interrupted.

【００２０】Ｓ７で上記の手段によって算出された予想
コストと、スカラの予想コストを比較し、スカラの方が
コストが低ければ、並列化ベクトル化処理部３を抜け、
構文解析部２で作った中間テキストを基に、ループ全体
がスカラのみの目的ブログラムになる。スカラの方がコ
ストが高ければ本発明の手段によって作成された並列化
中間テキストを用いる。In S7, the expected cost calculated by the above means is compared with the expected cost of the scalar. If the cost of the scalar is lower, the processing exits the parallelization vectorization processing unit 3, and
On the basis of the intermediate text created by the parsing unit 2, the entire loop becomes a scalar-only target program. If the scalar is more expensive, use the parallelized intermediate text created by the means of the present invention.

【００２１】本発明を用いることで効果が期待できる場
合は、Ｓ８でこれまであった中間コードに新しい中間コ
ードを埋め込むことになるが、この時、並列化及びベク
トル化のテキストの、マスク作成直後及びループ全体が
スカラの場合のテキストの最後に、相手の処理を中断す
るテキストを追加しておく。If an effect can be expected by using the present invention, a new intermediate code is embedded in the existing intermediate code in step S8. At the end of the text when the entire loop is a scalar, text that interrupts the processing of the other party is added.

【００２２】[0022]

【実施例】次に、本発明の実施例について，図面を参照
して説明する．ここでは、本方式を用いて作成された目
的プログラムが適用されるシステム上で、どのように実
行されるかについて説明する。Next, an embodiment of the present invention will be described with reference to the drawings. Here, a description will be given of how the target program created using this method is executed on a system to which the target program is applied.

【００２３】図５は本発明の実施例に使用するソースプ
ログラムを説明する説明図、図６はソースプログラムが
目的プログラムに編集された時に、並列処理を行う各プ
ロセッサの処理内容を説明する説明図である。FIG. 5 is an explanatory diagram for explaining a source program used in the embodiment of the present invention, and FIG. 6 is an explanatory diagram for explaining processing contents of respective processors which perform parallel processing when the source program is edited into a target program. It is.

【００２４】この例では、処理α、β、γは、それぞれ
ベクトル化阻害要因を持たないため、全べクトル化が可
能である。”ｎ＝ｎ＋１”は処理γ及びマスク作成のた
めのデータ収集のどちらにも必要となるため、プロセッ
サ１１，１４のどちらでも計算が行われる。ただし、プ
ロセッサ１１はスカラ、プロセッサ１４はベクトルであ
るため、同期後のストア処理では、プロセッサ１４の結
果は使用せず、プロセッサ１１の結果をストアする。こ
れはプロセッサ１４がすべてスカラ処理であっても変わ
らない。プロセッサ１４で行われる処理は、あくまで、
ループ外へのブランチはないものとして演算されている
ためで、例えスカラ演算であったとしても、その結果を
使用するためには、プロセッサ１１の演算結果から作成
されたマスクを基にストアする必要がある。In this example, since the processes α, β, and γ do not have any vectorization inhibiting factors, all vectors can be obtained. Since “n = n + 1” is required for both processing γ and data collection for mask creation, the calculation is performed by either of the processors 11 and 14. However, since the processor 11 is a scalar and the processor 14 is a vector, the result of the processor 14 is not used in the store processing after synchronization, but the result of the processor 11 is stored. This does not change even if all the processors 14 perform scalar processing. The processing performed by the processor 14 is,
The operation is performed assuming that there is no branch outside the loop. Even if the operation is a scalar operation, it is necessary to store the result based on the mask created from the operation result of the processor 11 in order to use the result. There is.

【００２５】プロセッサ１１におけるｉｔ１，ｉｔ２，
ｉｔ３は、それぞれ処理α，β，γの実行回数を示して
いる。It1, it2 in the processor 11
it3 indicates the number of executions of the processes α, β, and γ, respectively.

【００２６】マスクはプロセッサ１１〜１４の全ての演
算終了後、プロセッサ１１の結果、ループの初期値、終
値、増分値及び、配列の場合にはその添字式により求め
る。仮にｉｔ１＝ｉｔ２＝ｘ,iｔ３＝ｘ-１であった場
合、各処理での変数のためのマスクは以下のようにな
る。プ口セッサ１マスク作成のためのデータ収集 After all the operations of the processors 11 to 14 are completed, the mask is obtained by the result of the processor 11, the initial value, the final value, the increment value of the loop, and, in the case of an array, the subscript expression. If it1 = it2 = x, it3 = x-1, the masks for variables in each process are as follows. Data collection for mask creation

【００２７】プ口セッサ２処理α Open mouth processor 2 processing α

【００２８】プロセッサ３処理β Processor 3 processing β

【００２９】もしｋがループ中一定ではなく処理β中で
計算されている場合には、ｋの値はプロセッサ３のみな
らず、プロセッサ１１でも算出する。プ口セッサ４処理γ If k is not constant during the loop but is calculated in the process β, the value of k is calculated not only by the processor 3 but also by the processor 11. Port processor 4 processing γ

【００３０】変数”ｓ”はベクトル化されている場合、
配列であるかのように用いられる。ストアの時には、マ
スクの最後のｏｎの要素が終値としてストアされる。If the variable "s" is vectorized,
Used as if it were an array. When storing, the last on element of the mask is stored as the closing value.

【００３１】これらのマスクの作成が終わった段階で、
まだ、ループ全体スカラ処理の演算（図４におけるプロ
セッサ１５が終了していない場合、この演算を中断す
る。プロセッサ５の処理は、従来の処理方法と同じであ
るため、ここでのストアは実際の記憶域にストアされて
いる。しかし、この後に、マスク付のストア処理（並列
化及びベクトル化のための後処理）によって上書きされ
るため、問題はない。At the stage when these masks have been created,
The operation of the entire loop scalar processing (if the processor 15 in FIG. 4 is not completed, this operation is interrupted. Since the processing of the processor 5 is the same as the conventional processing method, the store here is the actual processing). However, there is no problem since it is overwritten by a store process with a mask (post-processing for parallelization and vectorization).

【００３２】逆に、マスクの作成が終わる前に、プロセ
ッサ１５の処理が終了した場合、プロセッサ１１〜１４
を中断し、これらの処理をなかったものとする。前述し
たように、プロセッサ１ｌ〜１４による演算結果は、こ
の段階ではまだ実際の記憶域にストアしていないため、
問題はない。Conversely, if the processing of the processor 15 is completed before the mask is completed, the processors 11 to 14
Is interrupted and these processes are not performed. As described above, the operation results by the processors 11 to 14 are not yet stored in the actual storage area at this stage.
No problem.

【００３３】最後の並列化及びベクトル化のための後処
理で、配列に対してはマスク付のストア、変数に対して
はマスクを利用した終値保証を行う。プロセッサ１１の
処理の結果は必ずそのままストアする。プロセッサ１２
〜１４については、実行順にストアを行う。配列”ａ”
は処理αと処理γとで定義されているため、マスク付ス
トアは、プロセッサ１２の結果、プロセッサ１４の結果
の順でストアが行われる。In the final post-processing for parallelization and vectorization, a store with a mask is provided for an array, and a close value is guaranteed for a variable using a mask. The result of the processing by the processor 11 is always stored as it is. Processor 12
As for 14, stores are performed in the order of execution. Array "a"
Is defined by the processing α and the processing γ, the store with mask is performed in the order of the result of the processor 12 and the result of the processor 14.

【００３４】[0034]

【発明の効果】以上説明したように、本発明は、高級言
語でかかれたソースプログラムを読み込み構文解析を行
って中間テキストを生成し、この中間テキストを基に構
造解析を行い、中間テキストに対し分割と並列化とベク
トル化との少なくともいずれか１つを実施した場合に各
変数に対し定義参照関係に矛盾が生じるか否かを解析
し、この解析の結果が実行可能であるならばループ中の
ブランチ毎に処理をブロック化し、それぞれのブロック
をベクトル化したテキスト及びマスク生成のため各ＩＦ
文の条件式の算出を含む情報収集を行うテキストを作成
し、並列化及びベクトル化したテキストを用いることに
よる効果を予測し、予測の結果で効果が見込まれる場合
にはそれぞれのテキストを並列で実行させる並列化中間
テキストを作成し、この並列化中間テキストとループ全
体をスカラで実行させる中間テキストとを並列に実行
し、各タスクに自タスク終了時に他タスクが終了時して
いない場合に他タスクを中断する命令を組み込んだ目的
プログラムを生成することにより、ループ外への飛び出
しを持つループに対しても並列化とベクトル化とを可能
とし、実行性能を向上させることができるという効果が
有る。As described above, according to the present invention, a source program written in a high-level language is read and parsed to generate an intermediate text, and a structural analysis is performed based on the intermediate text. If at least one of the division, parallelization, and vectorization is performed, it is analyzed whether or not a contradiction occurs in the definition reference relation for each variable. If the result of this analysis is feasible, a loop is executed. Processing is blocked for each branch of each IF, and each IF is used to generate a text and mask by vectorizing each block.
Create a text that collects information including the calculation of the conditional expression of the sentence, predict the effect of using the parallelized and vectorized text, and if the effect of the prediction is expected, parallel each text Create parallelized intermediate text to be executed, execute the parallelized intermediate text and the intermediate text to execute the entire loop in scalar in parallel. By generating a target program in which an instruction for interrupting a task is incorporated, it is possible to parallelize and vectorize even a loop having a jump outside the loop, thereby improving the execution performance. .

[Brief description of the drawings]

【図１】本発明の第１の実施の形態を示すブロック図で
ある。FIG. 1 is a block diagram showing a first embodiment of the present invention.

【図２】本発明の適用が予測されるソースプログラムの
基本形を説明する説明図である。FIG. 2 is an explanatory diagram illustrating a basic form of a source program for which application of the present invention is predicted.

【図３】並列化ベクトル化処理部内の処理の流れを示す
流れ図である。FIG. 3 is a flowchart showing a flow of processing in a parallelization vectorization processing unit.

【図４】目的プログラムの実行状態を説明する説明図で
ある。FIG. 4 is an explanatory diagram illustrating an execution state of a target program.

【図５】本発明の実施例に使用するソースプログラムを
説明する説明図である。FIG. 5 is an explanatory diagram illustrating a source program used in the embodiment of the present invention.

【図６】ソースプログラムが目的プログラムに編集され
た時に、並列処理を行う各プロセッサの処理内容を説明
する説明図である。FIG. 6 is an explanatory diagram illustrating processing contents of each processor that performs parallel processing when a source program is edited into a target program.

[Explanation of symbols]

１編集装置２構文解析部３並列化ベクトル化処理部４コード生成部５構造解析部６データ依存関係解析部７並列／ベクトルテキスト生成部８効果シミュレート部１１，１２，１３，１４，１５プロセッサ DESCRIPTION OF SYMBOLS 1 Editing device 2 Syntax analysis part 3 Parallelization vectorization processing part 4 Code generation part 5 Structure analysis part 6 Data dependence analysis part 7 Parallel / vector text generation part 8 Effect simulation part 11, 12, 13, 14, 15 Processor

Claims

[Claims]

1. An editing system for generating and outputting a target program from a source program including a given loop, wherein the parsing unit reads a source program written in a high-level language and performs a syntax analysis to generate an intermediate text;
A parallelization vectorization processing unit for creating and outputting a parallelized intermediate text for parallelizing and vectorizing the intermediate text for parallel execution, and a code generation for generating and outputting a target program using the parallelized intermediate text And an editing method comprising:

2. The parallelized vectorization processing unit performs a structure analysis unit that performs a structure analysis based on an intermediate text, and performs at least one of division, parallelization, and vectorization on the intermediate text. A data dependency analysis unit for analyzing whether or not a contradiction occurs in the definition reference relationship for each variable, a parallel / vector text generation unit for generating a parallelized intermediate text for parallelization and vectorization, and a full loop 2. The editing method according to claim 1, further comprising an effect simulating unit for estimating and comparing execution times between the case of scalar and the case of parallelization and vectorization.

3. The target program generated by the code generation unit includes a task specified by the intermediate text and the parallelized intermediate text, which is executed in parallel, and each task is terminated when its own task ends. 2. The editing method according to claim 1, wherein an instruction for interrupting another task when there is no other task is incorporated.

4. An editing method for generating and outputting a target program from a source program including a given loop, reading a source program written in a high-level language, performing syntax analysis, generating an intermediate text, and using the intermediate text as a basis. And analyzing whether or not inconsistency occurs in the definition reference relationship for each variable when at least one of division, parallelization, and vectorization is performed on the intermediate text. If the result is executable, the process is blocked for each branch in the loop, and the text for which information collection including the calculation of the conditional expression of each IF statement for generating a mask and the text for each block is generated. Create, predict the effect of using parallelized and vectorized text, and if the prediction results show an effect, Creates a parallelized intermediate text that causes each text to be executed in parallel, executes this parallelized intermediate text and the intermediate text that causes the entire loop to be executed by a scalar in parallel, and allows each task to execute other tasks at the end of its own task. An editing method characterized by generating a target program incorporating an instruction for interrupting another task when not finished.