JPH02159014A

JPH02159014A - Method of and apparatus for producing data for charged particle beam lithography

Info

Publication number: JPH02159014A
Application number: JP63314298A
Authority: JP
Inventors: Kiyomi Koyama; 清美小山; Shuichi Tamamushi; 秀一玉虫
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-12-13
Filing date: 1988-12-13
Publication date: 1990-06-19
Anticipated expiration: 2013-02-10
Also published as: JP2710162B2

Abstract

PURPOSE: To achieve high-speed data conversion by operating a plurality of pipeline-structured processing units for data conversion in parallel, so that a plurality of block-unit pattern data are converted into drawing data simultaneously. CONSTITUTION: The overall function is divided into three functions, each processed by a dedicated cluster (processor group). In each cluster, a plurality of processor elements (PEs), each having a local memory on the order of several kilobytes, are coupled with each other through data links of about 1 Mbyte/sec. Processing among the clusters is executed in a pipelined manner. A distribution cluster 10 distributes block-unit pattern data transferred from a host computer to a parallel processing unit composed of a plurality of figure computation clusters 20 (20_1 to 20_N). Each figure computation cluster 20 applies figure computation processing to the block data supplied from the distribution cluster 10 and outputs the result to a merge cluster 30. The merge cluster 30 receives the processed results from the figure computation clusters 20 and outputs them with one block of data as a unit.

Description

[Detailed Description of the Invention] [Object of the Invention] (Industrial Field of Application) The present invention relates to a method and an apparatus for creating charged beam lithography data, which execute the drawing data creation processing (data conversion) for a charged beam lithography apparatus in parallel and at high speed.

(Prior Art) Conventionally, the data conversion processing that creates drawing data acceptable to an electron beam lithography apparatus (EB data) from LSI pattern data has in most cases been performed in software on a general-purpose computer such as a minicomputer. To cope with large-scale data, for LSIs such as memories in which the same pattern appears repeatedly, a technique has been used in which data processing is applied only to the repeating unit and is omitted for the rest.

However, with the recent increase in LSI integration density, the processing time required for conversion to EB data has become enormous. For example, it has been reported that computer processing for leading-edge memory devices takes from several days to about a week. This is the situation even for memory patterns, where repetition information can be used to speed up processing; the problem is even more serious for patterns with little regularity, such as logic circuits and gate arrays, for which this technique cannot be expected to speed up processing.

It has therefore been necessary to develop a new means of speeding up data conversion that is effective even for such irregular LSI patterns. Meeting this requirement is essential if electron beam lithography apparatuses are to remain practical for LSIs, which will continue to grow in scale.

(Problems to be Solved by the Invention) Thus, for electron beam lithography apparatuses to be put to practical use, it has become necessary to create EB data from LSI pattern data at high speed.

This problem is not limited to electron beam lithography apparatuses; the same applies to ion beam lithography apparatuses.

The present invention has been made in view of the above circumstances. Its object is to provide a method and an apparatus for creating charged beam lithography data that can create, at high speed, drawing data acceptable to a charged beam lithography apparatus from LSI pattern data, and that can thereby contribute to, for example, improved drawing throughput.

[Structure of the Invention] (Means for Solving the Problems) The gist of the present invention is to convert block-unit pattern data into drawing data at high speed by parallel processing.

That is, the present invention is a method of creating charged beam lithography data in which a predetermined data space of LSI pattern data is divided into small-area blocks that are independent of one another and the pattern data of each block is converted into drawing data acceptable to a charged beam lithography apparatus. The method uses a plurality of processing units, each having a pipeline configuration in which a plurality of processors performing different processes are connected in series. Block-unit pattern data transferred from a host computer is transferred sequentially to one of the processing units, and the processing units are operated in parallel to convert the pattern data into drawing data.

More specifically, the block-unit data conversion is executed by operating in parallel a plurality of processing units, each a pipeline of multiple stages of processors that have at least two input/output terminals or communication ports equivalent to input/output terminals. Block-unit pattern data is supplied to the processing units as follows: a plurality of processors with the same feature are connected in cascade using at least one of their input/output terminals; block-unit pattern data transferred from the host computer to at least one of these processors is passed on from one processor to the next; and each processor transfers block-unit pattern data to the processing unit connected to its remaining input/output terminal in response to a request from that processing unit.

The present invention also provides an apparatus for creating charged beam lithography data that performs the above conversion, comprising: a parallel processing unit in which N graphic operation clusters are provided in parallel, each cluster having a pipeline configuration in which a plurality of processors performing different processes are connected in series and converting block-unit pattern data into drawing data; a distribution cluster in which N processors that temporarily hold block-unit pattern data are connected in series, each processor being connected to a respective graphic operation cluster of the parallel processing unit, and which transfers block-unit pattern data received from the host computer successively to the next processor or to one of the graphic operation clusters of the parallel processing unit; and a merge cluster in which N processors that temporarily hold block-unit drawing data are connected in series, each processor being connected to a respective graphic operation cluster of the parallel processing unit, and which combines the block-unit drawing data converted by the graphic operation clusters and transfers it to the outside.

(Operation) According to the present invention, by operating a plurality of pipeline-configured processing units in parallel to perform data conversion, multiple blocks of pattern data can be converted into drawing data simultaneously, which greatly reduces the time required for data conversion. Data conversion can therefore be performed at high speed even for irregular LSI patterns, which can contribute to, for example, improved drawing throughput.

(Embodiment) The present invention is described in detail below with reference to the illustrated embodiment.

The parallel data conversion processing of this embodiment targets high-speed execution of logical operations such as black-and-white inversion (NOT) and overlap removal (OR), and of decomposition operations such as region division and polygon division. When the data space is cut into meshes (blocks) of an appropriate size, these operations have the following characteristics:

(1) Each block can be processed independently of the others.

(2) Even if the output order of the processing results changes, the results can easily be reconstructed logically, for example by attaching an identification number to each block.

In view of these characteristics, the configuration of this parallel data conversion processing system was determined as shown in FIG. 1.
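The two characteristics above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the rectangle representation, the square mesh, the `(column, row)` block ID, and the area calculation standing in for a real graphic operation are all assumptions made for illustration.

```python
import random

def split_into_blocks(rects, block):
    """Clip axis-aligned rectangles (x0, y0, x1, y1) to a square mesh of side `block`.

    Returns {block_id: [clipped rects]}, where block_id = (column, row).
    Coordinates are assumed to be non-negative integers with x0 < x1, y0 < y1.
    """
    blocks = {}
    for x0, y0, x1, y1 in rects:
        for bx in range(x0 // block, (x1 - 1) // block + 1):
            for by in range(y0 // block, (y1 - 1) // block + 1):
                cx0, cy0 = max(x0, bx * block), max(y0, by * block)
                cx1, cy1 = min(x1, (bx + 1) * block), min(y1, (by + 1) * block)
                blocks.setdefault((bx, by), []).append((cx0, cy0, cx1, cy1))
    return blocks

def process_block(block_id, rects):
    """Stand-in for a per-block graphic operation: total rectangle area in the block."""
    return block_id, sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in rects)

def run_out_of_order(blocks, seed=0):
    """Process blocks in a shuffled order, then restore a canonical order by ID."""
    items = list(blocks.items())
    random.Random(seed).shuffle(items)          # arbitrary completion order
    results = [process_block(bid, rects) for bid, rects in items]
    return sorted(results)                      # the ID makes reassembly trivial
```

Because each block carries its own ID and is processed without reference to its neighbours, the shuffled run returns the same result whatever order the blocks complete in.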

FIG. 1 is a block diagram showing the schematic configuration of an apparatus for creating electron beam lithography data according to an embodiment of the present invention. The overall function is divided into three, and each function is handled by a dedicated cluster (processor group): data distribution by a distribution (D) cluster 10, graphic operations by a plurality of graphic operation (P) clusters 20 (20_1 to 20_N), and collection of the processing results by a merge (M) cluster 30. In each cluster, a plurality of processor elements (PEs), each with a local memory on the order of several hundred kilobytes, are connected by data links of about 1 Mbyte/sec. Processing between clusters is executed in a pipelined manner.

The distribution cluster 10 distributes the block-unit pattern data transferred from the host computer to the parallel processing unit made up of the plurality of graphic operation clusters 20. Data is distributed in such a way that a PE that has received a request from its graphic operation cluster 20 transfers data to it one block at a time. If there is no request, the PE forwards the block to the next PE so that it can serve requests from other graphic operation clusters 20. If the next PE is full, the PE waits either for a new request from its graphic operation cluster 20 or for the full PE to free up. In this way, the distribution cluster supplies block-unit pattern data so that the load on the graphic operation clusters 20 is approximately equal.
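The request-driven forwarding rule described above can be sketched as a toy, time-stepped simulation. Everything here is an illustrative assumption rather than the patent's simulator: one tick per hop, a fixed `service_ticks` during which a cluster does not issue a new request, and the function name itself.

```python
from collections import deque

def simulate_distribution(num_blocks, n_pe, n_buf, service_ticks):
    """Toy model of a dead-end PE cascade feeding one cluster per PE.

    Returns (blocks served per cluster, total ticks).
    """
    buffers = [deque() for _ in range(n_pe)]   # per-PE block buffers
    busy = [0] * n_pe                          # ticks until cluster i requests again
    served = [0] * n_pe
    next_block = 0
    ticks = 0
    while sum(served) < num_blocks or any(busy):
        ticks += 1
        busy = [max(0, b - 1) for b in busy]
        # Walk from the far end so a block moves at most one hop per tick.
        for i in reversed(range(n_pe)):
            if not buffers[i]:
                continue
            if busy[i] == 0:                   # cluster i is requesting a block
                buffers[i].popleft()
                busy[i] = service_ticks
                served[i] += 1
            elif i + 1 < n_pe and len(buffers[i + 1]) < n_buf:
                buffers[i + 1].append(buffers[i].popleft())   # pass it on
            # otherwise wait: the next PE is full, or this is the last PE
        if next_block < num_blocks and len(buffers[0]) < n_buf:
            buffers[0].append(next_block)      # host feeds the first PE
            next_block += 1
    return served, ticks
```

Calling, for example, `simulate_distribution(100, 4, 2, 10)` returns how many blocks each of the four clusters received, which gives a rough feel for how the forwarding rule spreads the load.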

Each graphic operation cluster 20 applies graphic operation processing to the block data supplied from the distribution cluster 10 and outputs the result to the merge cluster 30.

As illustrated, there are a plurality of graphic operation clusters 20, which operate in parallel. Processing within each cluster is also pipelined: each PE forms one pipeline stage and performs communication with the preceding and following PEs concurrently with the graphic operations inside the PE.

Data moves between pipe stages in units of blocks. The throughput of a graphic operation cluster 20 is limited by the stage with the longest processing time, so the processing must be divided so that the processing time is equal across stages.
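The effect of an unbalanced stage can be seen in a small Python sketch. It assumes an idealized linear pipeline in which a stage starts a block as soon as both the block and the stage are free; the function name and the example stage times are illustrative only.

```python
def pipeline_total_time(stage_times, num_blocks):
    """Time for `num_blocks` blocks to leave a linear pipeline.

    stage_times[s] is the time stage s spends on one block. A stage starts a
    block when the previous stage has delivered it and the stage itself is free.
    """
    done = [0.0] * len(stage_times)   # when each stage finished its latest block
    for _ in range(num_blocks):
        ready = 0.0                   # when the current block reaches the next stage
        for s, t in enumerate(stage_times):
            start = max(ready, done[s])
            done[s] = start + t
            ready = done[s]
    return done[-1]

balanced = pipeline_total_time([2, 2, 2], 5)     # same total work per block ...
unbalanced = pipeline_total_time([1, 4, 1], 5)   # ... but one slow stage
```

Both pipelines do 6 time units of work per block, yet the unbalanced one finishes later because blocks leave it only as fast as its slowest stage allows.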

The merge cluster 30 receives the processing results from the graphic operation clusters 20 and outputs them with one block of data as a unit. Each PE of the merge cluster 30 receives block data from its graphic operation cluster 20 and transfers block data to the next PE concurrently. The output destination of the merge cluster 30 is usually a large-capacity magnetic disk device or the like, which is relatively slow. However, since various techniques for speeding up this part are conceivable, assuming here that the transfer from the merge cluster 30 is not a bottleneck does not affect the validity of the present invention in any way.

Next, the operation of the apparatus configured as described above is explained using simulation and analytical evaluation.

As already mentioned, in this parallel conversion processor the distribution, graphic operation, and merge clusters are also pipelined with respect to one another. Let T_D, T_P, and T_M be the total data processing times of the D, P, and M clusters, and let t_D, t_P, and t_M be the corresponding processing times per block. The overall processing time T is then

$$T = T_D + t_P + t_M \qquad (1)$$

That is, to obtain the maximum throughput it suffices to find the conditions that minimize T_D + t_P + t_M.

Note that T_D is a quantity that also depends on t_P and t_M.

The results of evaluating the performance of the distribution cluster 10 and the graphic operation clusters 20 are given below, under the condition that T_D and t_P are not constrained by the processing speed of the merge cluster 30. The performance of the distribution cluster 10 was evaluated with a software simulator. The simulator takes as input block data abstracted into data volume, processing time, their distributions, and so on, and evaluates the resulting characteristics.

The simulation was used to determine the data distribution scheme among the PEs of the distribution cluster 10, the optimal number of PEs in the distribution cluster 10, and the optimal number of buffers per PE. The conditions were: a transfer rate of 1 MB/sec both between PEs within the distribution cluster 10 and to the graphic operation clusters 20, an average block data size of 640 B transferred from the host, and 1000 input blocks. Measurements were made while varying the time ξ required for one block of data (block-unit pattern data) to pass through one pipe stage of a graphic operation cluster 20.

First, the dependence of the throughput of the distribution cluster 10 on ξ was measured. Two methods were compared: one in which block data transfer dead-ends at the last PE (rightmost in FIG. 1) (method 1), and one in which blocks are recirculated from the last PE to the first PE (leftmost in FIG. 1) (method 2).

FIG. 2 shows the result of varying ξ from 0.2 to 200 msec/block with the number of PEs in the cluster (n_PE) set to 10 and the number of buffers (n_B) set to 3. Throughput increases in proportion to 1/ξ, but for both methods it saturates at about 1/ξ = 100 (ξ = 10 msec/block). Under these measurement conditions the curves for method 1 and method 2 nearly coincide over the whole range, and no significant difference between the two methods is observed. The subsequent measurements were therefore made with method 1.

Next, the change in throughput of the distribution cluster 10 with the number of PEs was examined. The results are shown in FIG. 3. Here n_B = 3 and ξ = 20, 50, and 200 msec/block. For ξ = 20, throughput increases with the number of PEs but peaks near n_PE = 40, beyond which there is no further gain. For ξ = 50, on the other hand, parallelization remains effective up to n_PE = 100, and for ξ = 200 up to n_PE = 200. In other words, the smaller ξ is, the greater the effect of increasing n_PE.

FIG. 4 shows the change in throughput of the distribution cluster 10 with the number of buffers per PE. Throughput increases as n_B is increased, but it saturates at n_B = 3 for ξ = 5 to 55 and at n_B = 2 under the other conditions. The saturation is thought to occur because the processing speed of the graphic operation clusters 20 is lower than the transfer rate of the data links.

This is also evident from the fact that the throughput at saturation is almost inversely proportional to ξ.

From these simulation results, under the processing conditions expected for EB data conversion, about 3 buffers per PE and about 40 PEs in the distribution cluster 10 are sufficient. In that case the distribution cluster 10 delivers a throughput proportional to 1/ξ, that is, to the pipeline speed of the graphic operation clusters 20.

The pipeline processing of the graphic operation clusters 20 was also modeled in order to evaluate its throughput.

For one graphic operation cluster 20, let n be the number of pipe stages, τ the inter-PE communication time per block, m the number of blocks processed by the cluster, and σ the processing time for one block. The time chart of the processing in each graphic operation cluster 20 is then as shown in FIG. 5. In the figure, each PE is assumed to have input and output buffers and to be able to communicate with the preceding and following PEs simultaneously.

As shown in FIG. 5, after the last (m-th) block has passed through PE_1, it must still pass through (n-1) pipe stages before it leaves the graphic operation cluster 20. The processing time T_P of the graphic operation cluster 20 is therefore the sum of the time required for m blocks to pass through PE_1 and the time required for one block to pass through (n-1) pipe stages:

$$T_P = (m + n - 1)\left(\frac{\sigma}{n} + \tau\right) \qquad (2)$$

Since m, σ, and τ do not depend on the number of PEs n, the value of n that minimizes T_P is

$$n = \sqrt{\frac{(m-1)\,\sigma}{\tau}} \qquad (3)$$

and the processing time is then

$$T_P = \sigma + (m-1)\tau + 2\sqrt{(m-1)\tau\sigma} = \left(\sqrt{(m-1)\tau} + \sqrt{\sigma}\right)^2 \qquad (4)$$
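The text does not spell out the step from equation (2) to equations (3) and (4). One way to fill it in, treating n as a continuous variable (an assumption, since the number of PEs is really an integer), is the following:

```latex
\begin{align*}
T_P(n) &= (m+n-1)\left(\frac{\sigma}{n}+\tau\right)
        = \sigma + (m-1)\tau + \frac{(m-1)\sigma}{n} + n\tau, \\
\frac{dT_P}{dn} &= \tau - \frac{(m-1)\sigma}{n^{2}} = 0
  \;\Longrightarrow\; n = \sqrt{\frac{(m-1)\sigma}{\tau}}, \\
T_P\big|_{\min} &= \sigma + (m-1)\tau + 2\sqrt{(m-1)\tau\sigma}
  = \left(\sqrt{(m-1)\tau}+\sqrt{\sigma}\right)^{2}.
\end{align*}
```

The second derivative, 2(m-1)σ/n³, is positive for m > 1, so the stationary point is a minimum. Intuitively, adding stages shortens the per-stage compute time σ/n but adds one more communication delay τ to the pipeline fill time.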

Next, the apparatus of this embodiment is applied to data conversion of an actual LSI pattern.

Consider black-and-white inversion of a pattern with a data volume of 5 MB and 500,000 figures. Processing this on a mainframe with a processing speed of 15 MIPS takes 85 seconds of CPU time. On a super-minicomputer with virtual memory, the processing time is 21 minutes 20 seconds. As the element processor of the present invention, assume for example the Transputer processor from Inmos (UK), with a processing capacity of 2 MIPS, a data link speed of 1 MB/sec, and main memory of ~2 MB. Consider the case in which graphic operation clusters 20 are built from this processor and 10 of them are operated in parallel.

(It is assumed, however, that the processing capacity drops to 1.5 MIPS while link transfers proceed concurrently.) If the data is divided into 10,000 blocks for processing, then σ = 128 msec and m = 1000, and since the average data volume per block is 500 B, τ = 0.5 msec. The processing time under these conditions is shown in FIG. 6. If each graphic operation cluster consists of 10 PEs, equation (2) gives T_P = 13.4 sec. In this case

ξ = 128/10 + 0.5 = 13.3 (msec/block),

so 1/ξ = 75 (blocks/sec). From FIG. 2 the throughput of the distribution cluster 10 is found to be 340 KB/sec, giving T_D = 5 MB / 340 KB/sec = 14.8 sec. Processing the last block takes a further

t_P = 128 + 0.5 × 10 = 133 (msec),

so under these conditions the black-and-white inversion can be completed in about 15 seconds.
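The arithmetic of this example can be checked with a few lines of Python. This is only a sketch that evaluates T_P = (m + n - 1)(σ/n + τ) and ξ = σ/n + τ with the values quoted in the text; the function names are illustrative, and the 340 KB/sec distribution throughput is read from FIG. 2 and so cannot be recomputed here.

```python
def cluster_time_ms(m, n, sigma_ms, tau_ms):
    """Pipeline model: time (msec) for m blocks to pass an n-stage cluster."""
    return (m + n - 1) * (sigma_ms / n + tau_ms)

def stage_time_ms(n, sigma_ms, tau_ms):
    """Time xi (msec) for one block to pass one pipe stage."""
    return sigma_ms / n + tau_ms

# Values from the text: 1000 blocks per cluster, 10 PEs per cluster,
# sigma = 128 msec per block, tau = 0.5 msec per block.
m, n, sigma, tau = 1000, 10, 128.0, 0.5
t_p = cluster_time_ms(m, n, sigma, tau)
xi = stage_time_ms(n, sigma, tau)
print(f"T_P = {t_p / 1000:.1f} sec, xi = {xi:.1f} msec/block, "
      f"1/xi = {1000 / xi:.0f} blocks/sec")
# prints: T_P = 13.4 sec, xi = 13.3 msec/block, 1/xi = 75 blocks/sec
```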

Next, using equation (3) to calculate the number of PEs that minimizes the processing time in a graphic operation cluster 20 gives n = 505, and equation (4) then gives a processing time T_P = 1.13 sec. In this case

ξ = 128/505 + 0.5 = 0.8 (msec/block),

so the throughput of the distribution cluster 10 is found from FIG. 2 to be about 1 MB/sec, giving T_D = 5 MB / 1 MB/sec = 5 sec. Likewise,

t_P = 128/505 + 0.5 × 505 = 253 (msec),

so under these conditions the black-and-white inversion can be completed in about 5 seconds.
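The optimum quoted here can be reproduced numerically. The sketch below evaluates the closed-form optimum n = sqrt((m-1)σ/τ) and the minimum time (sqrt((m-1)τ) + sqrt(σ))², and cross-checks them with a brute-force search over integer n in the pipeline model T_P = (m + n - 1)(σ/n + τ). The function names and the search limit `n_max` are illustrative assumptions.

```python
import math

def cluster_time_ms(m, n, sigma_ms, tau_ms):
    """Pipeline model: time (msec) for m blocks to pass an n-stage cluster."""
    return (m + n - 1) * (sigma_ms / n + tau_ms)

def optimal_stage_count(m, sigma_ms, tau_ms):
    """Closed-form optimum of the model, with n treated as continuous."""
    return math.sqrt((m - 1) * sigma_ms / tau_ms)

def minimal_cluster_time_ms(m, sigma_ms, tau_ms):
    """Closed-form minimum processing time (msec) at the optimum n."""
    return (math.sqrt((m - 1) * tau_ms) + math.sqrt(sigma_ms)) ** 2

def best_integer_stage_count(m, sigma_ms, tau_ms, n_max=2000):
    """Brute-force check: the integer n in 1..n_max with the smallest model time."""
    return min(range(1, n_max + 1),
               key=lambda n: cluster_time_ms(m, n, sigma_ms, tau_ms))

n_opt = optimal_stage_count(1000, 128.0, 0.5)          # about 505.7
t_min = minimal_cluster_time_ms(1000, 128.0, 0.5)      # about 1133 msec
print(f"n = {n_opt:.1f}, T_P = {t_min / 1000:.2f} sec")
```

The continuous optimum falls between two integers, so in practice either neighbouring integer gives essentially the same processing time.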

Thus, according to this embodiment, a plurality of pipelined graphic operation clusters 20 are operated in parallel so that multiple blocks of pattern data are converted into drawing data simultaneously, which greatly reduces the time required for data conversion. Evaluating the overall throughput by simulation and analysis, under the condition that the part that collects the results is not rate-limiting, showed that a black-and-white inversion taking about 1.5 minutes on a 15 MIPS mainframe and 21 minutes on a super-minicomputer can be shortened to about 15 seconds. The total number of PEs required in this case is 120, and existing software for graphic operations can be reused apart from the parallel processing portion.

Accordingly, if chips such as the Transputer mentioned above are used, this parallel processor can achieve faster data conversion at a substantially lower cost than a conventional mainframe.

Moreover, since this embodiment does not rely on pattern repetition, it is highly effective for high-speed conversion of irregular patterns.

The present invention is not limited to the embodiment described above. For example, in a step-and-repeat electron beam lithography apparatus, the data space concerned may be the unit drawn during one step-and-repeat cycle (one main field or subfield). In an electron beam lithography apparatus with continuous stage movement, it may be the unit drawn during one continuous stage movement (one frame). When the LSI pattern data is expressed hierarchically, the data space may be a constituent unit of one level of the hierarchy (not necessarily the lowest). When the electron beam lithography apparatus uses two-stage deflection, the block-unit data may be a unit corresponding to either one of the deflection regions. The invention is of course applicable not only to electron beam lithography apparatuses but also to ion beam lithography apparatuses. Various other modifications can be made without departing from the gist of the present invention.

[Effects of the Invention] As described in detail above, according to the present invention, operating a plurality of pipeline-configured processing units in parallel to perform data conversion greatly reduces the time required for data conversion.

Data conversion can therefore be performed at high speed even for irregular LSI patterns, and drawing throughput and the like can be improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the schematic configuration of an electron beam lithography data creation apparatus according to an embodiment of the present invention; FIG. 2 is a characteristic diagram showing the change in throughput of the distribution cluster with the processing speed of the graphic operation clusters; FIG. 3 is a characteristic diagram showing the change in throughput of the distribution cluster with the number of PEs; FIG. 4 is a characteristic diagram showing the change in throughput of the distribution cluster with the number of buffers per PE; FIG. 5 is a time chart showing the processing state in a graphic operation cluster; and FIG. 6 is a characteristic diagram showing the change in throughput of the distribution cluster and the graphic operation clusters with the number of PEs in black-and-white inversion.

10: distribution cluster; 20 (20_1 to 20_N): graphic operation clusters (processing units); 30: merge cluster; PE: processor element.

Applicant's agent: Patent attorney Takehiko Suzue

Claims

[Claims]

(1) A method of creating charged beam lithography data, in which a predetermined data space of LSI pattern data is divided into small-area blocks that are independent of one another and the pattern data of each block is converted into drawing data acceptable to a charged beam lithography apparatus, characterized in that a plurality of processing units are used, each having a pipeline configuration in which a plurality of processors performing different processes are connected in series, block-unit pattern data transferred from a host computer is transferred sequentially to one of the processing units, and the processing units are operated in parallel to convert the pattern data into drawing data.

(2) The method of creating charged beam lithography data according to claim 1, wherein the data space is the unit drawn during one step-and-repeat cycle in a step-and-repeat electron beam lithography apparatus, or the unit drawn during one continuous stage movement in an electron beam lithography apparatus with continuous stage movement.

(3) The method of creating charged beam lithography data, wherein an identification number is attached to the block-unit pattern data so that the block-unit drawing data output from the processing units in no particular order can be logically recombined.

(4) An apparatus for creating charged beam lithography data, in which a predetermined data space of LSI pattern data is divided into small-area blocks that are independent of one another and the pattern data of each block is converted into drawing data acceptable to a charged beam lithography apparatus, comprising: a parallel processing unit in which N graphic operation clusters are provided in parallel, each cluster having a pipeline configuration in which a plurality of processors performing different processes are connected in series and converting block-unit pattern data into drawing data; a distribution cluster in which N processors that temporarily hold block-unit pattern data are connected in series, each processor being connected to a respective graphic operation cluster of the parallel processing unit, and which transfers block-unit pattern data received from a host computer successively to the next processor or to one of the graphic operation clusters of the parallel processing unit; and a merge cluster in which N processors that temporarily hold block-unit drawing data are connected in series, each processor being connected to a respective graphic operation cluster of the parallel processing unit, and which combines the block-unit drawing data converted by the graphic operation clusters and transfers it to the outside.